CS 111 Week 8

By Yu Peng (603444925)

Write (fd, buf, bufsize)  during the write: crash + reboot

For files			10,000,000 -bufsize		ABCDE
DEFGH		ABC		MNOP	IJKL

What about directories?

Rename (“a”, “b”)

a:27	b:65	before
alpha beta
0	b:27	after I

b:27	0	After II

*whichever scenario  possible of crash

alpha_(crash)beta

6:65

= unlink (“a”)

// lost info

beta_(crash) alpha

a:27

b:27

= unlink (“b”); link(“a”,”b”)

Problems: link FSCK count wrong

Proposed POSIX model

RAM in memory cache of:

Regular files: blocks of a given size
File attributes: individual files
Directories: directory entries

Write(…) applies to cache.

Syscalls:

Slow move diskhead:

Fsync(fd)  sync cache with (data_{(in data
block)}+attributes_(in
inode)) permanent store

Fdatasync(fd)  same as fsync except cheaper

As a result, took syncout of Emacs unless it is interactive

hutinPOSIX sync()

idea for atomic rename() etc

1) Commit Record

- Separate spot on file system
- Implementation of rename has to do several write
- Commit record says whether it happened.

Rename(a, b^bbbbb):

BEGIN: (low level operations)

Pre-commit phase:

can back out,

leaving no trace

COMMIT (atomic) low level atomic write of one block

Make higher level action happen

Post-commit

Can not back out

Can improve performance for next time

Cleanup, optimization

END

2) JOURNAL

a) cell data, each cell is a block

b) simplest mode: infinitely long; append only for storage (for reads); random access (for reads)

one way to use journal

log the change you want to make

log & commit BEGIN

write the data 13 = “B”

Log END 37: “0”

*write-ahead log

Delicate place to crash  COMMIT END

Recovery procedure: replays the log left to right

Problems:

What happens if a crash occurs during FSCK?

Should recover using idempotent operations

Radical Proposal:  (log structure file system)

Keep journal on disk

Put cell data cache in RAM

+writes are fast w/c no seeking

-reads can take a while (if out of cache)

Write-behind lock (same example of wife-ahead)

13: “A”  “B” BEGIN

37: “B”0 13: “A”

Fsck replays log 37: “B”

Backwards now, make changes to cells

(forward for write-ahead) COMMIT END

Example research file system:

120PB 200,000 hard drives (each 600 GB)  fast drives

GPFS General Parallel File System

Distributed meta data (good performance –unlike fdfs) little power

Efficient directory indexing

Distributed locking

Partition awareness

System stays live during fsck

Next Problem: unreliable processes  bad memory references

A[i]=27

Solutions:

-better programmers (doesn’t work)

-software check

(tell compiler to insert subscript checks e.g. Java Virtual Machine)

(performance problems)

-get help from hardware

(add SUBSCRIPT in to hardware)

-even this is too slow

*idea: base + bounds register to hardware

Hardware checks all memory access to fit in here ^

If the check fails, same as executing invalid instruction  traps

Goes to kernel

Problem: you must know in advance how much memory needed fragmentation

Can’t grow a process

No absolute addresses in machine code  must use position independent code (PIC)

Sometimes we want shared memory

Segments

Multiple base bond pairs, 1/segment

*requires hardware support

Each process has its own segment table

Segment have varying sizes

*use pages to solve segmentation/fragmentation problem

Pages

Like segments, but all segments are the same size (e.g. 4KiB)

Kernel

Give app page table that points to other memory, but not itself.