CS 111 Week 8
By Yu Peng (603444925)
Write (fd, buf, bufsize) during the write: crash + reboot
For files |
|
|
10,000,000 -bufsize |
|
ABCDE |
DEFGH |
|
ABC |
|
MNOP |
IJKL |
What about directories?
Rename (“a”, “b”)
a:27 |
|
b:65 |
before |
alpha beta |
|||
0 |
|
b:27 |
after I |
|
|||
b:27 |
|
0 |
After II |
*whichever scenario possible of crash
alpha (crash) beta
0 |
|
6:65 |
= unlink (“a”)
// lost info
a:27 |
|
b:27 |
= unlink (“b”); link(“a”,”b”)
Problems: link FSCK count wrong
Proposed POSIX model
RAM in memory cache of:
Regular files: blocks of a given size
File attributes: individual files
Directories: directory entries
Write(…) applies to cache.
.
Syscalls:
Slow move diskhead:
Fsync(fd) sync cache with (data(in data block)+attributes(in inode)) permanent store
=
Fdatasync(fd) same as fsync except cheaper
As a result, took syncout of Emacs unless it is interactive
hutinPOSIX sync()
idea for atomic rename() etc
1) Commit Record
Separate spot on file system
Implementation of rename has to do several write
Commit record says whether it happened.
Rename(a, bbbbbb):
BEGIN: (low level operations)
Pre-commit phase:
can back out,
leaving no trace
COMMIT (atomic) low level atomic write of one block
Make higher level action happen
Post-commit
Can not back out
Can improve performance for next time
Cleanup, optimization
END
2) JOURNAL
a) cell data, each cell is a block
b) simplest mode: infinitely long; append only for storage (for reads); random access (for reads)
one way to use journal
log the change you want to make
log & commit BEGIN
write the data 13 = “B”
Log END 37: “0”
*write-ahead log
Delicate place to crash COMMIT END
Recovery procedure: replays the log left to right
Problems:
What happens if a crash occurs during FSCK?
Should recover using idempotent operations
Radical Proposal: (log structure file system)
Keep journal on disk
Put cell data cache in RAM
+writes are fast w/c no seeking
-reads can take a while (if out of cache)
Write-behind lock (same example of wife-ahead)
13: “A” “B” BEGIN
37: “B”0 13: “A”
Fsck replays log 37: “B”
Backwards now, make changes to cells
(forward for write-ahead) COMMIT END
Example research file system:
120PB 200,000 hard drives (each 600 GB) fast drives
GPFS General Parallel File System
Distributed meta data (good performance –unlike fdfs) little power
Efficient directory indexing
Distributed locking
Partition awareness
System stays live during fsck
Next Problem: unreliable processes bad memory references
A[i]=27
Solutions:
-better programmers (doesn’t work)
-software check
(tell compiler to insert subscript checks e.g. Java Virtual Machine)
(performance problems)
-get help from hardware
(add SUBSCRIPT in to hardware)
-even this is too slow
*idea: base + bounds register to hardware
Hardware checks all memory access to fit in here ^
If the check fails, same as executing invalid instruction traps
Goes to kernel
Problem: you must know in advance how much memory needed fragmentation
Can’t grow a process
No absolute addresses in machine code must use position independent code (PIC)
Sometimes we want shared memory
Segments
Multiple base bond pairs, 1/segment
*requires hardware support
Each process has its own segment table
Segment have varying sizes
*use pages to solve segmentation/fragmentation problem
Pages
Like segments, but all segments are the same size (e.g. 4KiB)
Kernel
Give app page table that points to other memory, but not itself.