CS 111 Spring 2010
Scribe Notes for 5/13/10, Lecture 13
by Joseph Silva, Kevin Deggelman, and Khai Nguyen
File System Robustness
Terminology
- error - a mistake made by the designer (in the designer's head)
- fault - a latent bug in the code which has yet to be triggered
- failure - a fault which has been triggered and noticed by users
File System Goals
- durability - ability to survive limited failures in hardware (according to your failure model)
  - For example: a power failure while writing a block
- atomicity - changes are either fully realized or not realized at all
- performance - throughput & latency
  - example: larger batch sizes increase throughput at a cost to latency
Example system call: rename("a","b")
a) find the directory's data block
b) read the data block into RAM
b-1) check whether "b" already exists; if so, mark its entry as unused
     (if we update the inode's link count here, the count may end up 1 less than it should be - dangerous)
c) change "a" to "b"
d) write the block back to disk
     (if we update the inode's link count here, the count may end up 1 more than it should be - preferable)
Note: Assume sector writes are atomic but block writes are not
Why not have rename("a","b") fail if b exists?
- Professor Eggert needed to update small applications remotely
- He wanted to replace the entire /usr/local directory atomically
Solution:
- create /usr/local2/ (not atomic)
- rename("/usr/local2", "/usr/local")
  - fails because /usr/local is an existing, non-empty directory
- instead, make /usr/local a symbolic link and atomically replace the link:

$ cd /usr
$ ln -s local2 tmp
$ mv tmp local        // rename("tmp","local")
Suppose we want to rename("d/f","e/g")
- We will have to change
  - the data block in d
  - the data blocks in e
  - the inode for the old e/g
  - the bitmaps for the old e/g
  - the bitmaps for e
File System Correctness Invariants
1. Every block in the file system is used for exactly 1 purpose
   - boot block
   - superblock
   - bitmap
   - inode
   - data
2. All reachable blocks are properly initialized
3. All reachable data blocks are marked as used in the bitmap
4. All unreachable data blocks are marked as unused in the bitmap
5. Every inode's link count is no less than the number of directory entries pointing at it
6. Every inode's link count is no greater than the number of directory entries pointing at it
Violating invariants 4 and 6 causes only storage leaks (wasted disk space); violating any of the others leads to disaster. Obviously we prefer a leak to a disaster, and leaks can be recovered from using fsck.
fsck
- Checks the file system for leaks, fixing the bitmap accordingly
- The file system should be unmounted while fsck runs
- To fix the root file system, boot into single-user mode and mount the root file system read-only
- Can also fix potential disasters:
  - orphaned files (positive link count but no directory entries) are put into lost+found
  - a file with a 0 link count but existing directory entries has its link count increased to the number of directory entries (a toy sketch of these link-count checks follows)
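
As a rough illustration of the link-count reconciliation (a toy sketch over made-up in-memory tables, not real fsck code): count the directory entries that reference each inode and compare against the stored link counts.

#include <stdio.h>

#define NINODES 6

/* Toy on-disk state (invented numbers): the link count stored in each
   inode, and a flat list of directory entries, each naming the inode it
   points to. */
int link_count[NINODES] = {0, 1, 2, 1, 1, 0};
int dir_entry_inode[]   = {1, 2, 2, 4, 4, 5};

int main(void)
{
    int refs[NINODES] = {0};
    int nentries = sizeof dir_entry_inode / sizeof dir_entry_inode[0];

    /* Count how many directory entries actually reference each inode. */
    for (int i = 0; i < nentries; i++)
        refs[dir_entry_inode[i]]++;

    for (int ino = 1; ino < NINODES; ino++) {
        if (link_count[ino] > 0 && refs[ino] == 0)
            printf("inode %d: orphan (link count %d, no entries) -> lost+found\n",
                   ino, link_count[ino]);
        else if (link_count[ino] < refs[ino])
            printf("inode %d: link count %d < %d entries -> raise count\n",
                   ino, link_count[ino], refs[ino]);
        else if (link_count[ino] > refs[ino])
            printf("inode %d: link count %d > %d entries -> storage leak\n",
                   ino, link_count[ino], refs[ino]);
    }
    return 0;
}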
A common place a malicious program may hide files is in free space, which is only accessible from the raw device.
Idea: Think of a write as a slow transition between state A (the old contents) and state B (the new contents); in the middle of the transition the contents are undefined and may be garbage.

How to implement an atomic write atop hardware that doesn't write atomically:
X = original contents
Y = new contents
? = middle of a write (unknown state)
Write 3 copies (write A, then B, then C):

Copy/Case | 1 | 2 | 3 | 4 | 5 | 6 | 7
    A     | X | ? | Y | Y | Y | Y | Y
    B     | X | X | X | ? | Y | Y | Y
    C     | X | X | X | X | X | ? | Y
When reading:
if (A==B) return A
else return C
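
A toy C sketch of the three-copy scheme (the fixed record size and the in-memory "copies" are stand-ins for three disk blocks): each update rewrites A, then B, then C in order, and the reader applies the rule above.

#include <stdio.h>
#include <string.h>

#define SIZE 16

/* Three replicas of one logical record, written strictly in order A, B, C.
   A crash can corrupt at most the copy being written at that moment. */
static char A[SIZE], B[SIZE], C[SIZE];

static const char X[SIZE] = "old contents";
static const char Y[SIZE] = "new contents";

/* Update all three copies in order; a crash between the memcpy calls
   corresponds to the "?" states in the table above. */
void atomic_write(const char *newval)
{
    memcpy(A, newval, SIZE);   /* case 2 if we crash during this copy */
    memcpy(B, newval, SIZE);   /* case 4 */
    memcpy(C, newval, SIZE);   /* case 6 */
}

/* Reader's rule: if A and B agree, they hold the current value;
   otherwise C still holds a consistent (old or new) value. */
const char *atomic_read(void)
{
    return memcmp(A, B, SIZE) == 0 ? A : C;
}

int main(void)
{
    atomic_write(X);
    atomic_write(Y);
    printf("%s\n", atomic_read());
    return 0;
}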
Write 2 copies (write A, then B):

Copy/Case | 1 | 2 | 3 | 4 | 5
    A     | X | ? | Y | Y | Y
    B     | X | X | X | ? | Y
When reading:
if (A==B) return A
else return B
Lampson-Sturgis assumptions
- Writes may fail or corrupt other pieces of storage (a later read will catch the bad write)
- Storage may spontaneously decay (error correction will catch it)
- Errors are rare -> you can assume that they can be corrected as they happen without worrying that they might cascade
Performance as the Enemy of Robustness
- Suppose your file system needs to do 3 dozen writes
  - Sort them by disk address
  - Write them in sorted order
- This results in a big performance increase; however, it may ruin decisions made at the higher level to prevent disaster
- Solution: pass the dependencies to the lower level and sort without violating them (see the sketch below)
- Result: not as great a performance increase, but still much better than not sorting
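
A small C sketch of that solution (the write list and the "must precede" pairs are invented for illustration): among the pending writes whose prerequisites have already been issued, always pick the one with the lowest disk address.

#include <stdio.h>

#define NWRITES 5
#define NDEPS   2

/* Pending writes, identified by their target disk address. */
int addr[NWRITES] = {900, 100, 700, 300, 500};

/* Dependencies passed down from the file system: deps[i][0] must be
   issued before deps[i][1] (indices into addr[]). */
int deps[NDEPS][2] = { {0, 1},    /* write to 900 must precede write to 100 */
                       {2, 3} };  /* write to 700 must precede write to 300 */

int done[NWRITES];

/* A write is ready when every write that must precede it has been issued. */
int ready(int w)
{
    for (int i = 0; i < NDEPS; i++)
        if (deps[i][1] == w && !done[deps[i][0]])
            return 0;
    return 1;
}

int main(void)
{
    for (int issued = 0; issued < NWRITES; issued++) {
        int best = -1;
        /* Among ready writes, pick the lowest disk address. */
        for (int w = 0; w < NWRITES; w++)
            if (!done[w] && ready(w) && (best < 0 || addr[w] < addr[best]))
                best = w;
        printf("issue write to disk address %d\n", addr[best]);
        done[best] = 1;
    }
    return 0;
}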
Suppose we are saving a file in Emacs (^X^S):

write()
// CRASH HERE!
write()
write()

This violates the GOLDEN RULE OF ATOMICITY:
Never write over your last copy of info.
Try again:
IDEA #1
- Write to a new file system first, then copy back to the actual one
- Again we have the problem of not knowing which copy is correct following a crash
- Keep an extra bit of info, called the "commit record", to tell us which copy to use:
    write to the new file system
    set the bit
    copy to the actual file system
    clear the bit
- Alternatively, we could alternate between the two copies and use the bit to keep track of which one is current, saving us an extra copy each time (see the sketch below)
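
A toy C sketch of the alternating-copies variant (the file names copy0, copy1, and commit are made up for this sketch): two files hold the two copies, a one-byte commit record says which one is current, and the single small write that flips that byte is the commit point (assumed atomic, like a sector write).

#include <stdio.h>
#include <stdlib.h>

/* Which copy is current, according to the commit record (default: copy 0). */
static int current_copy(void)
{
    FILE *f = fopen("commit", "rb");
    int c = f ? fgetc(f) : EOF;
    if (f) fclose(f);
    return c == '1' ? 1 : 0;
}

static void write_file(const char *name, const char *data)
{
    FILE *f = fopen(name, "wb");
    if (!f) { perror(name); exit(1); }
    fputs(data, f);
    fclose(f);                        /* real code would also fsync here */
}

/* Atomically replace the saved data. */
void commit_write(const char *data)
{
    int next = 1 - current_copy();
    write_file(next ? "copy1" : "copy0", data);   /* old copy untouched */
    write_file("commit", next ? "1" : "0");       /* commit point */
}

/* Always returns the last committed version. */
char *committed_read(char *buf, size_t n)
{
    FILE *f = fopen(current_copy() ? "copy1" : "copy0", "rb");
    size_t got = f ? fread(buf, 1, n - 1, f) : 0;
    buf[got] = '\0';
    if (f) fclose(f);
    return buf;
}

int main(void)
{
    char buf[128];
    commit_write("version 1");
    commit_write("version 2");
    printf("%s\n", committed_read(buf, sizeof buf));
    return 0;
}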
High-level Atomic Operations
BEGIN
  write write write
COMMIT
  post-commit phase   // if needed for performance; fsck does this after a crash
END
IDEA #2: Journaling
- Partition the disk into two regions:
  - cell data - normal file system contents
  - journal - a long array of commit records
    - writes to the journal are sequential
    - the journal could be put on a different physical drive for a performance increase
- After a crash, fsck looks for everything that has been committed but not finished and fixes it
Logging Protocols
A: Write-Ahead Logging
BEGIN
  log the update into the journal (e.g., "block #9637 should contain 'xyzw----'")
COMMIT
  write the new data into the cells
END
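
A toy in-memory C sketch of write-ahead logging (the cell array and journal format are invented for illustration): the update is logged and committed first, then installed into the cells; recovery redoes anything committed but not yet finished.

#include <stdio.h>
#include <string.h>

#define NCELLS 4
#define VALLEN 8

/* "Cell data": the normal file system contents. */
char cells[NCELLS][VALLEN] = {"aaaa", "bbbb", "cccc", "dddd"};

/* One journal entry: which cell should contain what, plus flags. */
struct entry { int cell; char val[VALLEN]; int committed; int done; };
struct entry journal[16];
int journal_len = 0;

/* Write-ahead protocol: log, commit, then install into the cell. */
void wal_write(int cell, const char *val, int crash_before_install)
{
    struct entry *e = &journal[journal_len++];
    e->cell = cell;
    strcpy(e->val, val);        /* BEGIN: log the update into the journal */
    e->committed = 1;           /* COMMIT */
    if (crash_before_install)
        return;                 /* simulate a crash here */
    strcpy(cells[cell], val);   /* write the new data into the cell */
    e->done = 1;                /* END */
}

/* Recovery (what fsck would do): redo committed but unfinished entries. */
void recover(void)
{
    for (int i = 0; i < journal_len; i++)
        if (journal[i].committed && !journal[i].done) {
            strcpy(cells[journal[i].cell], journal[i].val);
            journal[i].done = 1;
        }
}

int main(void)
{
    wal_write(2, "xyzw", 1);                       /* crash before the cell write */
    printf("before recovery: %s\n", cells[2]);     /* still "cccc" */
    recover();
    printf("after recovery:  %s\n", cells[2]);     /* now "xyzw" */
    return 0;
}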
B: Write-Behind Logging
BEGIN
  log all the old cell values
  install the new values into the cell data
COMMIT
END
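
For symmetry, a toy sketch of write-behind logging under the same invented setup: the old value is logged first, the new value is installed, and only then is the commit recorded; recovery undoes entries that never committed by restoring the saved old values.

#include <stdio.h>
#include <string.h>

#define NCELLS 4
#define VALLEN 8

char cells[NCELLS][VALLEN] = {"aaaa", "bbbb", "cccc", "dddd"};

/* One journal entry: the old value of the cell, plus a commit flag. */
struct entry { int cell; char oldval[VALLEN]; int committed; };
struct entry journal[16];
int journal_len = 0;

/* Write-behind protocol: save the old value, install, then commit. */
void wbl_write(int cell, const char *val, int crash_before_commit)
{
    struct entry *e = &journal[journal_len++];
    e->cell = cell;
    strcpy(e->oldval, cells[cell]);  /* BEGIN: log the old cell value */
    strcpy(cells[cell], val);        /* install the new value */
    if (crash_before_commit)
        return;                      /* simulate a crash here */
    e->committed = 1;                /* COMMIT / END */
}

/* Recovery: undo any entry that never committed. */
void recover(void)
{
    for (int i = journal_len - 1; i >= 0; i--)
        if (!journal[i].committed)
            strcpy(cells[journal[i].cell], journal[i].oldval);
}

int main(void)
{
    wbl_write(2, "xyzw", 1);         /* crash before the commit record */
    recover();
    printf("%s\n", cells[2]);        /* back to "cccc" */
    return 0;
}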
Advantages of A
- writes to the log are sequential
- need not read old values
Advantages of B
- better for buggy systems
- fsck does less work -> faster reboot recovery