CS 111 Spring 2010
Scribe Notes for 5/13/10, Lecture 13
by Joseph Silva, Kevin Deggelman, and Khai Nguyen
File System Robustness
Terminology
- error - a mistake made by the designer (in the designer's head)
- fault - a latent bug in the code which has yet to be triggered
- failure - a fault which has been triggered and noticed by users
File System Goals
- durability - ability to survive limited failures in hardware (according to your failure model)
  - For example: a power failure while writing a block
- atomicity - changes are either fully realized or not realized at all
- performance - throughput & latency
  - example: larger batch sizes increase throughput at a cost to latency
Example system call: rename("a","b")
a) find the directory's data block
b) read the data block into RAM
b-1) check whether "b" already exists; if so, mark its entry as unused
     (if we update the inode's link count here, the count may end up 1 less than it should be - dangerous)
c) change "a" to "b"
d) write the block back to disk
     (if we update the inode's link count here, the count may end up 1 more than it should be - preferable)
Note: Assume sector writes are atomic but block writes are not
Why not have rename("a","b") fail if b exists?
- Professor Eggert needed to update small applications remotely
- He wanted to replace the entire /usr/local directory atomically
Solution:
- create /usr/local2/ (not atomic)
- rename("/usr/local2", "/usr/local")
  - fails because /usr/local is an existing, non-empty directory
- instead, make /usr/local a symbolic link and atomically replace the link:

$ cd /usr
$ ln -s local2 tmp
$ mv tmp local        // rename("tmp","local")
Suppose we want to rename("d/f","e/g")
- We will have to change
  - the data block in d
  - the data blocks in e
  - the inode for the old e/g
  - the bitmaps for the old e/g
  - the bitmaps for e
File System Correctness Invariants
1. Every block in the file system is used for exactly 1 purpose
   - boot block
   - superblock
   - bitmap
   - inode
   - data
2. All reachable blocks are properly initialized
3. All reachable data blocks are marked as used in the bitmap
4. All unreachable data blocks are marked as unused in the bitmap
5. Every inode's link count is no less than the number of directory entries pointing at it
6. Every inode's link count is no greater than the number of directory entries pointing at it
Violating invariants 4 and 6 causes only storage leaks (wasted disk space); violating any of the others leads to disaster. Obviously we prefer a leak to a disaster, and leaks can be recovered from using fsck.
fsck
- Checks the file system for leaks, fixing the bitmap accordingly
- The file system should be unmounted while fsck runs
- To fix the root file system, boot into single-user mode and mount the root file system read-only
- Can also fix potential disasters:
  - orphaned files (positive link count but no directory entries) are put into lost+found
  - a file with a 0 link count but existing directory entries has its link count increased to the number of directory entries (a toy sketch of these link-count checks follows)
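
As a rough illustration of the link-count reconciliation (a toy sketch over made-up in-memory tables, not real fsck code): count the directory entries that reference each inode and compare against the stored link counts.

#include <stdio.h>

#define NINODES 6

/* Toy on-disk state (invented numbers): the link count stored in each
   inode, and a flat list of directory entries, each naming the inode it
   points to. */
int link_count[NINODES] = {0, 1, 2, 1, 1, 0};
int dir_entry_inode[]   = {1, 2, 2, 4, 4, 5};

int main(void)
{
    int refs[NINODES] = {0};
    int nentries = sizeof dir_entry_inode / sizeof dir_entry_inode[0];

    /* Count how many directory entries actually reference each inode. */
    for (int i = 0; i < nentries; i++)
        refs[dir_entry_inode[i]]++;

    for (int ino = 1; ino < NINODES; ino++) {
        if (link_count[ino] > 0 && refs[ino] == 0)
            printf("inode %d: orphan (link count %d, no entries) -> lost+found\n",
                   ino, link_count[ino]);
        else if (link_count[ino] < refs[ino])
            printf("inode %d: link count %d < %d entries -> raise count\n",
                   ino, link_count[ino], refs[ino]);
        else if (link_count[ino] > refs[ino])
            printf("inode %d: link count %d > %d entries -> storage leak\n",
                   ino, link_count[ino], refs[ino]);
    }
    return 0;
}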
A common place a malicious program may hide files is in free space, which is only accessible from the raw device.
Idea: Think of a write as a slow transition between state A (the old contents) and state B (the new contents); in the middle of the transition the contents are undefined and may be garbage.

How to implement an atomic write atop hardware that doesn't write atomically:
X = original contents
Y = new contents
? = middle of a write (unknown state)
Write 3 copies (write A, then B, then C):

Copy/Case | 1 | 2 | 3 | 4 | 5 | 6 | 7
    A     | X | ? | Y | Y | Y | Y | Y
    B     | X | X | X | ? | Y | Y | Y
    C     | X | X | X | X | X | ? | Y
When reading:
if (A==B) return A
else return C
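
A toy C sketch of the three-copy scheme (the fixed record size and the in-memory "copies" are stand-ins for three disk blocks): each update rewrites A, then B, then C in order, and the reader applies the rule above.

#include <stdio.h>
#include <string.h>

#define SIZE 16

/* Three replicas of one logical record, written strictly in order A, B, C.
   A crash can corrupt at most the copy being written at that moment. */
static char A[SIZE], B[SIZE], C[SIZE];

static const char X[SIZE] = "old contents";
static const char Y[SIZE] = "new contents";

/* Update all three copies in order; a crash between the memcpy calls
   corresponds to the "?" states in the table above. */
void atomic_write(const char *newval)
{
    memcpy(A, newval, SIZE);   /* case 2 if we crash during this copy */
    memcpy(B, newval, SIZE);   /* case 4 */
    memcpy(C, newval, SIZE);   /* case 6 */
}

/* Reader's rule: if A and B agree, they hold the current value;
   otherwise C still holds a consistent (old or new) value. */
const char *atomic_read(void)
{
    return memcmp(A, B, SIZE) == 0 ? A : C;
}

int main(void)
{
    atomic_write(X);
    atomic_write(Y);
    printf("%s\n", atomic_read());
    return 0;
}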
Write 2 copies (write A, then B):

Copy/Case | 1 | 2 | 3 | 4 | 5
    A     | X | ? | Y | Y | Y
    B     | X | X | X | ? | Y
When reading:
if (A==B) return A
else return B
Lampson-Sturgis assumptions
- Writes may fail or corrupt other pieces of storage (a later read will catch the bad write)
- Storage may spontaneously decay (error correction will catch it)
- Errors are rare -> you can assume that they can be corrected as they happen without worrying that they might cascade
Performance as the Enemy of Robustness
- Suppose your file system needs to do 3 dozen writes
  - Sort them by disk address
  - Write them in sorted order
- This results in a big performance increase; however, it may ruin decisions made at the higher level to prevent disaster
- Solution: pass the dependencies to the lower level and sort without violating them (see the sketch below)
- Result: not as great a performance increase, but still much better than not sorting
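
A small C sketch of that solution (the write list and the "must precede" pairs are invented for illustration): among the pending writes whose prerequisites have already been issued, always pick the one with the lowest disk address.

#include <stdio.h>

#define NWRITES 5
#define NDEPS   2

/* Pending writes, identified by their target disk address. */
int addr[NWRITES] = {900, 100, 700, 300, 500};

/* Dependencies passed down from the file system: deps[i][0] must be
   issued before deps[i][1] (indices into addr[]). */
int deps[NDEPS][2] = { {0, 1},    /* write to 900 must precede write to 100 */
                       {2, 3} };  /* write to 700 must precede write to 300 */

int done[NWRITES];

/* A write is ready when every write that must precede it has been issued. */
int ready(int w)
{
    for (int i = 0; i < NDEPS; i++)
        if (deps[i][1] == w && !done[deps[i][0]])
            return 0;
    return 1;
}

int main(void)
{
    for (int issued = 0; issued < NWRITES; issued++) {
        int best = -1;
        /* Among ready writes, pick the lowest disk address. */
        for (int w = 0; w < NWRITES; w++)
            if (!done[w] && ready(w) && (best < 0 || addr[w] < addr[best]))
                best = w;
        printf("issue write to disk address %d\n", addr[best]);
        done[best] = 1;
    }
    return 0;
}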
Suppose we are saving a file in Emacs (^X^S):

write()
// CRASH HERE!
write()
write()

This violates the GOLDEN RULE OF ATOMICITY:
Never write over your last copy of info.
Try again:
IDEA #1
- Write to a new file system first, then copy back to the actual one
- Again we have the problem of not knowing which copy is correct following a crash
- Keep an extra bit of info, called the "commit record", to tell us which copy to use:
    write to the new file system
    set the bit
    copy to the actual file system
    clear the bit
- Alternatively, we could alternate between the two copies and use the bit to keep track of which one is current, saving us an extra copy each time (see the sketch below)
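
A toy C sketch of the alternating-copies variant (the file names copy0, copy1, and commit are made up for this sketch): two files hold the two copies, a one-byte commit record says which one is current, and the single small write that flips that byte is the commit point (assumed atomic, like a sector write).

#include <stdio.h>
#include <stdlib.h>

/* Which copy is current, according to the commit record (default: copy 0). */
static int current_copy(void)
{
    FILE *f = fopen("commit", "rb");
    int c = f ? fgetc(f) : EOF;
    if (f) fclose(f);
    return c == '1' ? 1 : 0;
}

static void write_file(const char *name, const char *data)
{
    FILE *f = fopen(name, "wb");
    if (!f) { perror(name); exit(1); }
    fputs(data, f);
    fclose(f);                        /* real code would also fsync here */
}

/* Atomically replace the saved data. */
void commit_write(const char *data)
{
    int next = 1 - current_copy();
    write_file(next ? "copy1" : "copy0", data);   /* old copy untouched */
    write_file("commit", next ? "1" : "0");       /* commit point */
}

/* Always returns the last committed version. */
char *committed_read(char *buf, size_t n)
{
    FILE *f = fopen(current_copy() ? "copy1" : "copy0", "rb");
    size_t got = f ? fread(buf, 1, n - 1, f) : 0;
    buf[got] = '\0';
    if (f) fclose(f);
    return buf;
}

int main(void)
{
    char buf[128];
    commit_write("version 1");
    commit_write("version 2");
    printf("%s\n", committed_read(buf, sizeof buf));
    return 0;
}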
High-level Atomic Operations
BEGIN
  write write write
COMMIT
  post-commit phase   // if needed for performance; fsck does this after a crash
END
IDEA #2: Journaling
- Partition the disk into two regions:
  - cell data - normal file system contents
  - journal - a long array of commit records
    - writes to the journal are sequential
    - the journal could be put on a different physical drive for a performance increase
- After a crash, fsck looks for everything that has been committed but not finished and fixes it
Logging Protocols
A: Write-Ahead Logging
BEGIN
  log the update into the journal (e.g., "block #9637 should contain 'xyzw----'")
COMMIT
  write the new data into the cells
END
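
A toy in-memory C sketch of write-ahead logging (the cell array and journal format are invented for illustration): the update is logged and committed first, then installed into the cells; recovery redoes anything committed but not yet finished.

#include <stdio.h>
#include <string.h>

#define NCELLS 4
#define VALLEN 8

/* "Cell data": the normal file system contents. */
char cells[NCELLS][VALLEN] = {"aaaa", "bbbb", "cccc", "dddd"};

/* One journal entry: which cell should contain what, plus flags. */
struct entry { int cell; char val[VALLEN]; int committed; int done; };
struct entry journal[16];
int journal_len = 0;

/* Write-ahead protocol: log, commit, then install into the cell. */
void wal_write(int cell, const char *val, int crash_before_install)
{
    struct entry *e = &journal[journal_len++];
    e->cell = cell;
    strcpy(e->val, val);        /* BEGIN: log the update into the journal */
    e->committed = 1;           /* COMMIT */
    if (crash_before_install)
        return;                 /* simulate a crash here */
    strcpy(cells[cell], val);   /* write the new data into the cell */
    e->done = 1;                /* END */
}

/* Recovery (what fsck would do): redo committed but unfinished entries. */
void recover(void)
{
    for (int i = 0; i < journal_len; i++)
        if (journal[i].committed && !journal[i].done) {
            strcpy(cells[journal[i].cell], journal[i].val);
            journal[i].done = 1;
        }
}

int main(void)
{
    wal_write(2, "xyzw", 1);                       /* crash before the cell write */
    printf("before recovery: %s\n", cells[2]);     /* still "cccc" */
    recover();
    printf("after recovery:  %s\n", cells[2]);     /* now "xyzw" */
    return 0;
}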
B: Write-Behind Logging
BEGIN
  log all the old cell values
  install the new values into the cell data
COMMIT
END
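
For symmetry, a toy sketch of write-behind logging under the same invented setup: the old value is logged first, the new value is installed, and only then is the commit recorded; recovery undoes entries that never committed by restoring the saved old values.

#include <stdio.h>
#include <string.h>

#define NCELLS 4
#define VALLEN 8

char cells[NCELLS][VALLEN] = {"aaaa", "bbbb", "cccc", "dddd"};

/* One journal entry: the old value of the cell, plus a commit flag. */
struct entry { int cell; char oldval[VALLEN]; int committed; };
struct entry journal[16];
int journal_len = 0;

/* Write-behind protocol: save the old value, install, then commit. */
void wbl_write(int cell, const char *val, int crash_before_commit)
{
    struct entry *e = &journal[journal_len++];
    e->cell = cell;
    strcpy(e->oldval, cells[cell]);  /* BEGIN: log the old cell value */
    strcpy(cells[cell], val);        /* install the new value */
    if (crash_before_commit)
        return;                      /* simulate a crash here */
    e->committed = 1;                /* COMMIT / END */
}

/* Recovery: undo any entry that never committed. */
void recover(void)
{
    for (int i = journal_len - 1; i >= 0; i--)
        if (!journal[i].committed)
            strcpy(cells[journal[i].cell], journal[i].oldval);
}

int main(void)
{
    wbl_write(2, "xyzw", 1);         /* crash before the commit record */
    recover();
    printf("%s\n", cells[2]);        /* back to "cccc" */
    return 0;
}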
Advantages of A
- writes to the log are sequential
- need not read old values
Advantages of B
- better for buggy systems
- fsck does less work -> faster reboot recovery