CS 111 Scribe Note Lecture 15: File System Robustness

Author: Xiaohui,Zhou

Date: November 18 2013

What Is Robustness of A File System?

-- A robust file system will still works even if you throw unusual data/usage patterns etc.

-- It can decompose into sub-goals:

Durability
- Data survives failure of underlying system
- e.g. Lose power
Atomicty
- Changes are either done or not done
Performance
- System with high throughput
- Low Latency (handling it quickly)

What Can Go Wrong With Hardware?

lose power (today we will focus on this one)
- lose contents of RAM, Cache(CPU, controller), registers
- We want to keep what's on the disk
- one common approach: backup power(assume we're not doing this)
run out of disk space
heavily loaded system
sudden disconnection,such as unplugging a USB flash drive
lose disk (disk crash)
...

Lose Power

Suppose we are trying to implement C-x C-s in Emacs

if we lose power here, we can lose data: "write(fd, buf,sectorsize);"

What will happen if suddenly the machine loses its power?

CPU: can execute a few instructions to save the data
disk controller: can move arms to protect the disk

We like to assume sector writes are atomic, but they are not always.

Paper: "Understanding the robustness of SSDs under Power Fault"

They tested 15 SSDs, and only 2 of them worked after they cut power, and other 13 of them suffered from:

bit corription
flying writes
shorn writes
metadata corruption (a more serious error)
dead device
unserializability (instructions are executed in random order)

maybe if a HDD under same test, the result will be better

How To Solve This Problem?

GOLDEN RULE OF ATOMICITY: NEVER OVERWRITE YOUR ONLY COPY

SOLUTION 1 : we maintain two copies: A and B

	A	B
	OLD DATA	OLD DATA
WRITE A	Unknown	OLD DATA
	NEW DATA	OLD DATA
WRITE B	NEW DATA	NEW DATA
	NEW DATA	NEW DATA

Problems:

blocked data by 2X times
what if data disagree?

SOLUTION 2 : we maintain three copies: A, B and C

	A	B	C
	OLD DATA	OLD DATA	OLD DATA
WRITE A	Unknown	OLD DATA	OLD DATA
	NEW DATA	OLD DATA	OLD DATA
WRITE B	NEW DATA	Unknown	OLD DATA
	NEW DATA	NEW DATA	OLD DATA
WRITE C	NEW DATA	NEW DATA	Unknown
	NEW DATA	NEW DATA	NEW DATA

idealized model: writes are not atomic but they eventually finish
It works but in real life nobody uses this as there exist a simpler way

Lampson-Sturgis Assumptions

storage writes may fail or corrupt another piece of storage
A later read can detect the bad sector
storage may spontoneously decoy sector
Errors are rare
repairs can be done in time (before another failure occurs)

Example 1: rename("a","b")

STEPS:

read inode into memory
read data blocks into memroy
change "a" to "b" in memory
write data blocks out to disk
update time stamp in the inode
write inode into disk

what if the system crushed at step 4 or 5?

Eggert assumption: block writes are atomic

Example 2: rename("d/a", "d/bb")

harder as it may get block allocation

Example 3: rename("d/a", "e/b")

Transferring data to a different directory

STEPS:

incremnet link count
write e's data
write d's data
decrement link count

LINK COUNT NEVER UNDERESTIMATES TRUE VALUE BUT IT CAN OVERESTIMATE

Invariants For Your System

Every block of file system is used for exactly 1 purpose,such as boot, super, bitmap, inode, data
All referenced blocks are initiallized to data appropriate for their type
All referenced blocks are marked as USED in bitmap
All unreferenced blocks arer marked as FREE in bitmap

Violate these rules will result data lost(#1,3), disaster(#2), or storage leak(#4)

Performance Improvment Ideas

1. Commit Record

write "data" outputs into a prelimited area(e.g write into a temp file on disk)
wait for all block have hit disk
write a commit record , saying "commited"
copy tempory fata back to original, want to hit disk
write "done" to commit record

Pattern:(appears to be atomic at higher level)


BEGIN
	pre-commit phase 
		work is being done, but you can back out
	
COMMIT or ABORT(atomic at lower level)
	post-commit phase
		work usually for performance, can't back out
        
END

2. Journal

a sequence of instructions describes a change to F.S


			BEGIN, WRITE, COMMIT, DONE
		+---+---+---+---+---+---+---+-------+		
		|   |   |   |   |   |   |   |\\\\\\\|
		+---+---+---+---+---+---+---+-------+
		    cell blocks    			  unused

if all the block are used, we can start from the begining as usually these are garbage now

Logging protocals:

write ahead logs:

log planned writes
and implement them later

write behind logs:

log old data
about to be rewritten
write new data into cell
mark as done

replaying logs after crush: