CS 111 Scribe Note Lecture 15: File System Robustness
Author: Xiaohui,Zhou
Date: November 18 2013
What Is Robustness of A File System?
-- A robust file system will still works even if you throw unusual data/usage patterns etc.
-- It can decompose into sub-goals:
- Durability
- Data survives failure of underlying system
- e.g. Lose power
- Atomicty
- Changes are either done or not done
- Performance
- System with high throughput
- Low Latency (handling it quickly)
What Can Go Wrong With Hardware?
- lose power (today we will focus on this one)
- lose contents of RAM, Cache(CPU, controller), registers
- We want to keep what's on the disk
- one common approach: backup power(assume we're not doing this)
- run out of disk space
- heavily loaded system
- sudden disconnection,such as unplugging a USB flash drive
- lose disk (disk crash)
- ...
Lose Power
Suppose we are trying to implement C-x C-s in Emacs
if we lose power here, we can lose data: "write(fd, buf,sectorsize);"
What will happen if suddenly the machine loses its power?
- CPU: can execute a few instructions to save the data
- disk controller: can move arms to protect the disk
We like to assume sector writes are atomic, but they are not always.
Paper: "Understanding the robustness of SSDs under Power Fault"
They tested 15 SSDs, and only 2 of them worked after they cut power, and other 13 of them suffered from:
- bit corription
- flying writes
- shorn writes
- metadata corruption (a more serious error)
- dead device
- unserializability (instructions are executed in random order)
maybe if a HDD under same test, the result will be better
How To Solve This Problem?
GOLDEN RULE OF ATOMICITY: NEVER OVERWRITE YOUR ONLY COPY
SOLUTION 1 : we maintain two copies: A and B
|
A |
B |
|
OLD DATA |
OLD DATA |
WRITE A |
Unknown |
OLD DATA |
|
NEW DATA |
OLD DATA |
WRITE B |
NEW DATA |
NEW DATA |
|
NEW DATA |
NEW DATA |
Problems:
- blocked data by 2X times
- what if data disagree?
SOLUTION 2 : we maintain three copies: A, B and C
|
A |
B |
C |
|
OLD DATA |
OLD DATA |
OLD DATA |
WRITE A |
Unknown |
OLD DATA |
OLD DATA |
|
NEW DATA |
OLD DATA |
OLD DATA |
WRITE B |
NEW DATA |
Unknown |
OLD DATA |
|
NEW DATA |
NEW DATA |
OLD DATA |
WRITE C |
NEW DATA |
NEW DATA |
Unknown |
|
NEW DATA |
NEW DATA |
NEW DATA |
- idealized model: writes are not atomic but they eventually finish
- It works but in real life nobody uses this as there exist a simpler way
Lampson-Sturgis Assumptions
- storage writes may fail or corrupt another piece of storage
- A later read can detect the bad sector
- storage may spontoneously decoy sector
- Errors are rare
- repairs can be done in time (before another failure occurs)
Example 1: rename("a","b")
STEPS:
- read inode into memory
- read data blocks into memroy
- change "a" to "b" in memory
- write data blocks out to disk
- update time stamp in the inode
- write inode into disk
what if the system crushed at step 4 or 5?
Eggert assumption: block writes are atomic
Example 2: rename("d/a", "d/bb")
harder as it may get block allocation
Example 3: rename("d/a", "e/b")
Transferring data to a different directory
STEPS:
- incremnet link count
- write e's data
- write d's data
- decrement link count
LINK COUNT NEVER UNDERESTIMATES TRUE VALUE BUT IT CAN OVERESTIMATE
Invariants For Your System
- Every block of file system is used for exactly 1 purpose,such as boot, super, bitmap, inode, data
- All referenced blocks are initiallized to data appropriate for their type
- All referenced blocks are marked as USED in bitmap
- All unreferenced blocks arer marked as FREE in bitmap
Violate these rules will result data lost(#1,3), disaster(#2), or storage leak(#4)
1. Commit Record
- write "data" outputs into a prelimited area(e.g write into a temp file on disk)
- wait for all block have hit disk
- write a commit record , saying "commited"
- copy tempory fata back to original, want to hit disk
- write "done" to commit record
Pattern:(appears to be atomic at higher level)
BEGIN
pre-commit phase
work is being done, but you can back out
COMMIT or ABORT(atomic at lower level)
post-commit phase
work usually for performance, can't back out
END
2. Journal
a sequence of instructions describes a change to F.S
BEGIN, WRITE, COMMIT, DONE
+---+---+---+---+---+---+---+-------+
| | | | | | | |\\\\\\\|
+---+---+---+---+---+---+---+-------+
cell blocks unused
if all the block are used, we can start from the begining as usually these are garbage now
Logging protocals:
write ahead logs:
- log planned writes
- and implement them later
write behind logs:
- log old data
- about to be rewritten
- write new data into cell
- mark as done
replaying logs after crush:
write ahead logs:
- replayed Log to R
write behind logs:
- R to L