Lecture 13: File System Robustness

Scribe notes for Feb 27, 2013

Muh-Wei Chern

File System Goals

Performance

High throughput: We want to handle lots of requests per second.
Low latency: We want to handle a request as fast as possible.

Durability

We want data to survive failure (not arbitrary failure but limited failure) underlying system.

Atomicity (Correctness)

We want changes that are either done or not done.
Atomicity interacts with durability; we want atomicity to work even if durability fails.

What can go wrong with durability?

What failures are we worried about?

Likely
Damage

We want to come up with a model for failures; this model is crucial because it will affect our design. We want the model to be simple and corresponds to reality.

For now, let us worry about a simple model that has no battery and the only failure it has is losing power.

Suppose you have a write_sector, what happens if you lose power in the middle of writing write_sector?
1. It might still work!
2. Bad sector results because it is writing in process.

To model this:

"Classic" Solution
3 copies:

Write all three copies, one at a time.

To read:
if F1 = F2
return F1
else
F3

We want less than 3 copies

Lampson-Sturgis Assumptions

Storage writes can fail and can corrupt a written sector or a nearby sector.
Storage can decay spontaneously.
A later read can detect "bad" sectors.
Errors are relatively rare assuming we have devices that are unreliable but not too unrealiable.
Repairs can be done in time.

Rename system call: rename("a","b")

1. Read inode 357 into memory
2. Read data block 952736 into RAM
3. Change "a" to "b" in RAM
4. Write data block out to disk
5. Update inode time stamp
6. Write inode to disk

What can go wrong?
Suppose there is a crash between steps 4 and 5, after reboot, data will change but inode will not be updated.

CS111 Assumptions
Single data block writes are atomic; either all blocks write successfully or none gets written, all-or-nothing.

rename("d/a", "e/a"): Renaming a file that used to be in directory d to a file in directory e.

Read 357, 362, 952736 and 1012 into RAM, assuming there is room in 1012 for new entry.
Order is important

1. Write 1012
2. Write 952736

We want to do step 1 before step 2 because if we do step 2 first and it crashes, the file "a" would disappear.
The advantage of doing step 1 before step 2 is that we can keep track of the file, but the disadvantage is if it crashes between steps 1 and 2, it will lose the link to the data, which is bad.

The solution would be to do something between steps 1 and 2.
1. Write 1012
2. Write "921 with a link count of 2"
3. Write 952736
4. Write "921 with a link count of 1"
This solution would cause memory leak sometimes and we can still lose data if it crashes between the new steps 1 and 2.

The solution to the case above would be to switch the order of steps 1 and 2.
1. Write "921 with a link count of 2"
2. Write 1012
3. Write 952736
4. Write "921 with a link count of 1"
If it crashes between the new steps 1 and 2, it would cause memory leak but will not lose data; having memory leak is better than losing the whole data.

Models of Underlying Systems

Example: for BSD-style file system
Recall that a BSD-style file system has superblock, bitmap, inode table, and data blocking containing directories, indirect blocks and ordinary data.

Invariants for our system
(1) Every block in the file system is used for exactly one purpose.
(2) All referenced blocks are properly initialized with data appropriate for their type.
(3) All referenced data blocks are marked used in bitmap.
(4) All unreferenced data blocks are marked as free in bitmap.

Consequences of Violation
Violating (1), (2) or (3) would cause a disaster. Violating (4) would cause memory leak, so (4) is the only candidate to be removed.

Two ideas for Building Robust File Systems
1. Commit record
A separate record that is used to mark whether an atomic action is done or not done.

File system code for doing commits, which appears to be atomic from a user's point of view:
BEGIN
Pre-commit phase (invisible to upper level)
COMMIT or ABORT (make changes visible)
Post-commit phase (invisible to upper level)
END

2. Journaling
Append-only file system: The downside of this system is that it wastes storage, but the advantages are that there is no link count problem, it solves many inconsistency problems and it can avoid seeks if it is mostly writing.

Hybrid file system: Journal (append-only) + Cell storage

Log the change first into the journal
Put a commit record at the end so you can copy it into cell storage and mark it as DONE later
Recovery must be idempotent because recovery can crash, too!
Recovery must handle aborts, too.

Write-head log

log what you will do
commit
copy in cell data

Write-behind log

begin log old cell contents
write new cell contents
commit