Lecture 13: File System Robustness
Notes prepared by: Bryan Chaidez and Magdaleno Nunez
File System Goals
Performance - High throughput (ability to handle lots of requests per second) and low latency
Atomicity - When you make a change to the file system, changes are either done or not done, never partially done.
Durability - Data survives failures of the underlying system (within limits; we only defend against certain failures)
What could go wrong with durability?
- Keyboard could die
- CPU failure, disk could die
- Battery dies
- RAM could unexpectedly flip a random bit
What failures are we worried about?
We want limited failure, within reason
Errors that are most likely and/or most damaging
Models of Underlying Systems
We need a model for failures
- Affects our design
- Should correspond to the real world
- Want simplicity
An example with a simple model
- No battery
- Can only lose power
Suppose during write_sector, we lose power
- If you're lucky, the power supply might have enough residual energy to complete the command
- If not, there will not be power to complete the command, resulting in a bad sector
To model this, we can assume an interrupted write leaves the sector holding the old contents, the new contents, or garbage
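One way to make that model concrete (a sketch; the function and names here are ours, not from the notes):

```python
import random

GARBAGE = object()  # sentinel standing in for a corrupted sector

def write_sector(disk, i, value, power_fails=False):
    """Model of a non-atomic sector write: if power fails mid-write,
    the sector ends up holding the old value, the new value, or garbage."""
    if power_fails:
        disk[i] = random.choice([disk[i], value, GARBAGE])
    else:
        disk[i] = value

disk = ["X"]
write_sector(disk, 0, "Y")                    # normal write succeeds
assert disk[0] == "Y"
write_sector(disk, 0, "Z", power_fails=True)  # power loss mid-write
assert disk[0] in ("Y", "Z", GARBAGE)         # any of the three outcomes
```

Everything that follows is about designing around this three-outcome model.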
Golden Rule: Never overwrite your only copy
Keep two copies of the data (F1 and F2) plus a marker saying which copy is current.
Using 4 writes to change the value to Y:
- Write F2 first (F2 = Y), then set the write marker to 2
- Write F1 next (F1 = Y), then set the write marker back to 1
Using 3 writes:
- The fourth write is unnecessary: once the marker points to 2, readers already see the new value, so after writing F1 = Y we can leave the marker alone
- However, we still need 3 writes to do 1 write
- Is there a more efficient way?
Using timestamps
- Write F2 (F2 = Y) with a timestamp appended; reads use the copy with the newest timestamp
- Now we only need 1 write, but a torn write can leave garbage that still carries a valid-looking timestamp
Classic Solution
- Strive to have all 3 copies hold the same value
- To read after crash:
    if (F1 == F2)
        return F1  // or F2
    else
        return F3
- This still needs 3 copies; we would like fewer
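A sketch of why the three-copy scheme works, assuming writes happen in the order F1, F2, F3 and a crash garbles only the sector being written (that ordering is our assumption; the notes don't spell it out):

```python
def write_value(disk, v):
    # Write the copies one at a time; at any instant at most one copy
    # is mid-write, so the other two let us recover.
    for i in range(3):
        disk[i] = v

def read_value(disk):
    F1, F2, F3 = disk
    if F1 == F2:
        return F1  # or F2
    else:
        return F3

# Simulate a crash during each of the three writes (the sector being
# written becomes garbage) and check that reads stay consistent:
for crash_at in range(3):
    disk = ["X", "X", "X"]     # old value everywhere
    for i in range(crash_at):
        disk[i] = "Y"          # writes that completed before the crash
    disk[crash_at] = "?"       # the torn write -> garbage
    assert read_value(disk) in ("X", "Y")  # never garbage
```

A crash during the F1 or F2 write yields the old value (as if the write never happened); a crash during the F3 write yields the new value. Either way the result is all-or-nothing.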
Lampson-Sturgis Assumptions
- Storage writes may fail, corrupting the written sector or a sector nearby
- Storage can decay spontaneously
- A later read can detect the bad sectors
- Errors are relatively rare
- Repairs can be done before too many errors accumulate
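Because errors are detectable and rare, these assumptions suggest a "careful read" that simply retries. A sketch (the toy disk interface is ours, invented for illustration):

```python
class DecayingDisk:
    """Toy disk where a read may fail, but failures are detectable."""
    def __init__(self, data, bad_reads=0):
        self.data = data
        self.bad_reads = bad_reads   # number of reads that will fail first

    def read(self, n):
        if self.bad_reads > 0:
            self.bad_reads -= 1
            return None, False       # detectably bad read
        return self.data[n], True

def careful_read(disk, n, tries=3):
    # Errors are rare and detectable, so a few retries almost always succeed.
    for _ in range(tries):
        data, ok = disk.read(n)
        if ok:
            return data
    raise IOError("sector %d unreadable after %d tries" % (n, tries))

d = DecayingDisk({7: "hello"}, bad_reads=2)
assert careful_read(d, 7) == "hello"   # succeeds on the third attempt
```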
e.g. rename("a","b")
1. read inode 357 into RAM
2. read data block 952736 into RAM
3. change "a" to "b" in RAM
4. write data block out to disk
5. update inode timestamp in RAM
6. write inode to disk
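The six steps, written out as code (a sketch; the Disk class is a stand-in for real sector I/O, while the inode and block numbers come from the notes):

```python
import time

class Disk:
    """Toy sector store standing in for real disk I/O."""
    def __init__(self, sectors):
        self.sectors = dict(sectors)
    def read(self, n):
        return self.sectors[n]
    def write(self, n, data):
        self.sectors[n] = data

def rename_a_to_b(disk):
    inode = dict(disk.read(357))        # 1. read inode 357 into RAM
    block = disk.read(952736)           # 2. read data block 952736 into RAM
    block = block.replace("a", "b", 1)  # 3. change "a" to "b" in RAM
    disk.write(952736, block)           # 4. write data block out to disk
    inode["mtime"] = time.time()        # 5. update inode timestamp (RAM only)
    disk.write(357, inode)              # 6. write inode to disk
    # A crash between steps 4 and 6 leaves the new name on disk
    # while the inode on disk is still stale.

disk = Disk({357: {"mtime": 0}, 952736: "a"})
rename_a_to_b(disk)
assert disk.read(952736) == "b"
```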
What can go wrong?
Crashes after steps 4 and 5 may cause problems: the data block with the new name is on disk, but the inode write (step 6) has not happened, so the on-disk metadata is stale
CS111 Assumptions
Single data block writes are atomic (all or nothing)
e.g. rename("a","bb")
The new name is longer, so it may not fit in the old directory entry; instead, copy the entry into a new, bigger entry within the data block, then invalidate the old one (only works if the block is not full)
e.g. rename("d/a","e/a")
Read inodes 357 and 362 and data blocks 952736 and 1012 into RAM, and assume there is enough room in 1012 for a new entry. Then:
1. write 1012 (add the entry for a in e)
2. write 952736 (remove the entry for a in d)
3. write 362
4. write 357
We do 1 first so we don't lose track of the file. But if we crash between 1 and 2, two directory entries point to inode 921 (a's inode) while its link count is still 1; if either entry is then removed, 921 is freed and the other entry becomes a dangling pointer.
To fix this, after step 1, set 921's link count to 2, and after step 2, change the count back to 1.
The problem here is that a crash between the directory writes and the count updates still leaves the count wrong: too high (a storage leak) if we crash before resetting it, and, worse, still 1 with two entries if we crash right after step 1, before the count is bumped.
To fix this, change the order so the link count is raised before any directory entry is written:
1. change 921's link count to 2
2. write 1012
3. write 952736
4. change 921's link count back to 1
5. write 362
6. write 357
Now the worst a crash can do (between 1 and 2, or between 3 and 4) is leave the count too high: a storage leak, never a dangling pointer.
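A quick way to convince yourself the ordering works: simulate a crash after each step and check that inode 921 is never unreachable and the link count is never too low. A sketch (names are ours; the two timestamp writes to 362 and 357 don't affect reachability, so they are omitted):

```python
def crash_state(steps_done):
    """File-system state if power fails after `steps_done` of the 4 writes."""
    state = {"links": 1, "entry_in_e": False, "entry_in_d": True}
    ops = [
        lambda s: s.update(links=2),           # 1. change 921 link count to 2
        lambda s: s.update(entry_in_e=True),   # 2. write 1012   (add e/a)
        lambda s: s.update(entry_in_d=False),  # 3. write 952736 (remove d/a)
        lambda s: s.update(links=1),           # 4. change 921 link count to 1
    ]
    for op in ops[:steps_done]:
        op(state)
    return state

for n in range(5):
    s = crash_state(n)
    entries = s["entry_in_e"] + s["entry_in_d"]
    assert entries >= 1           # 921 always reachable: never a lost file
    assert s["links"] >= entries  # count never too low: never a dangling pointer
# Crashes after step 1 or step 3 leave links == 2 with one entry:
# a storage leak, which is the mild failure we accepted.
```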
Invariants
These are conditions that should always be true for your system
e.g. for BSD-style file system
- Every block in the file system is used for exactly one purpose
- All referenced blocks are properly initialized with data appropriate for their type
- All referenced data blocks are marked as used in the bitmap
- All unreferenced data blocks are marked as free in the bitmap
Consequences of violating the corresponding invariant:
1. Disaster
2. Disaster
3. Disaster
4. Storage leak
If we had to risk violating an invariant, we should pick 4, since a storage leak is the least serious consequence: leaked blocks waste space but can be found and freed later.
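A sketch of how a checker (in the spirit of fsck) could test invariants 3 and 4 against the bitmap; the data layout here is invented for illustration:

```python
def check_bitmap(referenced, bitmap):
    """Return (leaked, disastrous) block lists.

    leaked:     marked used but referenced by nothing (violates invariant 4)
    disastrous: referenced but marked free -- the allocator could hand the
                block out again for a second purpose (violates invariant 3)
    """
    leaked = sorted(b for b, used in bitmap.items()
                    if used and b not in referenced)
    disastrous = sorted(b for b in referenced if not bitmap.get(b, False))
    return leaked, disastrous

bitmap = {1: True, 2: True, 3: False, 4: True}
referenced = {1, 2, 3}
leaked, disastrous = check_bitmap(referenced, bitmap)
assert leaked == [4]      # block 4: leak -- wasted space, fixable later
assert disastrous == [3]  # block 3: disaster -- free list may reuse it
```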
Building Robust File Systems
Commit Record
Keep a separate record used to mark whether an atomic action is done or not done
Code for doing commits:
BEGIN
    pre-commit phase (invisible to upper level)
COMMIT or ABORT
    post-commit phase (invisible to upper level)
END
From the user's point of view, the whole commit process is atomic
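A sketch of the commit-record idea, relying on the CS111 assumption that a single-sector write (here, the commit record itself) is atomic. The sector names and crash-simulation interface are ours:

```python
def atomic_update(disk, new_value, crash_after=None):
    """Update disk['cell'] atomically via a commit record.
    `crash_after` simulates losing power after that many sector writes."""
    writes = [
        ("scratch", new_value),   # pre-commit: invisible to readers
        ("commit", "COMMITTED"),  # the one atomic write that commits
        ("cell", new_value),      # post-commit: install the value
        ("commit", "DONE"),       # mark the whole action finished
    ]
    for i, (sector, value) in enumerate(writes):
        if crash_after is not None and i >= crash_after:
            return                # power lost here
        disk[sector] = value

def read_after_crash(disk):
    # If committed, the scratch copy is the truth; otherwise ignore it.
    if disk.get("commit") in ("COMMITTED", "DONE"):
        return disk["scratch"]
    return disk["cell"]

# Wherever we crash, a reader sees old or new -- never something partial:
for crash in range(5):
    disk = {"cell": "old", "commit": "EMPTY", "scratch": None}
    atomic_update(disk, "new", crash_after=crash)
    assert read_after_crash(disk) in ("old", "new")
```

The entire outcome hinges on the single commit write: before it, the action looks like it never started; after it, recovery can always finish the job.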
Journaling
- - Wastes storage: the log only grows, and will eventually run off the end of the disk
- + Solves many inconsistency problems
- + If mostly writing, avoids seeks
- log the change into the journal first
- append a commit record to the journal entry
- later, copy the change into cell storage and mark the entry as DONE
Recovery must be idempotent (because recovery can crash too!)
Recovery must handle aborts as well
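A sketch of why idempotence comes for free here: replaying a journal entry is a pure overwrite, so if recovery crashes partway through, running it again from the start changes nothing. The journal format below is invented for illustration:

```python
def recover(journal, cell):
    """Copy every committed journal entry into cell storage.
    Aborted (and uncommitted) entries are skipped."""
    for entry in journal:
        if entry["status"] == "COMMITTED":
            cell[entry["addr"]] = entry["new"]  # pure overwrite: idempotent
    return cell

journal = [
    {"addr": 5, "new": "v1", "status": "COMMITTED"},
    {"addr": 9, "new": "v2", "status": "ABORTED"},    # must not be applied
    {"addr": 5, "new": "v3", "status": "COMMITTED"},  # later entry wins
]
once = recover(journal, {5: "old", 9: "old"})
twice = recover(journal, dict(once))
assert once == {5: "v3", 9: "old"}
assert twice == once   # running recovery again changes nothing
```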
Write-ahead log:
- log what you do
- commit
- copy in cell data
Write-behind log:
- log old cell contents
- write new cell contents
- commit
Both methods work, but with different performance characteristics
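The difference also shows up in the recovery direction: a write-ahead log redoes committed actions (scan forward, install new values), while a write-behind log undoes uncommitted ones (scan backward, restore old values). A sketch under an invented record format:

```python
def recover_write_ahead(log, cell):
    # Redo: install the new values of committed actions, oldest first.
    for rec in log:
        if rec["committed"]:
            cell[rec["addr"]] = rec["new"]

def recover_write_behind(log, cell):
    # Undo: restore the old values of uncommitted actions, newest first.
    for rec in reversed(log):
        if not rec["committed"]:
            cell[rec["addr"]] = rec["old"]

log = [
    {"addr": 1, "old": "a0", "new": "a1", "committed": True},
    {"addr": 2, "old": "b0", "new": "b1", "committed": False},  # crashed mid-action
]
# Write-ahead: cell may lack the committed data; redo installs it.
cell = {1: "a0", 2: "b0"}
recover_write_ahead(log, cell)
assert cell == {1: "a1", 2: "b0"}
# Write-behind: cell may contain uncommitted data; undo rolls it back.
cell = {1: "a1", 2: "b1"}
recover_write_behind(log, cell)
assert cell == {1: "a1", 2: "b0"}
```

Either way, after recovery the cell storage reflects exactly the committed actions and none of the aborted ones.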