Previously, we listed our goals for our file system:

File System Goals

This lecture, we will focus on ***DURABILITY***.

SECTION 1: File system failures

File systems: What can go wrong?

  1. Lots of devices are involved, and any of them can die.
What happens to our file system if one of these devices crashes?

What kind of failures are we worried about?


However, we are not concerned with arbitrary failures. (ex) someone taking a sledgehammer to the machine
We are only concerned with limited failures. (ex) the battery dies; cosmic rays flip a bit in RAM while the system is off

Considering only these types of failures, we can improve robustness in our file system by creating a MODEL for these failures.

A model for failures should:

To start off, let's look at a simple model...

SECTION 2: A simple model

Our simple model will have two conditions:

  1. The laptop's battery is super old, so we have to keep the laptop plugged into power.
  2. The only possible system failure is through losing power (it gets unplugged).

SCENARIO: We want to write to a sector on disk.

Write to disk using function: write_sector (***how we implement this function depends on our model)

Suppose we LOSE POWER while executing write_sector. What could possibly happen?

  • Possibility #1: It might work!
  • Possibility #2: Bad sector results...
    - Due to a partially written sector.
    - If you try to read the sector, the read will fail.
    => To determine which behavior our disk has, we can (1) ask the vendor, or (2) test it ourselves

Either could occur, so we need a model that accounts for both possibilities.

TO MODEL THIS:

time: 0 1 2
data in block: X ? Y

This models both scenarios!
We want our file system to work even in this environment, where the value of ? can be anything. How can we solve this?
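The ?-model can be sketched as a toy simulation. This is a sketch only; the `Disk` class and its `lose_power` flag are invented here for illustration, not part of any real disk API:

```python
import random

class Disk:
    """Toy one-block disk: an interrupted write leaves the old value,
    the new value, or garbage in the block (the '?' in the table)."""
    def __init__(self, value):
        self.block = value

    def write_sector(self, new_value, lose_power=False):
        if lose_power:
            # Power loss mid-write: the block's contents are unpredictable.
            self.block = random.choice([self.block, new_value, "?garbage?"])
        else:
            self.block = new_value
```

Any durability scheme we design has to return correct data no matter which of the three outcomes occurred during the interrupted write.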

SOLUTION #1: The simple solution (i.e. throw more disk space at the problem)

Golden Rule of Durability: Don't overwrite your only copy! Make backups!

With the Golden Rule of Durability in mind, let's make a copy of the file first, then modify the original file. That way we'll always have a copy of the file, just in case the system fails.

t 0 1 2
F1 X X Y
F2 ? Y Y

Steps of Solution #1:
  1. Make a copy of F1, called F2.
  2. Modify F2 to have the new value.
  3. Modify F1 to have the new value.
  4. Delete F2.

Number of writes: 4
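The four steps can be simulated with a dict standing in for the disk. A sketch under toy assumptions: `update` is an invented name, and stopping after step i models losing power at that point:

```python
def update(disk, new_value, crash_after=None):
    """Run the four steps of Solution #1, stopping early to model power loss."""
    def step1(): disk["F2"] = disk["F1"]   # 1. make a copy of F1, called F2
    def step2(): disk["F2"] = new_value    # 2. modify F2 to the new value
    def step3(): disk["F1"] = new_value    # 3. modify F1 to the new value
    def step4(): del disk["F2"]            # 4. delete F2
    for i, step in enumerate([step1, step2, step3, step4], start=1):
        step()
        if crash_after == i:
            return  # power lost here
```

No matter where the crash lands, F1 holds a complete value: the old one if we crashed at or before step 2, the new one from step 3 onward. The weakness is that after a crash, recovery can't always tell which of F1/F2 is the current copy.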
>> Can we improve this?

SOLUTION #2: Use a write marker

Add a new field, a write marker, that will keep track of the file that we just wrote to.

t -> 0 1 2
F1 X X X
F2 ? Y Y
marker 1 1 2

Steps of Solution #2:
  1. Make a copy of F1, called F2.
  2. Modify F2 to have the new value.
  3. Modify the write marker to point to F2.

Because at Step 3, the write marker already points to a file with the new value, we can stop there.

Number of writes: 3
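The three writes and the marker-directed read can be sketched as follows. Note one simplifying assumption baked into this toy: the marker write itself is taken to complete atomically:

```python
def update(disk, new_value, crash_after=None):
    """The three writes of Solution #2; this sketch assumes the
    marker write itself completes atomically."""
    writes = [
        ("F2", disk["F1"]),   # 1. make a copy of F1, called F2
        ("F2", new_value),    # 2. modify F2 to the new value
        ("marker", "F2"),     # 3. point the write marker at F2
    ]
    for i, (field, value) in enumerate(writes, start=1):
        disk[field] = value
        if crash_after == i:
            return  # power lost here

def read(disk):
    return disk[disk["marker"]]  # the marker always names a complete copy
```

If the crash lands before step 3, the marker still points at F1 (old value); after step 3, it points at F2 (new value). Reads never see a partially written copy.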
>> That's fewer writes! But it is still 3 writes for ultimately one sector's worth of new data. Can we do even better?

SOLUTION #3: Use timestamps

Append the timestamp of when the file was written to the end of each file, and on a read pick the file with the more recent timestamp. This eliminates the need for the write marker.

Include the timestamp at the end of the file's contents.

Steps of Solution #3:
  1. Write the new value concatenated with the timestamp to F2.

If the system fails, restart the system then check for the most current timestamp. Use that file.

Number of writes: only 1!
Downsides:
  • What if the system fails while writing the timestamp? Garbage bytes might be read back as a valid (possibly newer) time.
  • Won't work if there are multiple users using the file system.
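A sketch of the idea, with each file stored as a (value, timestamp) pair. All names here are illustrative, and the sketch deliberately sidesteps the first downside by assuming a crashed write leaves a timestamp that reads as old; a real garbage timestamp could just as easily read as new:

```python
def update(disk, new_value, now, crash=False):
    """Solution #3: one write, aimed at the staler of the two copies."""
    target = "F1" if disk["F1"][1] < disk["F2"][1] else "F2"
    if crash:
        disk[target] = ("?garbage?", -1)  # partial write; timestamp is junk
    else:
        disk[target] = (new_value, now)

def read(disk):
    # Pick the copy with the more recent timestamp.
    return max(disk["F1"], disk["F2"], key=lambda vt: vt[1])[0]
```

A clean write makes the new value the newest copy; a crashed write (under this sketch's assumption) leaves the old copy as the newest, so reads still succeed.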

>> Great, only one write! But we don't like those downsides... How about we take a look at the "classic" solution to this problem...

SOLUTION #4: The "classic" solution

Use three copies, and strive to keep all three copies' contents identical.

Timeline of Solution #4 (replacing A with B, one copy at a time):
t 0 1 2 3 4 5 6
F1 A ? B B B B B
F2 A A A ? B B B
F3 A A A A A ? B

To read, if F1 = F2, return F1. Else, return F3. Essentially: Majority rules.

Downsides:
  • Needs 3 copies???
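The read rule can be checked against the timeline above with a minimal sketch:

```python
def read(f1, f2, f3):
    """Majority rules: if F1 and F2 agree, trust them; otherwise fall back to F3."""
    return f1 if f1 == f2 else f3
```

At every instant in the timeline, at most one copy is mid-write ('?'), so the rule always returns either the old value A or the new value B, never garbage.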

>> We have an okay algorithm for robustness, but we can't make a great solution because our model makes too few assumptions. That means we're building too general a solution, and a solution that general won't work great for any particular system.

SOLUTION: Make more assumptions!

SECTION 3: Assumptions and invariants

To get us going, let us add some commonly used assumptions for models for failures, the Lampson-Sturgis Assumptions, to our own model:

Lampson-Sturgis Assumptions

  • Storage writes can fail, and can corrupt the written sector or a nearby sector
  • Storage can decay spontaneously (alpha particles! dust particles! cosmic rays!)
  • A later read can detect a bad sector
  • Errors are relatively rare
  • Repairs can be done in time (if a disk corrupts data over time, we can replace it)

With these new assumptions in mind, let's see how a system call, rename, will be implemented under the model:

rename("a", "b")

Steps for rename for current scenario:

  1. Read working directory's inode
  2. Read the data block of the working directory that contains file a's info
  3. Change "a" to "b" in RAM
  4. Write the data block to disk
  5. Update working directory's inode timestamp
  6. Write the inode to disk
Operations: 2 reads, 2 writes

What happens if there is a system failure after...

  • ...Step 2? Nothing; it would be as if the call to rename never occurred.
  • ...Step 4? The name would be changed on disk, but the inode wouldn't reflect that this update has happened.
  • ...Step 5? The timestamp of the inode was changed in RAM, but not written to disk yet.
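The six steps and the three failure cases can be sketched over a toy on-disk directory. The names here (`dirblock`, `inode_mtime`) are invented for illustration, not real kernel structures:

```python
def rename(disk, old, new, crash_after=None):
    mtime = disk["inode_mtime"]        # 1. read the working directory's inode
    block = dict(disk["dirblock"])     # 2. read the directory's data block
    block[new] = block.pop(old)        # 3. change the name in RAM
    if crash_after == 3: return
    disk["dirblock"] = block           # 4. write the data block to disk
    if crash_after == 4: return
    mtime += 1                         # 5. update the inode timestamp (in RAM)
    if crash_after == 5: return
    disk["inode_mtime"] = mtime        # 6. write the inode to disk
```

Crashing after step 3 leaves the disk untouched; crashing after step 4 leaves the name changed but the inode timestamp stale, exactly the in-between states listed above.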

Along with these general assumptions, we also need statements that will always be true for our system. We'll call these invariants. Let us come up with some invariants for our own model.

Invariants for what will always be true for our system:

(ex: for BSD-style system w/ superblock, bitmap, inode table, data blocks, directories, indirect blocks...)
  1. Every block in file system is used for exactly 1 purpose.
  2. All referenced blocks are properly initialized with data appropriate to their type.
  3. All referenced data blocks are marked used in the bitmap.
  4. All unreferenced data blocks are marked as free in bitmap.
Having invariants isn't enough, though: we need to make sure they are never violated. Each invariant therefore has a consequence when it is violated:

Consequences for violation of above invariants

  1. Blocks will be read as other types of blocks, which will cause undefined behavior => DISASTER!!!
  2. Blocks may have old/garbage data in them, which will be read and will cause undefined behavior => DISASTER!!!
  3. Blocks that are referenced will be stomped on by newly allocated blocks, losing the data => DISASTER!!!
  4. Blocks that have been removed will still be marked as in use => ...Just a storage leak (the least serious consequence).
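Invariants 3 and 4, and their very different severities, can be illustrated with a tiny consistency check in the spirit of fsck. A sketch; `check_bitmap` and its arguments are invented names:

```python
def check_bitmap(referenced, bitmap):
    """Compare the set of referenced block numbers against the free-space bitmap.
    Returns (disasters, leaks): a referenced block marked free can be stomped
    on by a new allocation (invariant 3 violated); a block marked used but
    unreferenced merely leaks storage (invariant 4 violated)."""
    disasters = sorted(b for b in referenced if not bitmap[b])
    leaks = sorted(b for b in range(len(bitmap)) if bitmap[b] and b not in referenced)
    return disasters, leaks
```

Real fsck-style tools make the same asymmetric call: the first kind of violation needs urgent repair, while the second can safely wait.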

SECTION 4: Two ideas for robust file systems

There are two common systems to implement in file systems to guarantee robustness:
  1. Commit record
  2. Journaling

1. Commit record

-> a separate record used to mark whether an atomic action is done or not done.

Code for doing commits:

BEGIN

pre-commit phase (if we crash here, it's as if nothing happened) [invisible to upper level]

COMMIT or ABORT

post-commit phase (clean-up stuff) [invisible to upper level]

END

Although several low-level writes are technically required, the action appears to the user as a single atomic commit: before the commit write, nothing happened; after it, the whole action happened.
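The BEGIN/COMMIT/END skeleton can be sketched like so. Field names are illustrative; the one write of the commit record is the atomic step that flips the action from "not done" to "done":

```python
def atomic_update(disk, new_value, crash_after=None):
    disk["scratch"] = new_value      # pre-commit phase: invisible to upper level
    if crash_after == "pre":
        return
    disk["committed"] = True         # COMMIT: one write marks the action as done
    if crash_after == "commit":
        return
    disk["value"] = disk["scratch"]  # post-commit phase: clean-up copy
    disk["committed"] = False        # END

def read(disk):
    # A present commit record means the scratch copy is the real value.
    return disk["scratch"] if disk["committed"] else disk["value"]
```

A crash before the commit write aborts (readers still see the old value); a crash after it means the new value is already visible, and only clean-up remains.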

2. Journaling

-> append-only file system

No file system, just a journal

+ Solves many inconsistency problems
+ If mostly writing, you avoid seeks
- Wastes storage like crazy!!

Another option is to mix elements of both ideas into one.

3. HYBRID FILE SYSTEM

-> elements of journaling w/ elements of commit record
  1. Log the change first into journal.
  2. Put a commit record at the end of the log; the change can later be copied into cell storage.
  3. Mark as DONE.
  • Recovery must be idempotent! (because recovery can crash too)
  • Recovery must handle aborts

Two types of logging:


1. Write-ahead log:
  • first log what you'll do
  • COMMIT
  • copy in cell data

2. Write-behind log:
  • log old cell contents
  • write new cell contents
  • COMMIT
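A write-ahead log plus idempotent recovery might look like the following sketch. This is a single-transaction toy; `wal_write` and `recover` are invented names:

```python
def wal_write(disk, cell, value, crash_after=None):
    disk["log"].append(("plan", cell, value))  # first log what you'll do
    if crash_after == "log":
        return
    disk["log"].append(("COMMIT",))            # COMMIT
    if crash_after == "commit":
        return
    disk["cells"][cell] = value                # copy into cell storage

def recover(disk):
    """Replay committed plans. Rerunning this is harmless (idempotent),
    and an uncommitted plan is simply aborted."""
    if ("COMMIT",) in disk["log"]:
        for entry in disk["log"]:
            if entry[0] == "plan":
                _, cell, value = entry
                disk["cells"][cell] = value
    disk["log"].clear()
```

If the crash hit before the commit record, recovery discards the plan (abort); after it, recovery finishes the copy into cell storage, and can itself crash and rerun without harm.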