Previously, we listed our goals for our file system:

File System Goals

This lecture, we will focus on ***DURABILITY***.

SECTION 1: File system failures

File systems: What can go wrong?

  1. Lots of devices are involved, and any of them can die.
What happens to our file system if one of these devices crashes?

What kind of failures are we worried about?


However, we are not concerned with arbitrary failures. (ex) someone taking a sledgehammer to the machine
We are only concerned with limited failures. (ex) the battery dies; cosmic rays flip a bit in RAM while the system is off

Considering only these types of failures, we can improve robustness in our file system by creating a MODEL for these failures.

A model for failures should:

To start off, let's look at a simple model...

SECTION 2: A simple model

Our simple model will have two conditions:

  1. The laptop's battery is super old, so we have to keep the laptop plugged into power.
  2. The only possible system failure is through losing power (it gets unplugged).

SCENARIO: We want to write to a sector on disk.

Write to disk using function: write_sector (***how we implement this function depends on our model)

Suppose we LOSE POWER while executing write_sector. What could possibly happen?

  • Possibility #1: It might work!
  • Possibility #2: Bad sector results...
    - Due to a partially written sector.
    - If you try to read the sector, the read will fail.
    => To determine which behavior our disk has, we can (1) ask the vendor, or (2) test it ourselves

Either could occur, so we need a model that accounts for both possibilities.

TO MODEL THIS:

time: 0 1 2
data in block: X ? Y

This models both scenarios!
We want our file system to work even in this environment, where the value of ? can be anything. How can we solve this?
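The ?-model can be sketched as a toy simulation. This is a sketch only; the `Disk` class and its `lose_power` flag are invented here for illustration, not part of any real disk API:

```python
import random

class Disk:
    """Toy one-block disk: an interrupted write leaves the old value,
    the new value, or garbage in the block (the '?' in the table)."""
    def __init__(self, value):
        self.block = value

    def write_sector(self, new_value, lose_power=False):
        if lose_power:
            # Power loss mid-write: the block's contents are unpredictable.
            self.block = random.choice([self.block, new_value, "?garbage?"])
        else:
            self.block = new_value
```

Any durability scheme we design has to return correct data no matter which of the three outcomes occurred during the interrupted write.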

SOLUTION #1: The simple solution (i.e. throw more disk space at the problem)

Golden Rule of Durability: Don't overwrite your only copy! Make backups!

With the Golden Rule of Durability in mind, let's make a copy of the file first, then modify the original file. That way we'll always have a copy of the file, just in case the system fails.

t 0 1 2
F1 X X Y
F2 ? Y Y

Steps of Solution #1:
  1. Make a copy of F1, called F2.
  2. Modify F2 to have the new value.
  3. Modify F1 to have the new value.
  4. Delete F2.

Number of writes: 4
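The four steps can be simulated with a dict standing in for the disk. A sketch under toy assumptions: `update` is an invented name, and stopping after step i models losing power at that point:

```python
def update(disk, new_value, crash_after=None):
    """Run the four steps of Solution #1, stopping early to model power loss."""
    def step1(): disk["F2"] = disk["F1"]   # 1. make a copy of F1, called F2
    def step2(): disk["F2"] = new_value    # 2. modify F2 to the new value
    def step3(): disk["F1"] = new_value    # 3. modify F1 to the new value
    def step4(): del disk["F2"]            # 4. delete F2
    for i, step in enumerate([step1, step2, step3, step4], start=1):
        step()
        if crash_after == i:
            return  # power lost here
```

No matter where the crash lands, F1 holds a complete value: the old one if we crashed at or before step 2, the new one from step 3 onward. The weakness is that after a crash, recovery can't always tell which of F1/F2 is the current copy.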
>> Can we improve this?

SOLUTION #2: Use a write marker

Add a new field, a write marker, that will keep track of the file that we just wrote to.

t -> 0 1 2
F1 X X X
F2 ? Y Y
marker 1 1 2

Steps of Solution #2:
  1. Make a copy of F1, called F2.
  2. Modify F2 to have the new value.
  3. Modify the write marker to point to F2.

Because at Step 3, the write marker already points to a file with the new value, we can stop there.

Number of writes: 3
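The three writes and the marker-directed read can be sketched as follows. Note one simplifying assumption baked into this toy: the marker write itself is taken to complete atomically:

```python
def update(disk, new_value, crash_after=None):
    """The three writes of Solution #2; this sketch assumes the
    marker write itself completes atomically."""
    writes = [
        ("F2", disk["F1"]),   # 1. make a copy of F1, called F2
        ("F2", new_value),    # 2. modify F2 to the new value
        ("marker", "F2"),     # 3. point the write marker at F2
    ]
    for i, (field, value) in enumerate(writes, start=1):
        disk[field] = value
        if crash_after == i:
            return  # power lost here

def read(disk):
    return disk[disk["marker"]]  # the marker always names a complete copy
```

If the crash lands before step 3, the marker still points at F1 (old value); after step 3, it points at F2 (new value). Reads never see a partially written copy.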
>> That's fewer writes! But it is still 3 writes for ultimately one sector's worth of new data. Can we do even better?

SOLUTION #3: Use timestamps

Append the timestamp of when the file was written to the end of each file, and on a read pick the file with the more recent timestamp. This eliminates the need for the write marker.

Include the timestamp at the end of the file's contents.

Steps of Solution #3:
  1. Write the new value concatenated with the timestamp to F2.

If the system fails, restart the system then check for the most current timestamp. Use that file.

Number of writes: only 1!
Downsides:
  • What if the system fails while writing the timestamp? Garbage bytes might be read back as a valid (possibly newer) time.
  • Won't work if there are multiple users using the file system.
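A sketch of the idea, with each file stored as a (value, timestamp) pair. All names here are illustrative, and the sketch deliberately sidesteps the first downside by assuming a crashed write leaves a timestamp that reads as old; a real garbage timestamp could just as easily read as new:

```python
def update(disk, new_value, now, crash=False):
    """Solution #3: one write, aimed at the staler of the two copies."""
    target = "F1" if disk["F1"][1] < disk["F2"][1] else "F2"
    if crash:
        disk[target] = ("?garbage?", -1)  # partial write; timestamp is junk
    else:
        disk[target] = (new_value, now)

def read(disk):
    # Pick the copy with the more recent timestamp.
    return max(disk["F1"], disk["F2"], key=lambda vt: vt[1])[0]
```

A clean write makes the new value the newest copy; a crashed write (under this sketch's assumption) leaves the old copy as the newest, so reads still succeed.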

>> Great, only one write! But we don't like those downsides... How about we take a look at the "classic" solution to this problem...

SOLUTION #4: The "classic" solution

Use three copies, and strive to keep all three copies' contents identical.

Timeline of Solution #4 (replacing A with B, one copy at a time):
t 0 1 2 3 4 5 6
F1 A ? B B B B B
F2 A A A ? B B B
F3 A A A A A ? B

To read, if F1 = F2, return F1. Else, return F3. Essentially: Majority rules.

Downsides:
  • Needs 3 copies???
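The read rule can be checked against the timeline above with a minimal sketch:

```python
def read(f1, f2, f3):
    """Majority rules: if F1 and F2 agree, trust them; otherwise fall back to F3."""
    return f1 if f1 == f2 else f3
```

At every instant in the timeline, at most one copy is mid-write ('?'), so the rule always returns either the old value A or the new value B, never garbage.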

>> We have an okay algorithm for robustness, but we can't make a great solution because our model makes too few assumptions. That means we're building too general a solution, and a solution that general won't work great for any particular system.

SOLUTION: Make more assumptions!

SECTION 3: Assumptions and invariants

To get us going, let us add some commonly used assumptions for models for failures, the Lampson-Sturgis Assumptions, to our own model:

Lampson-Sturgis Assumptions

  • Storage writes can fail, and can corrupt the written sector or a nearby sector
  • Storage can decay spontaneously (alpha particles! dust particles! cosmic rays!)
  • A later read can detect a bad sector
  • Errors are relatively rare
  • Repairs can be done in time (if a disk corrupts data over time, we can replace it)

With these new assumptions in mind, let's see how a system call, rename, will be implemented under the model:

rename("a", "b")

Steps for rename for current scenario:

  1. Read working directory's inode
  2. Read the data block of the working directory that contains file a's info
  3. Change "a" to "b" in RAM
  4. Write the data block to disk
  5. Update working directory's inode timestamp
  6. Write the inode to disk
Operations: 2 reads, 2 writes

What happens if there is a system failure after...

  • ...Step 2? Nothing; it would be as if the call to rename never occurred.
  • ...Step 4? The name would be changed on disk, but the inode wouldn't reflect that this update has happened.
  • ...Step 5? The timestamp of the inode was changed in RAM, but not written to disk yet.
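The six steps and the three failure cases can be sketched over a toy on-disk directory. The names here (`dirblock`, `inode_mtime`) are invented for illustration, not real kernel structures:

```python
def rename(disk, old, new, crash_after=None):
    mtime = disk["inode_mtime"]        # 1. read the working directory's inode
    block = dict(disk["dirblock"])     # 2. read the directory's data block
    block[new] = block.pop(old)        # 3. change the name in RAM
    if crash_after == 3: return
    disk["dirblock"] = block           # 4. write the data block to disk
    if crash_after == 4: return
    mtime += 1                         # 5. update the inode timestamp (in RAM)
    if crash_after == 5: return
    disk["inode_mtime"] = mtime        # 6. write the inode to disk
```

Crashing after step 3 leaves the disk untouched; crashing after step 4 leaves the name changed but the inode timestamp stale, exactly the in-between states listed above.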

Along with these general assumptions, we also need statements that will always be true for our system. We'll call these invariants. Let us come up with some invariants for our own model.

Invariants for what will always be true for our system:

(ex: for BSD-style system w/ superblock, bitmap, inode table, data blocks, directories, indirect blocks...)
  1. Every block in file system is used for exactly 1 purpose.
  2. All referenced blocks are properly initialized with data appropriate to their type.
  3. All referenced data blocks are marked used in the bitmap.
  4. All unreferenced data blocks are marked as free in bitmap.
Having invariants isn't enough, though: we need to make sure they are never violated. Each invariant therefore has a consequence when it is violated:

Consequences for violation of above invariants

  1. Blocks will be read as other types of blocks, which will cause undefined behavior => DISASTER!!!
  2. Blocks may have old/garbage data in them, which will be read and will cause undefined behavior => DISASTER!!!
  3. Blocks that are referenced will be stomped on by newly allocated blocks, losing the data => DISASTER!!!
  4. Blocks that have been removed will still be marked as in use => ...Just a storage leak (the least serious consequence).
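Invariants 3 and 4, and their very different severities, can be illustrated with a tiny consistency check in the spirit of fsck. A sketch; `check_bitmap` and its arguments are invented names:

```python
def check_bitmap(referenced, bitmap):
    """Compare the set of referenced block numbers against the free-space bitmap.
    Returns (disasters, leaks): a referenced block marked free can be stomped
    on by a new allocation (invariant 3 violated); a block marked used but
    unreferenced merely leaks storage (invariant 4 violated)."""
    disasters = sorted(b for b in referenced if not bitmap[b])
    leaks = sorted(b for b in range(len(bitmap)) if bitmap[b] and b not in referenced)
    return disasters, leaks
```

Real fsck-style tools make the same asymmetric call: the first kind of violation needs urgent repair, while the second can safely wait.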

SECTION 4: Two ideas for robust file systems

There are two common systems to implement in file systems to guarantee robustness:
  1. Commit record
  2. Journaling

1. Commit record

-> a separate record used to mark whether an atomic action is done or not done.

Code for doing commits:

BEGIN

pre-commit phase (if we crash here, it's as if nothing happened) [invisible to upper level]

COMMIT or ABORT

post-commit phase (clean-up stuff) [invisible to upper level]

END

Although several low-level writes are technically required, the action appears to the user as a single atomic commit: before the commit write, nothing happened; after it, the whole action happened.
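The BEGIN/COMMIT/END skeleton can be sketched like so. Field names are illustrative; the one write of the commit record is the atomic step that flips the action from "not done" to "done":

```python
def atomic_update(disk, new_value, crash_after=None):
    disk["scratch"] = new_value      # pre-commit phase: invisible to upper level
    if crash_after == "pre":
        return
    disk["committed"] = True         # COMMIT: one write marks the action as done
    if crash_after == "commit":
        return
    disk["value"] = disk["scratch"]  # post-commit phase: clean-up copy
    disk["committed"] = False        # END

def read(disk):
    # A present commit record means the scratch copy is the real value.
    return disk["scratch"] if disk["committed"] else disk["value"]
```

A crash before the commit write aborts (readers still see the old value); a crash after it means the new value is already visible, and only clean-up remains.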

2. Journaling

-> append-only file system

No file system, just a journal

+ Solves many inconsistency problems
+ If mostly writing, you avoid seeks
- Wastes storage like crazy!!

Another option is to mix elements of both ideas into one.

3. HYBRID FILE SYSTEM

-> elements of journaling w/ elements of commit record
  1. Log the change first into journal.
  2. Put a commit record at the end of the log; the change can later be copied into cell storage.
  3. Mark as DONE.
  • Recovery must be idempotent! (because recovery can crash too)
  • Recovery must handle aborts

Two types of logging:


1. Write-ahead log:
  • first log what you'll do
  • COMMIT
  • copy in cell data

2. Write-behind log:
  • log old cell contents
  • write new cell contents
  • COMMIT
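A write-ahead log plus idempotent recovery might look like the following sketch. This is a single-transaction toy; `wal_write` and `recover` are invented names:

```python
def wal_write(disk, cell, value, crash_after=None):
    disk["log"].append(("plan", cell, value))  # first log what you'll do
    if crash_after == "log":
        return
    disk["log"].append(("COMMIT",))            # COMMIT
    if crash_after == "commit":
        return
    disk["cells"][cell] = value                # copy into cell storage

def recover(disk):
    """Replay committed plans. Rerunning this is harmless (idempotent),
    and an uncommitted plan is simply aborted."""
    if ("COMMIT",) in disk["log"]:
        for entry in disk["log"]:
            if entry[0] == "plan":
                _, cell, value = entry
                disk["cells"][cell] = value
    disk["log"].clear()
```

If the crash hit before the commit record, recovery discards the plan (abort); after it, recovery finishes the copy into cell storage, and can itself crash and rerun without harm.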