CS 111 Winter 2013
File System Robustness
by Chan Young Kim and Zhongtai David Sun
Lecture by Professor Paul Eggert on Feb. 27, 2013 at UCLA
Table of Contents
- What can go wrong with file systems?
- Approaches to Building Reliability
- Lecture Focus - Loss of Power
- Lampson-Sturgis Assumptions
- Invariants For Your System
- 2 Ideas For Robust File Systems
Goals for File Systems
- Performance
- System with high throughput
- Low Latency (handling it quickly)
- Durability
- Data survives failure of underlying system
- Atomicty
- Changes are either done or not done
Back to Contents
What Can Go Wrong?
Durability
- Keyboards can flake out, etc Diagram 1
- Physically destroying the labtop cannot be considered a failure, it is not a reasonable measure of durability
- We want limited failure
- Cosmic rays flipping bits (happens around once a day)
- Battery running out
What type of failures are we worried about?
- Failures that are likely to occur frequently
- Failures that causes damages
- We want a model for failure
- This will affect our design
- We want the model to correspond with the real world
- We want the model to be simple
Back to Contents
Approaches to Building Reliability
We create a model
For now, our model is very simple
- The model contains no battery
- Only failure we have to worry about is loss of power
Example:
- Old: Disk has file F which contains XXYYZZ[unused space]
- New: F contains XXaYYZZ
- How to make change? -> Write sector
Back to Contents
Lecture Focus - Loss of Power
Suppose in the middle of the write sector, we lose power!
- If you're lucky, it might work
- Another probability (shutting down when changing the disk magnetism) is that you will have a bad sector
-> read sector will fail
To model this:
- For some devices, "?" could be any data
- We want to account for every possibility to be conservative
Back to Contents
How to Solve This Problem?
- We maintain two copies of F
F1:X
F2:?
- We do NOT overwrite our only copy
- We will write F2 first to Y, and then write F1 to Y
- First write
F1:X
F2:Y
- Second write
F1:Y
F2:Y
- Another Problem arises: How do we know which one is F1 and which is F2?
- Add a marker (1 or 2) on an extra sector to determine which file is real one
- Problem: too much overhead
- Another solution: Make tail end of sector into a timestamp recording when sector was written
- Write F2 = Y^current time
- This can be done in just one write!
- Another problem arises:? can be anything, so current time can also be anything. In addition, the clock may not match
- How do we solve all of above probelms?
- We still want a better solution: One that has less than 3 copies
- Solution: we change our model
Back to Contents
Lampson-Sturgis Assumptions
- Storage writes may fail and can corrupt write sector, or nearby sector
- Storage can decay spontaneously
- A later read can detect bad sector
- Errors are relatively rare
- Repairs can be done in time
- Example of storage write problem using these assumptions
Rename ("a", "b")
- Inode 357: 357 points to working directory (contains timestamp, owner, pointer to data block)
- 952736 directory entry in data block
- 8192 bytes
- Process
1. Read inode 357 into memory
2. Read data block 952736 into RAM
3. Change "a" to "b" in RAM
4. Write data block out to disk
5. Update inode timestamp
6. Write inode to disk
- Problem with this implementation! -> If crash happens between 4 and 5, data lost!
- Now we will make our own assumption (CS 111 Assumption)
- Single data block writes are atomic (Either all gets written successfully or none)
- Example #2 of storage write problem
Case 1: Renaming "a" to "bb"
- Problem that may arise: Block is too full for the additional character
Case 2: Renaming "d/a" to "e/a" (Transferring one block in a directory to another directory)
- Process
1. Read 357 & 362 & 952736 & 1012 into RAM (assume room in 1012 for new entry)
2. Write 1012' (This process comes first, because we don't wanna lose track of the file. Link to the file is not broken)
3. Write 952736'
4. Write 362'
5. Write 357'
- Problem with this implementation! -> If we crash between write 1012 and write 952736, one of them will lose the link to the file (link count is 1)!
- Solution -> add write "921 with a link count of 2" before the first write to 1012' and then after write 952736', write "921 with a link count of 1"
Back to Contents
Invariants For Your System
- What is an invariant?
- One aspect of modeling underlying system
- Boolean statement that will always be true
- Data structure follows invariants
- Example of invariants in system models
- BSD-Style file system: Superblock, bitmap, in ode table, data blocks (containing directories and indirect blocks and arbitrary data)
- Invariants
1. Every block in the F.S. is used for exactly one purpose (one of the lists above)
2. All reference blocks are properly initialized with data appropriate for their type
3. All referenced data blocks are marked used in bitmap
4. All unreferenced data blocks are marked free in bitmap
- Consequences of violating the above invariants
1. Bitmap and inodes will flip each other's bits -> Disaster
2. Disaster
3. Two things that own the same block -> Disaster
4. Memory leak
- The least consequence: #4
Back to Contents
2 Ideas For Robust File Systems
Journaling
- Append-only file system: only way to modify is to append
- Problem: waste storage like crazy
- Upside: Solves many inconsistency problems of data structure, and also performance advantage (no seek)
- Solution: Hybrid file system. Journal, append-only + cell storage (table)
- Log the change 1st into journal. Put a commit record at the end
- Later, copy in cell storage, mark as DONE
- When crashing, reboot procedure can check whether we're done copying it to cell storage from journal
- Problem: With a log and cell storage, system crashes and we reboot. Recovery must be idempotent. (because recovery can crash too)
- Recovery must handle aborts, too
- Write a head log: Log what you do. Commit, copy in cell
- Write a behind log: Begin, old cell contents, write new cell contents, commit
Back to Contents
This document is fully HTML 4.01 compliant: