CS 111 Winter 2013
Lecture 13: File System Robustness - February 27th 2013
by Yoshnee Raveendran & David Mao
Ideal File System Goals
- Performance
- A file system with high throughput, i.e. able to accommodate more requests per second.
- A file system with low latency, i.e. able to respond to each individual request as quickly as possible.
- Durability
- The data survives any "failure" of the underlying system. For example, when we write to a file, the data "survives", i.e. it is saved and remains intact even if the system later fails.
- Atomicity
- When a change (write/rename/delete) is performed, the change is either done entirely or not done at all. Partial changes should not occur, since they might leave the file system in an inconsistent state.
What can go wrong with a file system?
- Any part of the underlying system could just stop working suddenly.
- "Failure" here does not mean an arbitrary failure (like physically breaking the keyboard) but a more limited kind of failure:
- A very likely failure, for example, is a laptop losing power because its battery dies.
- When reasoning about failures, we need to consider both the likelihood of a failure and the damage it can do to our system. We will come up with a simple model of a failure in the next section.
A simple model - Loss of power
Situation: the laptop is so old that the battery doesn't work well. Let's assume that the only failure that can happen is that the laptop loses power suddenly. Now, say that we are writing (at the low level, write_sector tells the disk controller to replace the old sector contents with the new) to a pre-existing file F.
Suppose that the laptop loses power right in the middle of write_sector; then one of two things can occur:
- The power supply has a charged capacitor with enough reserved energy to finish the task despite losing power, so the write might complete and result in an accurate F'.
- The sector is only partially written, so F' is not accurate.
To model this:
Attempt 1:

time:  0 ---------- 1 ---------- 2 -->
data:  X (old)      ?            Y (new)

While writing, if the power loss happens at time 1, then the data on disk could be anything (?).
Attempt 2
An improvement over attempt 1: instead of keeping 1 copy of file F, we keep a backup copy, so we never overwrite our only copy. We use a marker to take note of which copy (the original or the backup) holds the current data. Let's call the original file F1 and the backup file F2.
How we will write to the files (each column shows the on-disk state after successive sector writes):

F1:      X   X   Y
F2:      ?   Y   Y
marker:  1   1   2

First we write the new value Y into the backup F2, then we flip the marker to 2, and finally we bring F1 up to date. (The '?' means the backup's initial contents are unknown.)
However, the downside to this approach is that we will require 3 writes - hence too much overhead.
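A minimal sketch of this two-copy scheme in C, assuming a one-"sector" file. Each assignment below stands in for one atomic sector write, and all names here are hypothetical, not a real disk API:

```c
static char copy1 = 'X';  /* F1: currently holds the old data X */
static char copy2 = '?';  /* F2: backup copy, contents unknown  */
static int  marker = 1;   /* which copy is current: 1 or 2      */

/* Update the file with three sector writes, ordered so that the
 * marker always points at a complete copy. */
void write_file(char newdata)
{
    if (marker == 1) {
        copy2 = newdata;  /* 1. write the non-current copy         */
        marker = 2;       /* 2. flip the marker (the atomic point) */
        copy1 = newdata;  /* 3. bring the other copy up to date    */
    } else {
        copy1 = newdata;
        marker = 1;
        copy2 = newdata;
    }
}

/* Reads follow the marker, so they see either the complete old
 * version or the complete new one, never a torn write. */
char read_file(void)
{
    return marker == 1 ? copy1 : copy2;
}
```

If power is lost before the marker flips, the marker still points at the untouched old copy; if it is lost after, it points at the fully written new copy.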
Attempt 3
Another way to successfully recover the file F is to keep three copies of the file: F1, F2, and F3. The user first makes changes to F1, then F2, and lastly F3.
F1:  X  ?  Y  Y  Y  Y  Y
F2:  X  X  X  ?  Y  Y  Y
F3:  X  X  X  X  X  ?  Y

(Each column is a point in time; '?' marks the copy currently being written.)
To recover the file i.e. read the file at any given time, we use the following procedure:
if F1 == F2
then return F1
else return F3
The above procedure guarantees that we will always either get the old
version of the file (X) or the newly-written updated version (Y), so we
will never get garbage (?).
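The recovery rule can be written as a tiny C function; checking it against every column of the table above (with '?' standing for a possibly-garbled, mid-write copy) confirms it always returns X or Y:

```c
/* Recovery rule for the three-copy scheme: f1, f2, f3 are the
 * current contents of the three copies. */
char recover(char f1, char f2, char f3)
{
    if (f1 == f2)
        return f1;  /* F1 and F2 agree, so that value is intact   */
    else
        return f3;  /* they disagree only while F1 or F2 is being *
                     * updated, and then F3 has not been touched  */
}
```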
However, just like before, there is too much overhead with this method,
since we will be keeping 3 copies of the file and writing multiple
times.
Lampson-Sturgis Assumptions
- Storage writes may fail and can corrupt written sector ( maybe even a nearby sector)
- Storage can decay spontaneously (dust particles etc)
- A later read can detect a bad sector
- Errors are relatively rare
- Repairs can be done in time
Another Atomic Write Example: Changing File Names
Suppose we have a file "a" that we want to rename to "b":
The procedure to do that is as follows:
1. Pull inode 357 into memory
2. Read data block 952736 into RAM
3. Change 'a' to 'b' in the RAM copy
4. Write the data block back to disk
5. Update the inode timestamp
6. Write the inode to disk
What can go wrong with this?
Suppose that a crash occurs just after step 4 (before step 5): the rename did in fact happen, but the timestamp was not updated accordingly. If a crash were to occur after step 2 (before step 3), there would be no loss, since the rename was never written to disk.
For the purpose of this class, we will assume that single data block writes are always atomic.
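The six steps can be sketched in C against a toy in-memory "disk". The struct and array below are hypothetical stand-ins, and the final memcpy/assignment in each pair models one atomic block write:

```c
#include <string.h>
#include <time.h>

/* Toy model of renaming "a" to "b" in place.  disk_block models
 * the directory's data block, disk_inode models inode 357. */
struct inode { long timestamp; };

char disk_block[16] = "a";
struct inode disk_inode = { 0 };

void rename_in_place(const char *newname)
{
    char ram_block[16];
    struct inode ram_inode;

    ram_inode = disk_inode;                 /* 1. pull inode into memory    */
    memcpy(ram_block, disk_block, 16);      /* 2. read data block into RAM  */
    strncpy(ram_block, newname, 15);        /* 3. change the name in RAM    */
    memcpy(disk_block, ram_block, 16);      /* 4. write data block (atomic) */
    ram_inode.timestamp = (long)time(NULL); /* 5. update inode timestamp    */
    disk_inode = ram_inode;                 /* 6. write inode to disk       */
}
```

A crash between steps 4 and 6 leaves the new name on disk with a stale timestamp, matching the analysis above.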
Now let's say we want to rename("d/a", "e/a") instead:
The procedure to do this is as follows:
1. read inode 357 into memory (d)
2. read data block 952736 into RAM
3. read inode 362 into memory (e)
4. read data block 1012 into RAM
5. update d's data in RAM
6. update e's data in RAM
7. write d's data (a in d is removed)
8. write e's data (a in e is created)
We can see a downside in this approach. For instance, if the power were to go off suddenly just after step 7 (before step 8), we would have lost file a forever. If we flip steps 7 and 8:
7. write e's data (a in e is created)
8. write d's data (a in d is removed)
Now, if we were to lose power just after step 7 (before step 8), we will have two directory entries for a after reboot. However, the link count of this file will only be 1, and it would be disastrous if the user deleted one entry and then tried to access the other later. To solve this, we should first increment the link count (so it covers both directory entries) and decrement it back only after the old entry is removed.
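A sketch of that fix, modeling each directory with a flag and each assignment as one atomic write (all names hypothetical). The on-disk link count is never less than the number of directory entries at any instant, so a crash can at worst leave the count too high (a leak a repair tool can fix), never a dangling entry:

```c
static int in_d = 1, in_e = 0;  /* does each directory list "a"? */
static int link_count = 1;      /* on-disk link count of "a"     */

void safe_rename(void)
{
    link_count = 2;  /* 1. write inode: count covers both entries */
    in_e = 1;        /* 2. write e's data: "a" created in e       */
    in_d = 0;        /* 3. write d's data: "a" removed from d     */
    link_count = 1;  /* 4. write inode: back to the true count    */
}
```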
Models of Underlying Systems
- Every block in a file system is used for exactly 1 purpose.
- All referenced blocks are properly initialized with data appropriate for their type.
- All referenced data blocks are marked as used in the bitmap.
- All unreferenced blocks are marked as free in the bitmap.
Consequences of violating the above 4 rules:
- Disaster - if we were to use a block for 2 purposes, then we may corrupt the pointers
- Disaster - An uninitialized block will not point us to the right data
- If a referenced block is marked as free in the bitmap, then multiple files may end up thinking they own the same data block
- If an unreferenced block is marked as used in the bitmap, it will never be allocated again, causing a storage leak
Two Ideas for a Robust File System
1. Commit Record
- A separate record that is used to mark if an atomic action is done or not done
- how a commit may be performed:
BEGIN
pre-commit phase (invisible to upper level)
COMMIT or ABORT
post-commit phase
END
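One way this pattern might look in C, with a single flag assignment standing in for the one atomic sector write of the commit record (all names are illustrative, and scratch is assumed to live on stable storage so recovery can see it):

```c
enum phase { ABSENT, COMMITTED };

enum phase commit_record = ABSENT;
char cell = 'X';  /* visible cell storage                      */
char scratch;     /* pre-commit work, invisible to upper level */

void atomic_update(char newdata)
{
    /* BEGIN */
    scratch = newdata;          /* pre-commit phase: prepare off to the side */
    commit_record = COMMITTED;  /* COMMIT: the one atomic write              */
    cell = scratch;             /* post-commit phase: install the new value  */
    commit_record = ABSENT;     /* END: the action is fully done             */
}

/* After a crash, redo the post-commit phase when (and only when)
 * the commit record is present; otherwise the action never happened. */
void recover_commit(void)
{
    if (commit_record == COMMITTED) {
        cell = scratch;
        commit_record = ABSENT;
    }
}
```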
2. Journaling
- With journaling, data will never be deleted. New data will be
merely appended to the existing data and a log of all operations will
be kept to ensure that we can keep track of recent changes.
- Pros and cons of this:
- - wastes storage!
- + solves many inconsistency problems
- + avoids multiple seeks if mostly writing
- We are going to have a hybrid file system (an append-only journal + cell storage)
- Write ahead log:
- Log what you will do
- Commit
- Copy data to cell storage
- Recovery Procedure for write ahead:
- Scan the journal from left to right (oldest to newest) and redo the logged writes of every committed action
- Write behind log:
- Log old cell contents
- Write new value into cell
- Commit record
- Recovery Procedure for write behind:
- Scan the journal from right to left (newest to oldest) and restore the saved old contents of every uncommitted action
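The write-ahead variant above can be sketched over a single storage cell; the structures and names are hypothetical, and each assignment stands in for one sector write:

```c
/* Append-only write-ahead journal over one cell. */
struct entry { char newval; int committed; };

static struct entry journal[8];
static int  nentries = 0;
static char cell = 'X';  /* cell storage */

void wal_write(char newval)
{
    journal[nentries].newval = newval;  /* 1. log what you will do */
    journal[nentries].committed = 1;    /* 2. commit               */
    nentries++;
    cell = newval;                      /* 3. copy to cell storage */
}

/* Recovery: scan the journal left to right (oldest to newest) and
 * replay every committed entry; an uncommitted tail entry is
 * simply ignored, as if the action never started. */
void wal_recover(void)
{
    for (int i = 0; i < nentries; i++)
        if (journal[i].committed)
            cell = journal[i].newval;
}
```

Even if the copy to cell storage (step 3) is lost in a crash, replaying the committed journal entries reproduces it.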