CS 111 Winter 2013
Lecture 13: File System Robustness - February 27th 2013
by Yoshnee Raveendran & David Mao
Ideal File System Goals
- Performance
- A file system with high throughput, i.e. able to accommodate more requests per second.
- A file system with low latency, i.e. able to respond to each individual request as quickly as possible.
- Durability
- The data survives any "failure" of the underlying system. For example, when we write to a file, the data "survives", i.e. it is saved and remains intact even if the system later fails.
- Atomicity
- When a change (write/rename/delete) is performed, the change is either done entirely or not done at all. Partial changes should not occur, since they might leave the file system in an inconsistent state.
What can go wrong with a file system?
- Any part of the underlying system could just stop working suddenly.
- "Failure" here does not mean an arbitrary failure (like physically breaking the keyboard) but a more limited kind of failure:
- A very likely failure, for example, is a laptop losing power because its battery dies.
- When reasoning about failures, we need to consider both the likelihood of a failure and the damage it can do to our system. We will come up with a simple model of a failure in the next section.
A simple model - Loss of power
Situation: the laptop is so old that the battery doesn't work well. Let's assume that the only failure that can happen is that the laptop loses power suddenly. Now, say that we are writing (at the low level, write_sector tells the disk controller to replace the old sector contents with the new) to a pre-existing file F.
Suppose that the laptop loses power right in the middle of write_sector; then one of two things can occur:
- The power supply has a charged capacitor with enough reserved energy to finish the task despite losing power, so the write might complete and result in an accurate F'.
- The sector is only partially written, so F' is not accurate.
To model this:
Attempt 1:

time:  0 ---------- 1 ---------- 2 -->
data:  X (old)      ?            Y (new)

While writing, if the power loss happens at time 1, then the data on disk could be anything (?).
Attempt 2
An improvement over attempt 1: instead of keeping 1 copy of file F, we keep a backup copy, so we never overwrite our only copy. We use a marker to take note of which copy (the original or the backup) holds the current data. Let's call the original file F1 and the backup file F2.
How we will write to the files (each column shows the on-disk state after successive sector writes):

F1:      X   X   Y
F2:      ?   Y   Y
marker:  1   1   2

First we write the new value Y into the backup F2, then we flip the marker to 2, and finally we bring F1 up to date. (The '?' means the backup's initial contents are unknown.)
However, the downside to this approach is that we will require 3 writes - hence too much overhead.
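A minimal sketch of this two-copy scheme in C, assuming a one-"sector" file. Each assignment below stands in for one atomic sector write, and all names here are hypothetical, not a real disk API:

```c
static char copy1 = 'X';  /* F1: currently holds the old data X */
static char copy2 = '?';  /* F2: backup copy, contents unknown  */
static int  marker = 1;   /* which copy is current: 1 or 2      */

/* Update the file with three sector writes, ordered so that the
 * marker always points at a complete copy. */
void write_file(char newdata)
{
    if (marker == 1) {
        copy2 = newdata;  /* 1. write the non-current copy         */
        marker = 2;       /* 2. flip the marker (the atomic point) */
        copy1 = newdata;  /* 3. bring the other copy up to date    */
    } else {
        copy1 = newdata;
        marker = 1;
        copy2 = newdata;
    }
}

/* Reads follow the marker, so they see either the complete old
 * version or the complete new one, never a torn write. */
char read_file(void)
{
    return marker == 1 ? copy1 : copy2;
}
```

If power is lost before the marker flips, the marker still points at the untouched old copy; if it is lost after, it points at the fully written new copy.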
Attempt 3
Another way to successfully recover the file F is to keep three copies of the file: F1, F2, and F3. The user first makes changes to F1, then F2, and lastly F3.
F1:  X  ?  Y  Y  Y  Y  Y
F2:  X  X  X  ?  Y  Y  Y
F3:  X  X  X  X  X  ?  Y

(Each column is a point in time; '?' marks the copy currently being written.)
To recover the file i.e. read the file at any given time, we use the following procedure:
if F1 == F2
then return F1
else return F3
The above procedure guarantees that we will always either get the old
version of the file (X) or the newly-written updated version (Y), so we
will never get garbage (?).
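The recovery rule can be written as a tiny C function; checking it against every column of the table above (with '?' standing for a possibly-garbled, mid-write copy) confirms it always returns X or Y:

```c
/* Recovery rule for the three-copy scheme: f1, f2, f3 are the
 * current contents of the three copies. */
char recover(char f1, char f2, char f3)
{
    if (f1 == f2)
        return f1;  /* F1 and F2 agree, so that value is intact   */
    else
        return f3;  /* they disagree only while F1 or F2 is being *
                     * updated, and then F3 has not been touched  */
}
```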
However, just like before, there is too much overhead with this method,
since we will be keeping 3 copies of the file and writing multiple
times.
Lampson-Sturgis Assumptions
- Storage writes may fail and can corrupt written sector ( maybe even a nearby sector)
- Storage can decay spontaneously (dust particles etc)
- A later read can detect a bad sector
- Errors are relatively rare
- Repairs can be done in time
Another Atomic Write Example: Changing File Names
Suppose we have a file "a" that we want to rename to "b":
The procedure to do that is as follows:
1. Pull inode 357 into memory
2. Read data block 952736 into RAM
3. Change 'a' to 'b' in the RAM copy
4. Write the data block back to disk
5. Update the inode timestamp
6. Write the inode to disk
What can go wrong with this?
Suppose that a crash occurs just after step 4 (before step 5): the rename did in fact happen, but the timestamp was not updated accordingly. If a crash were to occur after step 2 (before step 3), there would be no loss, since the rename was never written to disk.
For the purpose of this class, we will assume that single data block writes are always atomic.
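The six steps can be sketched in C against a toy in-memory "disk". The struct and array below are hypothetical stand-ins, and the final memcpy/assignment in each pair models one atomic block write:

```c
#include <string.h>
#include <time.h>

/* Toy model of renaming "a" to "b" in place.  disk_block models
 * the directory's data block, disk_inode models inode 357. */
struct inode { long timestamp; };

char disk_block[16] = "a";
struct inode disk_inode = { 0 };

void rename_in_place(const char *newname)
{
    char ram_block[16];
    struct inode ram_inode;

    ram_inode = disk_inode;                 /* 1. pull inode into memory    */
    memcpy(ram_block, disk_block, 16);      /* 2. read data block into RAM  */
    strncpy(ram_block, newname, 15);        /* 3. change the name in RAM    */
    memcpy(disk_block, ram_block, 16);      /* 4. write data block (atomic) */
    ram_inode.timestamp = (long)time(NULL); /* 5. update inode timestamp    */
    disk_inode = ram_inode;                 /* 6. write inode to disk       */
}
```

A crash between steps 4 and 6 leaves the new name on disk with a stale timestamp, matching the analysis above.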
Now let's say we want to rename("d/a", "e/a") instead:
The procedure to do this is as follows:
1. read inode 357 into memory (d)
2. read data block 952736 into RAM
3. read inode 362 into memory (e)
4. read data block 1012 into RAM
5. update d's data in RAM
6. update e's data in RAM
7. write d's data (a in d is removed)
8. write e's data (a in e is created)
We can see a downside in this approach. For instance, if the power were to go off suddenly just after step 7 (before step 8), we would have lost file a forever. If we flip steps 7 and 8:
7. write e's data (a in e is created)
8. write d's data (a in d is removed)
Now, if we were to lose power just after step 7 (before step 8), we will have two directory entries for a after reboot. However, the link count of this file will only be 1, and it would be disastrous if the user deleted one entry and then tried to access the other later. To solve this, we should first increment the link count (so it covers both directory entries) and decrement it back only after the old entry is removed.
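A sketch of that fix, modeling each directory with a flag and each assignment as one atomic write (all names hypothetical). The on-disk link count is never less than the number of directory entries at any instant, so a crash can at worst leave the count too high (a leak a repair tool can fix), never a dangling entry:

```c
static int in_d = 1, in_e = 0;  /* does each directory list "a"? */
static int link_count = 1;      /* on-disk link count of "a"     */

void safe_rename(void)
{
    link_count = 2;  /* 1. write inode: count covers both entries */
    in_e = 1;        /* 2. write e's data: "a" created in e       */
    in_d = 0;        /* 3. write d's data: "a" removed from d     */
    link_count = 1;  /* 4. write inode: back to the true count    */
}
```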
Models of Underlying Systems
- Every block in a file system is used for exactly 1 purpose.
- All referenced blocks are properly initialized with data appropriate for their type.
- All referenced data blocks are marked as used in the bitmap.
- All unreferenced blocks are marked as free in the bitmap.
Consequences of violating the above 4 rules:
- Disaster - if we were to use a block for 2 purposes, then we may corrupt the pointers
- Disaster - An uninitialized block will not point us to the right data
- If a referenced block is marked as free in the bitmap, then multiple files may end up thinking they own the same data block
- If an unreferenced block is marked as used in the bitmap, it will never be allocated again, causing a storage leak
Two Ideas for a Robust File System
1. Commit Record
- A separate record that is used to mark if an atomic action is done or not done
- how a commit may be performed:
BEGIN
pre-commit phase (invisible to upper level)
COMMIT or ABORT
post-commit phase
END
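One way this pattern might look in C, with a single flag assignment standing in for the one atomic sector write of the commit record (all names are illustrative, and scratch is assumed to live on stable storage so recovery can see it):

```c
enum phase { ABSENT, COMMITTED };

enum phase commit_record = ABSENT;
char cell = 'X';  /* visible cell storage                      */
char scratch;     /* pre-commit work, invisible to upper level */

void atomic_update(char newdata)
{
    /* BEGIN */
    scratch = newdata;          /* pre-commit phase: prepare off to the side */
    commit_record = COMMITTED;  /* COMMIT: the one atomic write              */
    cell = scratch;             /* post-commit phase: install the new value  */
    commit_record = ABSENT;     /* END: the action is fully done             */
}

/* After a crash, redo the post-commit phase when (and only when)
 * the commit record is present; otherwise the action never happened. */
void recover_commit(void)
{
    if (commit_record == COMMITTED) {
        cell = scratch;
        commit_record = ABSENT;
    }
}
```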
2. Journaling
- With journaling, data will never be deleted. New data will be
merely appended to the existing data and a log of all operations will
be kept to ensure that we can keep track of recent changes.
- Pros and cons of this:
- - wastes storage!
- + solves many inconsistency problems
- + avoids multiple seeks if mostly writing
- We are going to have a hybrid file system (an append-only journal + cell storage)
- Write ahead log:
- Log what you will do
- Commit
- Copy data to cell storage
- Recovery Procedure for write ahead:
- Scan the journal from left to right (oldest to newest) and redo the logged writes of every committed action
- Write behind log:
- Log old cell contents
- Write new value into cell
- Commit record
- Recovery Procedure for write behind:
- Scan the journal from right to left (newest to oldest) and restore the saved old contents of every uncommitted action
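The write-ahead variant above can be sketched over a single storage cell; the structures and names are hypothetical, and each assignment stands in for one sector write:

```c
/* Append-only write-ahead journal over one cell. */
struct entry { char newval; int committed; };

static struct entry journal[8];
static int  nentries = 0;
static char cell = 'X';  /* cell storage */

void wal_write(char newval)
{
    journal[nentries].newval = newval;  /* 1. log what you will do */
    journal[nentries].committed = 1;    /* 2. commit               */
    nentries++;
    cell = newval;                      /* 3. copy to cell storage */
}

/* Recovery: scan the journal left to right (oldest to newest) and
 * replay every committed entry; an uncommitted tail entry is
 * simply ignored, as if the action never started. */
void wal_recover(void)
{
    for (int i = 0; i < nentries; i++)
        if (journal[i].committed)
            cell = journal[i].newval;
}
```

Even if the copy to cell storage (step 3) is lost in a crash, replaying the committed journal entries reproduces it.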