Scribe Notes by Joshua Adler
Today we are looking at the BSD "fast" file system.
The BSD
is comprised of:
How can we find a bug in the file system code?
USE
INVARIANTS
Invariants are boolean expressions that should always be true.
For file systems these include:
Consequences of violating each invariant above:
Say the following actions occur
If the write goes directly onto disk then the read will take a
long time
Instead the kernel will look in cache to improve
performance
BATCHING - user writes a byte at a time, leave in cache, then
writes full block at once
DALLYING - don't write data for a
little bit and leave in cache
Since RAM is volatile, if we lose power write is lost
Even
worse, a write success might have been visible to the user
Problem 1: might write out of order (due to circular elevator
algorithm etc.)
Problem 2: Side effects might not be saved due to out
of order writes
Ways to fix
Have app tell us when write hits the disk
Complicate the API (int fsync(int fd))
Returns integer for all
the cached data for the file that is now written, returns -1 on bad
disk
void sync(void)
Takes everything that is cached
anywhere and schedules a write for it
When updating data you must change block of data + block with
inode (at least inode mtime must be updated)
for this there is
another system call: int fdatasync(int) which doesn't change the inode
Another example of a problem is rename("a" , "b");
Now rename ("x/a" , "y/b") - rename from 1 directory to another
x's directory entries will have "a" linked to an inode
y's
directory entries will be empty
Steps to do the rename will occur as follows:
This doesn't preserve all of our invariants because linkcount
might be 2 instead of 1 after crash, but we only lose space
To
combat this there is a command 'fsck' which looks at low level for
incorrect link counts
This command runs on unmounted file systems
only, and is often run after reboot
Say there is a set of pending requests
read block #9312
write block #497132
There are multiple scheduling algorithms that we can use such as
FIFO or SSTF (Shortest Seek Time First)
FIFO will always work
(might be slow) but
SSTF might never reach a distant request in some cases
We want a compromise that will be fast scheduling and avert disasters
3 main goals of filesystems
How to implement atomicity on a device that isn't atomic itself
Atomic write - void awrite(char x) {write(x);}
In general we
would: lock, then write, then unlock
This doesn't work for
powerloss though
Golden Rule of Atomicity - Never overwrite last copy of data
For this example we will be renaming a file from A to B
One method is to have 3 copies of the data
If there is a powerloss choose the majority, if all different choose data
I
I | A | ? | B | B | B | B | B |
II | A | A | A | ? | B | B | B |
III | A | A | A | A | A | ? | B |
This method works during powerloss at any point but it uses 3x disk space
Idea #1 Commit Record
Have another copy of the data blocks where we write the new
version
Create a commit record that keeps track of what the
correct set of data is (0 = old block, 1 = new block)
Idea #2 Journal
Make atomic changes to the cell data
The journal will be
updated to have a log of all changes made and all planned writes