CS 111

Scribe Notes for 5/21/13

by Kevin Takata and Ashish Lal

File System Robustness (continued)

Unaddressed Issues with File System Robustness

Large Writes

write(fd, buf, bufsize), where bufsize = 10000000 (very large)

For large writes, the kernel has to write block by block. A large write can therefore have undefined behavior if a crash and reboot occurs partway through it. In the Berkeley file system, a file's blocks can be scattered across the disk, so for efficiency the kernel may write them out of order to exploit locality of reference for the disk arm.

EFGH ABCD IJKL

In the case above, if the disk arm is positioned at the block where EFGH resides, the kernel writes EFGH to disk first, followed by ABCD, followed by IJKL. A crash and reboot in the middle of this sequence leaves the file in an undefined state, where some parts of the write took effect and other parts did not.
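
To make this failure mode concrete, here is a minimal sketch in C that simulates the three-block example above. It is only an illustration: the in-memory struct disk, the function write_with_crash, and the crash_after parameter are hypothetical names invented for these notes, and a real kernel's block layer is far more involved.

```c
#include <assert.h>
#include <string.h>

#define NBLOCKS 3
#define BLOCK_SIZE 4

/* Hypothetical in-memory "disk": NBLOCKS blocks, indexed by the
 * file's logical block number, each holding a short string. */
struct disk {
    char data[NBLOCKS][BLOCK_SIZE + 1];
};

/*
 * Copy new_data onto the disk one block at a time, visiting logical
 * blocks in the sequence given by `order` (assumed to be the sequence
 * in which the disk arm reaches them). The simulated crash happens
 * once `crash_after` blocks have been written.
 * Returns the number of blocks that actually reached the disk.
 */
int write_with_crash(struct disk *d,
                     const char new_data[NBLOCKS][BLOCK_SIZE + 1],
                     const int order[NBLOCKS], int crash_after)
{
    assert(crash_after >= 0 && crash_after <= NBLOCKS);
    int written = 0;
    for (int i = 0; i < NBLOCKS && written < crash_after; i++) {
        int b = order[i];
        memcpy(d->data[b], new_data[b], BLOCK_SIZE + 1);
        written++;
    }
    return written;
}
```

With order = {1, 0, 2} (EFGH first, then ABCD, then IJKL) and crash_after = 1, only logical block 1 holds new data after the "reboot"; blocks 0 and 2 still hold the old contents, so the file is a mix of old and new.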

Directories

Similarly, a crash in the middle of a directory rename (here, renaming "a" to "b") could leave the directory in one of two states, depending on the order of the writes.

Before:
"a" 27 "b" 65

After (I):
0 "b" 27
α β
If a crash occurs between writing α and writing β, you are left with the old copy of "b" and you have lost the old "a", resulting in:
0 "b" 65
This is equivalent to unlink("a").

In the other potential case, After (II):
"b" 27 0
If a crash occurs between the two writes, you are left with the new copy of "b", you have lost the old "b", and "a" is still present, resulting in:
"a" 27 "b" 27
Of the two cases above, the first is disastrous because the file is lost. The second leaks storage, but that is not disastrous: it can be cleaned up by running fsck after the system has rebooted.
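
The two orderings can be sketched with a small simulation in C. Everything here is a hypothetical model made up for these notes (struct dirent_sim, rename_with_crash, links_to, and the two-slot directory); it assumes the rename consists of exactly two writes, one that clears the "a" slot and one that points "b" at inode 27.

```c
#include <assert.h>

/* Hypothetical directory slot; ino == 0 means the slot is free. */
struct dirent_sim {
    char name[8];
    int ino;
};

enum order { CLEAR_OLD_FIRST, POINT_NEW_FIRST };

/*
 * Simulate renaming dir[0] onto dir[1] as two separate writes and
 * stop (crash) after `steps_done` of them have reached the disk.
 */
void rename_with_crash(struct dirent_sim dir[2], enum order ord,
                       int steps_done)
{
    assert(steps_done >= 0 && steps_done <= 2);
    int src_ino = dir[0].ino;   /* held in memory before any write */
    for (int step = 0; step < steps_done; step++) {
        int clear = (ord == CLEAR_OLD_FIRST) ? (step == 0) : (step == 1);
        if (clear) {
            dir[0].ino = 0;         /* free the old "a" slot */
            dir[0].name[0] = '\0';
        } else {
            dir[1].ino = src_ino;   /* "b" now names the renamed file */
        }
    }
}

/* Count how many directory entries refer to inode `ino`. */
int links_to(const struct dirent_sim dir[2], int ino)
{
    int n = 0;
    for (int i = 0; i < 2; i++)
        if (dir[i].ino == ino)
            n++;
    return n;
}
```

Starting from {"a", 27}, {"b", 65} and crashing after one step, CLEAR_OLD_FIRST leaves no entry for inode 27 (the file is lost), while POINT_NEW_FIRST leaves two entries for inode 27 (nothing is lost, but a later cleanup pass is needed).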

Proposed POSIX Model

Types of files      Units
Regular files       Blocks of a given size (the size that statvfs outputs)
File attributes     Individual files
Directories         Directory entries
System Calls: When you write using the write(...) function, there is no guarantee that the data will be written to disk; it is written to the cache first. To guarantee that the data is written to disk, you must call fsync(fd) or fdatasync(fd). However, fsync is
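
As a rough sketch of the write-then-sync pattern described above, the following C function writes a buffer and then calls fsync before reporting success. The function name write_file_durably and its error-handling policy are assumptions made for illustration; only open, write, fsync, and close are actual POSIX calls. For brevity it does not retry on EINTR.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Write n bytes from buf to the file at path, then ask the kernel to
 * push the data to stable storage with fsync.
 * Returns 0 on success and -1 on any error.
 */
int write_file_durably(const char *path, const char *buf, size_t n)
{
    assert(path != NULL && buf != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    size_t done = 0;
    while (done < n) {
        /* write may accept fewer bytes than requested, so loop. */
        ssize_t r = write(fd, buf + done, n - done);
        if (r <= 0) {           /* treat an error or no progress as failure */
            close(fd);
            return -1;
        }
        done += (size_t)r;
    }

    /* Without this call the data may still sit only in the cache. */
    int rc = fsync(fd);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Swapping fsync(fd) for fdatasync(fd) keeps the same structure.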

Solutions to Make Atomic System Calls

Idea #1: Commit Record

Idea #2: Journal

Radical Proposal: (Log Structured File System)

Example Research File System:

Virtual Memory

Problem: Unreliable processes that have bad memory references
Example: a[i] = 27; //but i is out of range

Solutions:

Base and Bounds

[Figure: a region of memory delimited by base and bounds]
The hardware checks that every memory access falls between the base and the bounds. If the check fails, the hardware traps. The kernel then takes control of the unruly process and deals with it. Problems:
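
The base-and-bounds check described above is done in hardware, but its logic can be sketched in C. This is a software model only: struct region and access_ok are hypothetical names, and the sketch assumes that bounds is the first address past the end of the region (so valid addresses satisfy base <= addr < bounds), which the lecture did not specify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the two values: the region is [base, bounds). */
struct region {
    uintptr_t base;
    uintptr_t bounds;
};

/*
 * Return true if an access of len bytes starting at addr lies entirely
 * inside the region. Where this returns false, real hardware would
 * trap to the kernel.
 */
bool access_ok(const struct region *r, uintptr_t addr, size_t len)
{
    assert(r != NULL && r->base <= r->bounds);
    if (addr < r->base || addr > r->bounds)
        return false;
    /* Compare against the remaining space to avoid overflow in addr + len. */
    return len <= r->bounds - addr;
}
```

For the a[i] = 27 example, a store whose computed address falls outside the region would fail this check instead of silently corrupting another process's memory.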

Segments

Pages