Winter 2016, Lecture 14

File System Robustness

Written by Aditya Kotte and Ritam Banerjee

What's the Worst That Could Happen?

Gzip very good at compressing files, but it has a bug…

ifd = open("foo",O_RDONLY);
ofd = open("foo.gz",O_WRONLY|O_CREATE..); 
write(ofd, buf, ⬭);
read(ifd, buf, bufsize);
if(close(fd)!=0) error();
if(close(ofd)!=0)error();
                        	<---crash?
if(unlink(“foo”)!=0)error();
exit(0);

System crash at that point could cause you to lose all your data. How?

We are being so careful about closing all the files.

So what’s the bug?
Let’s go on to file system layout....

BSD File System Layout

(Figure 1 below)
“Figure

Free block bitmap keeps track of where available blocks are.
This is how we grow files. Must be fast.
1 bit/block of data. That’s a 8*8192 = 65536:1 Very good compression ratio!

This is plausibly cacheable in RAM.
This is a classic "Performance" issue.

Performance Issues with File Systems

Overhead of reading from disk:
seek time-time for read head to move to sector (10ms)
+ rotational latency - physical constraint of disk drive (8ms)
+ transfer from disk + copy to RAM from cache (0.1ms)
≅ 18 ms

18 ms is a long time for a computer… enough time to run 8 million instructions!

File systems try hard to avoid using secondary storage for these reasons.

I/O Metrics

Utilization

Low utilization = not using I/0 devices well/not as much as perhaps should be
can be percentage (%) or [0,1]

Throughput

(Operations/sec)

Latency

These often offset each other. An improvement in one comes at a cost to others.
Files systems that have good latency often have poor throughput & vice versa.

How to improve File System performance

BATCHING: Combine multiple read or write requests into big read or write requests.
- coalesce small requests into big ones
- How cat works (figure 2 below)
SPECULATION: Guess future behavior of an application and do that work now.
- Works hand in hand with batching, speculates that you will want to read or write the next thousand bytes or so.
- Example when reading: PREFETCHING
- Example when reading: DALLYING
LOCALITY OF REFERENCE:
- Temporal locality: accessing the same data in a similar time frame
- Spatial locality: accessing some data means you will probably want to access nearby data soon.
Locality is what makes speculation work because of the predictability it offers. If there was no locality, speculation would not improve performance since its difficult to predict. Speculating when you can’t predict what happens is very unlikely to provide any optimization.

Block Layer of File Systems
(Figure 3 below)

Read block i/write block j maintains a cache: greatly improves performance

Lets speed this up:
Simple Model of hard disk access: Disk is represented as continuous blocks.
Avg cost of read:
Integrate to get actual delay: (equation 1 below)

To approximately maximize throughput: use SSTF (Shortest Seek Time First). Do the request closest to where the read head is currently. Suffers from the problem of starvation. Also suffered when needing to seek back and forth from the read head, not optimal since it is shorter in terms of distance to the read head, but overall distance is much greater.

SSTF + FCFS←wait time
```
		   ↙wait time 
```
random factor→fw - |i-h|
```
		          ↖head
```
Elevator Algorithm: This has problems as well, even though its a compromise. At either end, top or bottom, you will wait longer than at the middle.

Circular Elevator: After you reach one end of the disk, you always go back to the other end. This also relies on the trick that spinning all the way across the disk is faster per unit due to velocity.

rename("Foo","Fuzzy") --> Follow inode to directory entry. In directory entry, change name and name length.
We will write block 512 or whatever block it points to later.
What if directory entry is full?
We will need to make free block (1973), add new directory entry (block 4), write to directory inode with new directory entry (block 312).
Renaming now makes us write to 4 blocks, instead of 1. Whats the most secure order to write these blocks?
(Figure 4 below)

Order affects stability in case of crash.
File System Invariants
- Statements that will always be true even if you crash in the middle of an operation

Penalty for violation	Have to keep track of important parts of file system on disk
disaster	1. Every block will be used for exactly one purpose
disaster	2. All referenced blocks are appropriately initialized
disaster	3. All referenced data blocks are marked as __used__ in the bitmap
storage leak	4. All unreferenced data blocks are marked __free__ in bitmap

Gzip source code doesn't use ordered option.

ifd = open("foo",O_RDONLY);
ofd = open("foo.gz",O_WRONLY|O_CREATE..); 
write(ofd, buf, ⬭); 
				<--- if(ifd<0) error();  //check for errors
read(ifd, buf, bufsize); 
				<--- if(ofd<0) error();  //check for errors
				<--- if(fsync(ofd)!=0) error(); //fsync makes sure that the ofd is written then and there regardless of dallying or speculations
if(close(fd)!=0) error();
if(close(ofd)!=0)error();
                        	<---crash?
if(unlink(“foo”)!=0)error();  //access block u
exit(0);//↘block U (written)
           crash->write(W)  //u might happen first, and w might never happen. We have removed old file but not written compressed file. We could use a lot of data.