CS 111 Spring 2010 
      
      Scribe Notes for 5/13/10, Lecture 13 
      
      by Joseph Silva, Kevin Deggelman, and Khai Nguyen 
       
        
      File System Robustness 
       
      Useful Links 
       
      
      Terminology 
      
      
        - error - a mistake made by the designer (in the designer's
head)
 
        - fault - a bug in the code which has yet to be triggered
 
        - failure - a bug which has been triggered and noticed by
users
 
         
       
      File System Goals
      
        - durability - ability to survive limited failures in
hardware (according to your failure model)
          
            - For example: a power failure while writing a block
 
           
         
        - atomicity - changes are either fully realized or not
realized at all
 
        - performance - Throughput & latency
          
            - example: larger batch sizes increase throughput at a
cost to latency
 
             
           
         
       
      example system call
          rename("a","b")
 
a) find data block 
b) read data into RAM 
    b-1) check if "b" exists, if so mark it as unused 
           if we
update inode info here link count may be 1 less than it should
(dangerous) 
c) change "a" to "b" 
d) write block to disk 
    if we update inode info here link count may be 1
more than it should (preferable) 
       
      Note: Assume sector writes are atomic but block writes are not
       
      Why not have rename("a","b")
fail if b exists? 
      
      
        - Professor Eggert needed to update small applications
remotely
 
        - He wanted to enforce atomicity in entire /usr/local
directory
 
       
        Solution: 
      
      
        - create /usr/local2/ (not atomic)
 
        - rename ("usr/local2", "usr/local")
          
            - fails because they are directories
 
            - instead use a symbolic link and change that
 
           
         
       
               
$
cd
usr 
               
$
ln
-s local2 tmp 
               
$
mv
tmp
local               
//rename("tmp","local") 
       
      Suppose we want to
rename("d/f","e/g") 
      
      
        - We will have to change
          
            - data block in d
 
            - data blocks in e
 
            - inode for old e/g
 
            - bitmaps for old e/g
 
            - bitmaps for e
 
           
         
       
       
      File System Correctness Invariants 
      
        - every block in file system is used for exactly 1 purpose
          
            - boot block
 
            - superblock
 
            - bitmap
 
            - inode
 
            - data
 
             
           
         
        - All reachable blocks are properly initialized
 
        - All reachable data blocks are marked as used in the bitmap
 
        - All unreachable data blocks are marked as unused in the
bitmap
 
        - Every inode's link count is no less than the number of
directory entries
 
        - Every inode's link count is no greater than the number of
directory entries
 
       
      Numbers 4 and 6 will cause
memory leaks, all the others lead to disaster, obviously we prefer a
memory leak to a disaster.  Memory leaks can be recovered from
using fsck. 
       
       
fsck 
      
        - Checks file system for leaks, fixing bitmap accordingly
 
        - filesystem should be unmounted
 
        - to fix root file system must boot into single-user mode,
mount root filesystem as read-only
 
        - can also fix potential disasters
 
        - orphaned files (positive link count with no directory
entries) are put into lost+found
 
        - a file with a 0 link count but directory entries has its
link counted increased to its number of directory entries
 
       
      A common place a malicious
program may hide files is in free space, which is only accessible from
the raw device 
       
      Idea: 
Think of write as a slow phase between scene a and scene b. The slow
phase is undefined and contains garbage data. 
       
       
How to implement atomic write atop hardware that doesn't write
atomically 
X = original file 
Y = new file 
? = middle of a write (unknown state) 
       
      Write 3 copies: 
      
        
          
            Copy/Case 
             | 
            1 
             | 
            2 
             | 
            3 
             | 
            4 
             | 
            5 
             | 
            6 
             | 
            7 
             | 
           
          
            A 
             | 
            X 
             | 
            ? 
             | 
            Y 
             | 
            Y 
             | 
            Y 
             | 
            Y 
             | 
            Y 
             | 
           
          
            B 
             | 
            X 
             | 
            X 
             | 
            X 
             | 
            ? 
             | 
            Y 
             | 
            Y 
             | 
            Y 
             | 
           
          
            C 
             | 
            X 
             | 
            X 
             | 
            X 
             | 
            X 
             | 
            X 
             | 
            ? 
             | 
            Y 
             | 
           
        
       
When reading: 
      if (A==B) return A 
      else return C 
       
      Write 2 copies: 
      
        
          
            | Copy/Case | 
            1 
             | 
            2 
             | 
            3 
             | 
            4 
             | 
            5 
             | 
           
          
            A 
             | 
            X 
             | 
            ? 
             | 
            Y 
             | 
            Y 
             | 
            Y 
             | 
           
          
            B 
             | 
            X 
             | 
            X 
             | 
            X 
             | 
            ? 
             | 
            Y 
             | 
           
        
       
When reading: 
      if (A==B) return A  
      else return B 
       
      Lampson-Sturgis
assumption 
      
        - Writes may fail or corrupt other pieces of storage (a later
read will catch the bad write)
 
        - Storage may spontaneously decay (error correction will
catch it)
 
        - Errors are rare -> you can assume that they can be
corrected as they happen without worrying that they might cascade
 
       
       
      Performance as the Enemy of
Robustness 
      
        - Suppose you file system needs to do 3 dozen writes
 
       
      
        - Sort by disk address
 
        - Write in sorted order
 
       
      
        - This results in a big performance increase, however it may
ruin decisions made at the higher level to prevent disaster
 
        - Solution: Pass dependencies to lower level and sort without
violating them
 
        - Result: Not as great of a performance increase but still
much better than not sorting
 
       
       
      Suppose we are saving a file in
Emacs ^X^S 
      write() 
      //CRASH HERE! 
      write() 
      write() 
       
This violates the GOLDEN RULE OF ATOMICITY: 
       Never write
over your last copy of info. 
       
       
      Try again: 
       
      IDEA #1  
      
        - Write to new file system first, then copy back to actual
          
            - Again we have the problem of not knowing which copy is
correct following a crash
 
           
         
        - Keep an extra bit of info to tell us which copy to use
called the "commit record"
 
       
      Write to new file system 
      Set bit 
      Copy to actual file system 
      Clear bit 
      
        - Alternatively we could alternate between the two copies and
use the bit to keep track of which one is current, saving us an extra
copy each time
 
       
      High level Atomic Operations 
      BEGIN 
      write write write 
      COMMIT 
      post commit phase //if needed for performance, fsck does this after crash                          
      END 
       
      IDEA #2 : Journaling
      
        - Partition disk into two regions
          
            - cell data - normal filesystem content
 
            
            - 
              
            
 
            - journal - long array of commit records
 
            
            - 
              
              
                - writes to the log are sequential
 
                - could be put on a different physical drive for a
performance increase
 
               
             
           
         
        - After a crash fsck looks for everything thats been
committed but not finished and fixes it
 
       
       
      Logging Protocol 
       
      A: Write Ahead Logging 
      Begin 
      log update into journal (e.g. "block #9637 should contain "xyzw----") 
      commit 
      write new data into cells 
      End 
      
      B: Write Behind Logging 
      Begin 
      log all old cell values 
      install new values into cell data 
      commit 
      End 
      
Advantages of A 
      
        - writes to the log are sequential
 
        - need not read old values
 
         
       
Advantages of B 
      
        - better for buggy systems
 
        - fsck does less work -> faster reboot recovery
 
         
       
       
       |