CS 111 Winter 2013
        File System Robustness
        by Chan Young Kim and Zhongtai David Sun
        Lecture by Professor Paul Eggert on Feb. 27, 2013 at UCLA
        
	
	Table of Contents
	
		-  What can go wrong with file systems?
 
		-  Approaches to Building Reliability 
 
		-  Lecture Focus - Loss of Power 
 
		
		-  Lampson-Sturgis Assumptions 
 
		-  Invariants For Your System 
 
		-  2 Ideas For Robust File Systems 
 
	
	
	
	Goals for File Systems
	
		- Performance
			
				- System with high throughput
 
				- Low Latency (handling it quickly)
 
			
		 
		- Durability
			
				- Data survives failure of underlying system
 
			
		 
		- Atomicty
			
				- Changes are either done or not done
 
			
		 
	
	Back to Contents
	What Can Go Wrong?
	Durability
	
		- Keyboards can flake out, etc Diagram 1
 
		- Physically destroying the labtop cannot be considered a failure, it is not  a reasonable measure of durability
 
		- We want limited failure
			
				- Cosmic rays flipping bits (happens around once a day)
 
				- Battery running out
 
			
	 
	What type of failures are we worried about?
	
		- Failures that are likely to occur frequently
 
		- Failures that causes damages
 
		- We want a model for failure
 
			
				- This will affect our design
 
				- We want the model to correspond with the real world
 
				- We want the model to be simple
 
			
	
	Back to Contents
	Approaches to Building Reliability
	We create a model
	For now, our model is very simple
	
		- The model contains no battery
 
		- Only failure we have to worry about is loss of power
			Example:
			
				- Old: Disk has file F which contains XXYYZZ[unused space]
 
				- New: F contains XXaYYZZ
 
				- How to make change? -> Write sector
 
			
	 
	Back to Contents
	Lecture Focus - Loss of Power
	 Suppose in the middle of the write sector, we lose power!
	
		- If you're lucky, it might work
 
		- Another probability (shutting down when changing the disk magnetism) is that you will have a bad sector
			-> read sector will fail
			To model this:
			
 
		- For some devices, "?" could be any data
 
		- We want to account for every possibility to be conservative
 
	
	
	Back to Contents
	How to Solve This Problem?
	
		- We maintain two copies of F
		F1:X
		F2:?
		
 
		- We do NOT overwrite our only copy
 
		- We will write F2 first to Y, and then write F1 to Y
			
				- First write
					F1:X
					F2:Y
				
 
				- Second write
					F1:Y
					F2:Y
				
 
			
		 
		- Another Problem arises: How do we know which one is F1 and which is F2?
			
				- Add a marker (1 or 2) on an extra sector to determine which file is real one
 
				- Problem: too much overhead
 
				- Another solution: Make tail end of sector into a timestamp recording when sector was written
 
				- Write F2 = Y^current time
 
				- This can be done in just one write!
 
				- Another problem arises:? can be anything, so current time can also be anything. In addition, the clock may not match
 
			
		 
		- How do we solve all of above probelms?
 
			
		
		- We still want a better solution: One that has less than 3 copies
 
		- Solution: we change our model
 
	
	Back to Contents
	
         Lampson-Sturgis Assumptions 
		
        	-  Storage writes may fail and can corrupt write sector, or nearby sector
			
 
        	-  Storage can decay spontaneously
			
 
        	-  A later read can detect bad sector
			
 
        	-  Errors are relatively rare
 
        	-  Repairs can be done in time 
 
			-  Example of storage write problem using these assumptions 
				
				Rename ("a", "b")
				-  Inode 357: 357 points to working directory (contains timestamp, owner, pointer to data block) 
 
				-  952736 directory entry in data block
 
				-  8192 bytes
 
				- Process
 
				1. Read inode 357 into memory 
				2. Read data block 952736 into RAM
				3. Change "a" to "b" in RAM
				4. Write data block out to disk
				5. Update inode timestamp
				6. Write inode to disk
				-  Problem with this implementation! -> If crash happens between 4 and 5, data lost!
 
				
			 
			-  Now we will make our own assumption (CS 111 Assumption) 
 
				
				-  Single data block writes are atomic (Either all gets written successfully or none)
				
 
				
			
			-  Example #2 of storage write problem 
				
				Case 1: Renaming "a" to "bb"
				-  Problem that may arise: Block is too full for the additional character 
 
				Case 2: Renaming "d/a" to "e/a" (Transferring one block in a directory to another directory)
				- Process
 
				1. Read 357 & 362 & 952736 & 1012 into RAM (assume room in 1012 for new entry) 
				2. Write 1012' (This process comes first, because we don't wanna lose track of the file. Link to the file is not broken) 
				3. Write 952736' 
				4. Write 362' 
				5. Write 357' 
				-  Problem with this implementation! -> If we crash between write 1012 and write 952736, one of them will lose the link to the file (link count is 1)!
 
				-  Solution -> add write "921 with a link count of 2" before the first write to 1012' and then after write 952736', write "921 with a link count of 1"
 
				
			 	
			
Back to Contents
				
		 Invariants For Your System 
		- What is an invariant?
			
				- One aspect of modeling underlying system
 
				- Boolean statement that will always be true
 
				- Data structure follows invariants
 
			
			 
			- Example of invariants in system models
			
				- BSD-Style file system: Superblock, bitmap, in ode table, data blocks (containing directories and indirect blocks and arbitrary data)
 
				- Invariants
				1. Every block in the F.S. is used for exactly one purpose (one of the lists above)
				2. All reference blocks are properly initialized with data appropriate for their type
				3. All referenced data blocks are marked used in bitmap 
				4. All unreferenced data blocks are marked free in bitmap
 
				- Consequences of violating the above invariants
				1. Bitmap and inodes will flip each other's bits -> Disaster
				2. Disaster
				3. Two things that own the same block -> Disaster
				4. Memory leak
 
				- The least consequence: #4
 
			
			 
		
	Back to Contents
	
		2 Ideas For Robust File Systems
		
			
			Journaling
				
					- Append-only file system: only way to modify is to append
						
							- Problem: waste storage like crazy
 
							- Upside: Solves many inconsistency problems of data structure, and also performance advantage (no seek)
 
						
					 
					- Solution: Hybrid file system. Journal, append-only + cell storage (table)
						
							- Log the change 1st into journal. Put a commit record at the end
 
							- Later, copy in cell storage, mark as DONE
 
							- When crashing, reboot procedure can check whether we're done copying it to cell storage from journal
 
							- Problem: With a log and cell storage, system crashes and we reboot. Recovery must be idempotent. (because recovery can crash too)
								
									- Recovery must handle aborts, too
 
								
							 
							- Write a head log: Log what you do. Commit, copy in cell 
 
							- Write a behind log: Begin, old cell contents, write new cell contents, commit
 
						
					 
			
		
	
		
	Back to Contents
	
	
	
This document is fully HTML 4.01 compliant: 