CS 111
Lecture 16 (5/29/12)
Abimael Arevalo
Media Faults
Disk failures during writes: solvable via logging.

If an unchanged sector fails, its data cannot be reconstructed from the journal; that requires redundancy (RAID, below).
 
RAID (Redundant Array of Independent Disks)
	- Simulate one large drive with many small drives.
	- Save money and gain reliability.
RAID levels a la Berkeley
	- RAID 0: no redundancy, just a simulated big disk.
		- Concatenation: performance of the virtual disk is roughly that of a single physical disk.
		- Striping: divide the data (A, B, C, D) into pieces and spread the pieces across the drives. The drives can work in parallel to retrieve the data, so with four drives the virtual disk is roughly four times faster than a single physical disk.
		- Growing the array later is easier with concatenation than with striping.
	- RAID 1: mirroring.
		- Writes go to both drives.
		- Reads can come from either drive (pick the one whose disk head is closer).
		  ASSUMPTION: reads can detect faults.
	- RAID 2, 3, 4, 5, 6, 7, ...
	- RAID 4: data drives A, B, C, D plus a parity drive E = A^B^C^D ('^' = XOR). (A reconstruction sketch follows this list.)
		- Reads behave like RAID 0 concatenation, so read performance is worse than RAID 0 striping.
		- Writes are like RAID 1 in cost: drive E must be read and updated on every write.
		- If C fails, it can be rebuilt: C = A^B^D^E.
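A minimal sketch of RAID 4 parity and reconstruction. The block size, buffer contents, and function names here are made up for illustration; real arrays work on large stripes, not 16-byte buffers.

	#include <stdio.h>
	#include <string.h>

	#define BLOCK_SIZE 16   /* assumed block size, just for the demo */

	/* Compute the parity block E = A ^ B ^ C ^ D, byte by byte. */
	static void compute_parity(const unsigned char *a, const unsigned char *b,
	                           const unsigned char *c, const unsigned char *d,
	                           unsigned char *e)
	{
	    for (size_t i = 0; i < BLOCK_SIZE; i++)
	        e[i] = a[i] ^ b[i] ^ c[i] ^ d[i];
	}

	/* Rebuild the lost block C from the survivors: C = A ^ B ^ D ^ E. */
	static void rebuild_c(const unsigned char *a, const unsigned char *b,
	                      const unsigned char *d, const unsigned char *e,
	                      unsigned char *c)
	{
	    for (size_t i = 0; i < BLOCK_SIZE; i++)
	        c[i] = a[i] ^ b[i] ^ d[i] ^ e[i];
	}

	int main(void)
	{
	    unsigned char a[BLOCK_SIZE] = "drive A data", b[BLOCK_SIZE] = "drive B data";
	    unsigned char c[BLOCK_SIZE] = "drive C data", d[BLOCK_SIZE] = "drive D data";
	    unsigned char e[BLOCK_SIZE], rebuilt[BLOCK_SIZE];

	    compute_parity(a, b, c, d, e);    /* parity that would live on drive E */
	    rebuild_c(a, b, d, e, rebuilt);   /* pretend drive C just died */

	    printf("rebuild %s\n",
	           memcmp(c, rebuilt, BLOCK_SIZE) == 0 ? "succeeded" : "failed");
	    return 0;
	}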
			
 
		 
	 
 
Disk Drive Reliability
	- Mean time to failure (MTTF) is typically quoted as about 300,000 hours (roughly 34 years), but in practice drives get replaced every 5 years.
	- Probability distribution function for a single disk failure.
	- Probability distribution function for RAID 4, assuming failed drives are never replaced.
	- Probability distribution function for RAID 4, assuming failed disks are replaced (a back-of-the-envelope estimate follows this list):
		- A disk fails.
		- (~60 minutes later) an operator replaces it.
		- (~8 hours later) the rebuilding phase completes; its length depends on drive size.
	- RAID schemes can be nested.
	- Q: Does RAID make backups obsolete?
	  A: No, we still need backups to recover from user errors.
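A rough sketch of why prompt replacement helps so much, using the numbers above. It assumes a 5-drive RAID 4 (4 data + 1 parity) with independent failures and uses the standard back-of-the-envelope estimate MTTDL ~ MTTF^2 / (G*(G-1)*MTTR); this formula is not derived in these notes, and it ignores correlated failures, so treat the result as an upper bound on optimism.

	#include <stdio.h>

	int main(void)
	{
	    /* Assumed numbers, taken from the notes above. */
	    double mttf_hours = 300000.0;  /* quoted MTTF of one drive              */
	    double mttr_hours = 9.0;       /* ~1 hour to replace + ~8 hours rebuild */
	    int    drives     = 5;         /* RAID 4: 4 data drives + 1 parity      */

	    /*
	     * Data is lost only if a second drive dies while the first is being
	     * replaced and rebuilt.  Roughly: some drive fails about every MTTF/G
	     * hours, and during the MTTR window each of the other G-1 drives fails
	     * with probability about MTTR/MTTF.  So mean time to data loss is
	     * about MTTF^2 / (G * (G-1) * MTTR).
	     */
	    double mttdl = (mttf_hours * mttf_hours) /
	                   (drives * (drives - 1) * mttr_hours);

	    printf("single drive MTTF : %.0f hours (%.1f years)\n",
	           mttf_hours, mttf_hours / 8760.0);
	    printf("RAID 4 MTTDL      : %.2e hours (%.0f years)\n",
	           mttdl, mttdl / 8760.0);
	    return 0;
	}

With these numbers the estimate comes out around 5e8 hours, tens of thousands of years: the short repair window is what makes the scheme work.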
	 
 
Distributed Systems
RPC (Remote Procedure Calls) vs. System Calls and Function Calls
	- Caller sees:
		x = fft(buf, n);
	  Implementation (the client stub):
		send(buf, n);    // marshal the arguments and ship them to the server
		receive(&x);     // wait for the response
		return x;
	- Caller and callee do not share an address space, so there is no call by reference (at least, not efficiently).
	- Caller and callee may be different architectures (ARM vs. SPARC, little- vs. big-endian), which requires converting data representations on the wire (see the stub sketch after this list).
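A minimal sketch of what such a client stub might look like, assuming a TCP connection already open on fd and a made-up wire format (a 32-bit count in network byte order followed by the samples, with a single 32-bit result coming back). fft_rpc, the wire format, and the error handling are illustrative, not a real RPC library.

	#include <stdint.h>
	#include <unistd.h>
	#include <arpa/inet.h>   /* htonl/ntohl: convert to/from network byte order */

	/* Hypothetical client stub for x = fft(buf, n).  The caller just calls
	 * fft_rpc(); all the network traffic is hidden inside the stub. */
	int fft_rpc(int fd, const int32_t *buf, uint32_t n, int32_t *result)
	{
	    /* Marshal: send the element count, then the samples, each converted
	     * to big-endian so an ARM client and a SPARC server agree. */
	    uint32_t n_net = htonl(n);
	    if (write(fd, &n_net, sizeof n_net) != sizeof n_net)
	        return -1;
	    for (uint32_t i = 0; i < n; i++) {
	        uint32_t v = htonl((uint32_t)buf[i]);
	        if (write(fd, &v, sizeof v) != sizeof v)
	            return -1;
	    }

	    /* Wait for the response and unmarshal it. */
	    uint32_t r_net;
	    if (read(fd, &r_net, sizeof r_net) != sizeof r_net)
	        return -1;
	    *result = (int32_t)ntohl(r_net);
	    return 0;
	}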
  
	 
 
RPC has different failure modes
	- PRO: The callee cannot trash the caller's memory, and vice versa (hard modularity).
	- CON: Messages can get lost.
	- CON: Messages can get corrupted.
	- CON: Messages can get duplicated.
	- CON: The network can go down or be slow.
	- CON: The server can go down or be slow.
 
 
What should a stub/wrapper do:
	- If the message was corrupted: resend.
	- If there is no response, the possibilities are (a retry sketch follows this list):
		- Keep trying: "at least once" RPC (suitable for idempotent operations).
		- Give up and return an error: "at most once" RPC (suitable for transactional operations).
		- "Exactly once" RPC: the Holy Grail of RPC.
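A minimal sketch of at-least-once behavior for an idempotent request, assuming a datagram socket fd already connected to the server. The one-second timeout, five-try limit, and message format are arbitrary choices for illustration.

	#include <sys/types.h>
	#include <sys/socket.h>
	#include <sys/time.h>

	/* Send an idempotent request and wait for a reply, resending on timeout.
	 * "At least once": the server may see the request more than once, which
	 * is acceptable only because the operation is idempotent (e.g., a read). */
	ssize_t rpc_at_least_once(int fd, const void *req, size_t reqlen,
	                          void *reply, size_t replylen)
	{
	    struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };   /* 1 s per try */
	    setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);

	    for (int attempt = 0; attempt < 5; attempt++) {
	        if (send(fd, req, reqlen, 0) < 0)
	            continue;                   /* transient send error: retry */
	        ssize_t got = recv(fd, reply, replylen, 0);
	        if (got >= 0)
	            return got;                 /* success */
	        /* recv failed or timed out: fall through and resend */
	    }
	    return -1;   /* "at most once" behavior would give up here and report an error */
	}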
			
 
		 
	 
 
RPC examples:
	- HTTP: the client sends "GET /foo/bar.html HTTP/1.0\r\n"; the server responds "HTTP/1.1 200 OK\r\n" followed by headers and the page. (A minimal client sketch follows this list.)
	- SOAP (Simple Object Access Protocol).
	- X Window System: remote screen display.
		- Works even if the client and server are on the same machine.
		- Uses higher-level primitives (e.g., fillRectangle) instead of shipping raw pixels, which cuts down the amount of data sent.
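Since HTTP is just request/response over a TCP connection, a minimal client fits in a few lines. This sketch assumes a server named www.example.com on port 80 and skips most error handling; it illustrates the RPC pattern (send the request, block, read the reply), nothing more.

	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>
	#include <netdb.h>
	#include <sys/socket.h>

	int main(void)
	{
	    /* Look up the server and open a TCP connection (the "call"). */
	    struct addrinfo hints = { .ai_socktype = SOCK_STREAM }, *ai;
	    if (getaddrinfo("www.example.com", "80", &hints, &ai) != 0)
	        return 1;
	    int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
	    if (fd < 0 || connect(fd, ai->ai_addr, ai->ai_addrlen) < 0)
	        return 1;

	    /* Send the request... */
	    const char *req = "GET /foo/bar.html HTTP/1.0\r\n"
	                      "Host: www.example.com\r\n\r\n";
	    write(fd, req, strlen(req));

	    /* ...then block until the response arrives (the "return value"). */
	    char buf[4096];
	    ssize_t n;
	    while ((n = read(fd, buf, sizeof buf)) > 0)
	        fwrite(buf, 1, n, stdout);

	    close(fd);
	    freeaddrinfo(ai);
	    return 0;
	}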
			
 
		 
	 
 
Performance Issues with RPC
Every RPC pays a network round trip, so it is far slower than a local function call.

Solutions:
	- Use higher-level primitives (fewer, larger requests).
	- Asynchronous RPC: better performance, but it can complicate the caller (a sketch follows this list).
	- Cache in the caller (for simple stuff).
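A sketch of the asynchronous idea, assuming hypothetical send_request()/recv_response() helpers that tag each message with an ID (not a real API). The complication it shows: the caller must keep a table of outstanding requests and match responses to them, since replies can arrive in any order.

	#include <stdbool.h>
	#include <string.h>

	/* Hypothetical message-layer helpers (assumed, not a real library):
	 *   send_request()  ships a request tagged with id and returns immediately;
	 *   recv_response() blocks until some response arrives and reports its id. */
	int send_request(int fd, int id, const char *op);
	int recv_response(int fd, int *id, char *result, int resultlen);

	#define NREQ 16

	void async_rpc_example(int fd)
	{
	    bool pending[NREQ];
	    char results[NREQ][128];

	    /* Fire off all the requests back to back instead of paying
	     * NREQ sequential round trips. */
	    for (int id = 0; id < NREQ; id++) {
	        send_request(fd, id, "READ");
	        pending[id] = true;
	    }

	    /* Responses arrive in any order, so match them to pending requests. */
	    for (int done = 0; done < NREQ; ) {
	        int id;
	        char reply[128];
	        if (recv_response(fd, &id, reply, sizeof reply) < 0)
	            break;                      /* lost reply: would need a retry */
	        if (id >= 0 && id < NREQ && pending[id]) {
	            memcpy(results[id], reply, sizeof reply);
	            pending[id] = false;
	            done++;
	        }
	    }
	    /* results[] now holds whatever replies came back. */
	}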
	
 
 
NFS (Network File System): a file system built atop RPC.
The NFS protocol is like the UNIX file system, but on wheels. Its core operations (a usage sketch follows this list):
	- LOOKUP(dirfh, name)            // returns a file handle (fh) and attributes (size, owner, etc.)
	  A file handle is a unique ID for a file within a file system.
	- CREATE(dirfh, name, attr)      // returns a file handle and attributes
	- REMOVE(dirfh, name)            // returns status
	- READ(fh, size, offset)         // returns data
	- WRITE(fh, size, offset, data)  // returns status
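A sketch of how a client might string these operations together to read a file, assuming hypothetical nfs_lookup()/nfs_read() wrappers that issue the RPCs (the names, types, and file name are made up; real NFS clients live in the kernel).

	#include <stdio.h>

	/* Hypothetical wrappers around the NFS RPCs listed above. */
	struct fh { unsigned char data[32]; };              /* opaque file handle */
	int nfs_lookup(struct fh dirfh, const char *name, struct fh *out_fh);
	int nfs_read(struct fh fh, unsigned size, unsigned offset,
	             char *buf, unsigned *bytes_read);

	/* Read the first 4 KiB of "bar.html" from a directory we already hold a
	 * handle for.  Nothing here opens or closes anything on the server: the
	 * file handle is all the state the server needs, which is what lets the
	 * server be "stateless". */
	int read_example(struct fh dirfh)
	{
	    struct fh fh;
	    char buf[4096];
	    unsigned got;

	    if (nfs_lookup(dirfh, "bar.html", &fh) < 0)       /* LOOKUP RPC */
	        return -1;
	    if (nfs_read(fh, sizeof buf, 0, buf, &got) < 0)   /* READ RPC   */
	        return -1;

	    fwrite(buf, 1, got, stdout);
	    return 0;
	}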
	 
 
We want NFS to keep working even if the file server reboots.

Stateless server: the server keeps no per-client state about open files or pending operations, so after a reboot clients simply retry their requests.
	- Whenever the client does a write, it has to wait for the server's response before continuing.
	- The NFS server can't respond to a write request until the data has hit disk.
	- So NFS is slow for writes, because it forces writes to be synchronous.
	- To fix this problem, we "cheat":
		- Use flash on the server to store pending write requests.
		- On the client, writes don't really wait for the server to respond; if a write fails, a later close() fails instead.
		  The application can call fsync (flush all data and metadata) or fdatasync (flush the file data, but not all metadata) to make sure the data is really written, but these calls slow performance. (A sketch follows this list.)
	- In general, clients won't see a consistent state.
	- NFS by design does not provide read/write consistency (for performance reasons).
	- It does provide open/close consistency, via fsync and fdatasync.
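A sketch of the client-side consequence: on NFS, write() can appear to succeed even though the data never reached the server, so careful programs check fsync() and close() for errors. The file name here is arbitrary; the pattern is what matters.

	#include <stdio.h>
	#include <string.h>
	#include <fcntl.h>
	#include <unistd.h>

	int main(void)
	{
	    int fd = open("report.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
	    if (fd < 0)
	        return 1;

	    const char *msg = "important data\n";
	    if (write(fd, msg, strlen(msg)) < 0) {   /* may only reach the local cache */
	        perror("write");
	        return 1;
	    }

	    /* Force the data out to the server (and its disk).  This is where an
	     * NFS write error typically shows up -- and where the cost of
	     * synchronous writes is paid. */
	    if (fsync(fd) != 0) {
	        perror("fsync");
	        return 1;
	    }

	    /* close() can also report a delayed write error, so check it too. */
	    if (close(fd) != 0) {
	        perror("close");
	        return 1;
	    }
	    return 0;
	}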
	
 
 