CS111 Lecture 16 Scribe Notes
By: Eric Wei
Network File System
NFS utilizes a Client service based architecture
Performance
- benchmarks obtainable through www.spec.org 
 
- companies run older benchmarks like SPECsfs2008_nfs.v3 to obtain better results
 
- Example NFS - Sun ZFS storage 7320 Appliance (To be released May 2012) specs
 
- 2 storage controllers
 - 2 10 Gb Ethernet Adapters
 

- 8 512 GB SSDs (for read acceleration)
 - 8 73 GB SSDs (for write acceleration)
 - 136 300 GB 15 kRPM harddrives 
 - Split into 32 filesystems
 

- The system has no single point of failure because of redundancy, multiple copies of the same components
 
RPC is part of NFS

* 2 ms isn’t too bad, but we want to speed this up. How can we do this?
* Let’s do multiple reads at once
 
* If the threads are independent, this words well
* Web browsers basically use RPC
- originally, web browsers (client) issued requests to servers sequentially
 - now, web browsers issue multiple requests in parallel through HTTP pipelining
 
- This brings up new issues. The client must deal with failed out-of-order requests
 - Also, what if the client issues multiple writes and some of them fail? Here are 2 solutions:
 
1. be slow: don’t pipeline; wait for response
2. be fast: pipeline; keep going. Lie to the user about whether write() worked. Although, at some point, you need to fess up at report what really happened.
* Conventionally errors are reported on ‘close’
* ‘close’ now becomes slow because it needs to wait for all responses to come in, but files aren’t closed very often so this is usually acceptable.
* This is why you should always check the return value of close()!!! (since you only discover the truth then)
Issues with RPC
(+ = the good, - = the bad)
+ hard modularity (client and server have different address spaces)
- messages are delayed
- messages can be lost
- messages can be corrupted
- the network might be down, or slow
- the server might be down, or slow
How do you tell the difference between being down and being slow? (big issue)
* We can usually deal with corruption by using checksums (use them liberally)
If the server detects a bad packet, it should send a response “huh???” and ask for retransmit
* If no response, we have a few options:
- at-least once RPC - we try again and keep trying until it succeeds
 
- okay for idempotent operations (read/write)
 
- at-most once RPC - return an error to caller. Let the caller choose how to handle it.
 
- for “dangerous” operations (like changing the balance of a bank account)
 
- exactly-once RPC - do nothing
 
Robustness
* NFS assumes “stateless” server
- stateless - controller’s RAM doesn’t count as part of the state, so if the power cuts out, nothing vital is lost
 - RAM is cache only
 

- This is essentially mounting
 - NFS protocol goes over the wire
 
- READ(fh, data)
 - WRITE(fh, data)
 - LOOKUP(fh, name)
 - REMOVE(fh, name)
 - CREATE(fh, name, attr)
 
- fh = file handle
 - What is a file handle?
 
- an integer (actually a little more than than) uniquely identifying a file
 - these are like inodes in the actual file system
 
- To have the file system be fast, we need a module in the kernel that allows the file system to fiddle with files directly through via inode numbers 
 

- NFS does not guarantee write-to-read consistancy
 - It does guarantee close-to-open consistency (because close is much slower)
 
Reliability
Main issues
- bad network
 - bad client (operator powers off machine)
 - bad server
 - bad disk (Media Faults)
 
Let’s focus of Media Faults
- can we address this issue via logging?
 
- no, because the journal used for logging could be corrupted
 
- RAID( Redundant Arrays Inexpensive Independent Disks)
 
- the original purpose of RAID was to get a bunch of cheap, smaller disks to act like a larger disk because disk makers were overpricing larger disks (i.e. A 1 MB disk would be $100 but a 5 MB disk would be $2000)
 - nowadays, key feature from RAID stems from the R (Redundant)
 
- The various flavors of RAID
 
- RAID 0 - concatenation
 
- make a larger virtual disk by stringing together a bunch of smaller disks
 

- RAID 1 - mirror
 
- multiple physical drives for a single virtual one
 - reads are faster
 

- Striping - a combination of RAID 0 and RAID 1
 
- overlapping regions of virtual memory across the physical disks
 

- There are more types of RAID, but we’re going to focus on RAID 4
 

- XOR disk is a bit parity of the other disks which allows data on another disk to be recovered if it fails
 
- example: if disk B dies, to resort the bits on B, we use the following equation
 
- B = A ^ C ^ D ^ (A ^ B ^ C ^ D)
 
- you can lose any single disk and still run, but if you lose 2 disks, you won’t be able to recover their data anymore so MAKE SURE THE SERVER GUYS GET NOTIFIED IF A DISK FAILS
 - Of all the drives, the XOR drive is the busiest
 
- every write to any of the other disks = a write to the XOR drive as well