CS 111 Lecture 16 Scribe Notes (Spring 2012)
Tuesday, May 29, 2012
Instructor: Professor Paul Eggert
Notes by Michael Jennings, Kevin Balaji, Jayveer Singh, and Alejandro Veloz
Media Faults
Table of Contents:
RAID
Disk Drive Reliability
Distributed Systems and RPC
NFS
Next time
We've learned how to recover from power failures, but how can we recover from disk failures?
Could we use logging (storing deltas)? No: a log records only the data we are changing, but a media fault can destroy data we never touched, so the log has nothing from which to recover it.
Instead, many systems use RAID.
RAID
- Stands for Redundant Array of Inexpensive/Independent Disks.
- Simulate a large drive with a number of smaller drives.
- RAID was originally used to save money.
- Now it is used to improve reliability
- Berkeley RAID Levels
- RAID 0
- Goal is to save money
- Simulate a big disk with a bunch of little disks
- No redundancy
- Disks are concatenated to form a large virtual drive.
- Disk I/Os can be parallelized using disk striping
- Striping is putting separate parts of a file on separate drives
- Striping allows parallel reads of separate parts of a file
- Concatenation vs. Striping (see the block-mapping sketch below)
- Striping gives faster reads, since separate parts of a file can be fetched in parallel
- Concatenation makes growing easier: adding a new drive simply extends the virtual disk
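To make the difference concrete, here is a minimal sketch of how a virtual block number could map to a (disk, block) pair under each layout; NDISKS, BLOCKS_PER_DISK, and the function names are illustrative, not from any particular implementation:

    /* Map a virtual block to (disk, block) under each RAID 0 layout. */
    enum { NDISKS = 4, BLOCKS_PER_DISK = 1000000 };   /* illustrative sizes */

    struct loc { int disk; long block; };

    /* Concatenation: fill disk 0 completely, then disk 1, and so on.
       Adding a disk just extends the range, so growing is easy. */
    struct loc concat_map(long vblock) {
        struct loc l = { vblock / BLOCKS_PER_DISK, vblock % BLOCKS_PER_DISK };
        return l;
    }

    /* Striping: consecutive blocks round-robin across all disks, so a
       large sequential read keeps every spindle busy in parallel. */
    struct loc stripe_map(long vblock) {
        struct loc l = { vblock % NDISKS, vblock / NDISKS };
        return l;
    }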
- RAID 1: Mirroring
- Data is mirrored on a second disk, thus we have two physical disks for every virtual disk.
- Reads are faster, since either mirror can service a read; we can pick the disk whose head is currently closer to the data.
- Writing is slower, since you have to write to both disks.
- Costs twice as much as an equivalent setup not using RAID
- RAID 4
- Has one special parity drive (E), and all writes are done so that the following property holds true:
- E = A^B^C^D if A, B, C, and D are the data drives.
- Reads are like RAID 0: Concatenation
- A negative is worse read performance than RAID 0 with striping
- Writes are like RAID 1 in that each write must update two drives: the data drive and the parity drive
- A negative is that we must read the old data and old parity before writing, since E_new = E_old ^ data_old ^ data_new; the single parity drive is involved in every write and becomes a bottleneck
- A positive is that if one disk crashes, we can reconstruct it; e.g., if C fails, C = E^A^B^D
- In general, each drive's contents are the exclusive or of all the others'
- Costs 1.25 times as much as not using RAID (one parity drive for every four data drives)
- RAID 5
- Same as RAID 4 but with striping instead of concatenation.
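A minimal sketch of the parity idea, assuming four data drives and byte-at-a-time XOR (real implementations work a sector at a time, but the algebra is the same):

    #include <stddef.h>

    enum { NDATA = 4 };   /* data drives A, B, C, D; E is the parity drive */

    /* Compute the parity block E = A ^ B ^ C ^ D. */
    void compute_parity(const unsigned char *data[NDATA],
                        unsigned char *parity, size_t len) {
        for (size_t i = 0; i < len; i++) {
            unsigned char p = 0;
            for (int d = 0; d < NDATA; d++)
                p ^= data[d][i];
            parity[i] = p;
        }
    }

    /* Reconstruct a lost drive from the parity and the survivors,
       e.g., C = E ^ A ^ B ^ D. */
    void reconstruct(const unsigned char *survivors[NDATA - 1],
                     const unsigned char *parity,
                     unsigned char *lost, size_t len) {
        for (size_t i = 0; i < len; i++) {
            unsigned char v = parity[i];
            for (int d = 0; d < NDATA - 1; d++)
                v ^= survivors[d][i];
            lost[i] = v;
        }
    }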
Disk Drive Reliability
- MTTF: Mean time to failure
- Typically quoted as about 300,000 hours, i.e., about 34 years
- Though in practice a drive's useful service life is really about 5 years
- The quoted MTTF is best read as a failure rate; from it we can estimate the probability that a drive fails within its first 5 years (see the calculation below)
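As a rough sanity check of that probability, assuming a constant failure rate over the service life (a simplification), we can compute it from the quoted MTTF:

    #include <math.h>
    #include <stdio.h>

    /* Rough estimate: probability that a drive fails within its first
       5 years of service, assuming a constant failure rate (a
       simplification) and the quoted MTTF of 300,000 hours. */
    int main(void) {
        double mttf = 300000.0;            /* hours */
        double t = 5 * 365 * 24;           /* 5 years = 43,800 hours */
        double p = 1 - exp(-t / mttf);     /* exponential failure model */
        printf("P(fail within 5 years) ~= %.1f%%\n", 100 * p);  /* ~13.6% */
        return 0;
    }

So even a 34-year MTTF implies that a noticeable fraction of drives, roughly one in seven, fails within a 5-year service life.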
Distributed Systems and RPC
- RPC = Remote Procedure Call
- Sending arguments to another server and waiting for a response
- RPC is similar in spirit to a system call
- Though RPC differs from ordinary function calls and kernel system calls:
- Caller and callee do not share an address space, so we cannot pass addresses (pointers)
- Callee can't trash the caller's memory and vice versa (hard modularity)
- Caller and callee may have different architectures (little endian vs. big endian)
- This lets us mix and match architectures, but has the performance disadvantage of marshalling: data must be converted to an agreed-upon wire format on the way out and converted back on the way in (see the byte-order sketch below)
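Here is a minimal sketch of byte-order marshalling for a single 32-bit value, using the standard POSIX htonl/ntohl conversions; the function names themselves are illustrative:

    #include <arpa/inet.h>   /* htonl, ntohl */
    #include <stdint.h>
    #include <string.h>

    /* Marshal a 32-bit value into a buffer in network (big-endian) order. */
    void marshal_u32(unsigned char *buf, uint32_t x) {
        uint32_t net = htonl(x);        /* host order -> network order */
        memcpy(buf, &net, sizeof net);  /* copy the bytes, not a pointer */
    }

    /* Unmarshal on the receiving side, whatever its native byte order. */
    uint32_t unmarshal_u32(const unsigned char *buf) {
        uint32_t net;
        memcpy(&net, buf, sizeof net);
        return ntohl(net);              /* network order -> host order */
    }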
- Several disadvantages come from having to use the network:
- Messages can get lost or corrupted
- Network can go down, or be slow
- Server can go down, or be slow
- What should the client's stub/wrapper do if a message is corrupted?
- Resend the RPC
- If there's no response, possible strategies are:
- At-least-once RPC: keep resending until a response arrives; the operation may end up being processed more than once (see the retry sketch below)
- Suitable when it is okay to process the operation more than once, i.e., for idempotent operations
- At-most-once RPC: give up and return an error
- Suitable for transactional operations
- Exactly-once RPC: each request is executed exactly one time; this is what we would prefer, but it is hard to implement in general
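A minimal sketch of an at-least-once client stub; send_request, wait_reply, and the request/reply types are hypothetical helpers, not part of any real RPC library:

    /* Hypothetical wire-level helpers; a real RPC layer would supply these. */
    struct request { int id; /* ... arguments ... */ };
    struct reply   { int status; /* ... results ... */ };
    enum { TIMEOUT_MS = 1000 };

    void send_request(struct request *req);            /* may be lost */
    struct reply *wait_reply(int id, int timeout_ms);  /* NULL on timeout */

    /* At-least-once stub: resend until some reply arrives.  Note the
       server may execute the operation more than once if only the
       replies were lost, which is why idempotence matters. */
    struct reply *call_at_least_once(struct request *req) {
        for (;;) {
            send_request(req);
            struct reply *r = wait_reply(req->id, TIMEOUT_MS);
            if (r)
                return r;
            /* timed out: try again */
        }
    }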
- RPC Examples
- HTTP: the client sends "GET /foo/bar.html HTTP/1.1\r\n" and the server responds with "HTTP/1.1 200 OK\r\n" followed by headers and content
- SOAP (Simple Object Access Protocol)
- Performance Issues of RPC
- The dominant cost is the network round trip: each call must cross the network and wait for a reply
- SERIALLY: drawpixel(x, y, color) called in a loop blocks for a full round trip per pixel, which is painfully slow
- Solutions:
- Higher-level primitives (larger commands that do more work per round trip)
- Asynchronous RPC
- + Better performance
- - Can complicate the calling code
- One approach: the stub lies and returns immediately, as if the request has already been processed; it tracks all outstanding requests and builds a model of the server's state
- ASYNCHRONOUSLY: req = drawpixel(x, y, color); ...; waitfor(req) makes the request, lets the caller keep working, and waits for the response only when it is needed (see the sketch below)
- Cache results in the caller (for simpler functions)
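A minimal sketch of the asynchronous style, using the drawpixel/waitfor names from the notes; the transport is stubbed out with prints, not a real network:

    #include <stdio.h>

    typedef int req_t;                 /* handle for an outstanding request */
    static req_t next_id = 0;

    /* Fire off the RPC and return immediately with a handle. */
    req_t drawpixel(int x, int y, int color) {
        req_t id = next_id++;
        /* ... marshal (x, y, color) and transmit without blocking ... */
        printf("sent request %d: pixel (%d,%d) color %d\n", id, x, y, color);
        return id;
    }

    /* Block until the reply for a particular request arrives. */
    void waitfor(req_t id) {
        /* ... block on the network until reply `id` shows up ... */
        printf("request %d acknowledged\n", id);
    }

    int main(void) {
        req_t reqs[3];
        for (int i = 0; i < 3; i++)
            reqs[i] = drawpixel(i, i, 0xffffff);   /* all three in flight */
        for (int i = 0; i < 3; i++)
            waitfor(reqs[i]);                      /* collect replies later */
        return 0;
    }

The point of the pattern: the three requests overlap on the wire, so the caller pays roughly one round trip instead of three.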
NFS
- NFS stands for Network File System
- Basically a file system that is accessed over the network instead of residing on your computer's local drive
- NFS protocol ~ a Unix file system on wheels
- Core operations (client-side sketch below):
- LOOKUP(dirfh, name) -> fh + attrs
- fh = file handle: a unique ID for a file within a file system
- CREATE(dirfh, name, attr) -> fh + attrs
- REMOVE(dirfh, name) -> status
- READ(fh, size, offset) -> data
- WRITE(fh, size, offset, data) -> status
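To illustrate how file handles thread through these operations, here is a sketch of client-side signatures and a component-by-component pathname lookup; the types and names (fh_t, nfs_lookup) are invented for illustration, not a real NFS client API:

    #include <stddef.h>
    #include <stdint.h>

    typedef struct { unsigned char opaque[32]; } fh_t;  /* file handle */
    typedef struct { uint32_t mode, size; } attrs_t;

    /* One LOOKUP RPC: resolve `name` inside the directory `dirfh`. */
    int nfs_lookup(fh_t dirfh, const char *name, fh_t *fh, attrs_t *attrs);

    /* Resolve the path "a/b/c" one component at a time, starting from
       the root directory's handle; each step is a separate LOOKUP RPC. */
    int resolve(fh_t root, fh_t *out) {
        fh_t cur = root;
        attrs_t at;
        const char *components[] = { "a", "b", "c" };
        for (size_t i = 0; i < 3; i++)
            if (nfs_lookup(cur, components[i], &cur, &at) != 0)
                return -1;   /* component missing or server error */
        *out = cur;
        return 0;
    }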
- We want NFS to be reliable
- Even if the file server reboots, client applications shouldn't crash
- In particular, a client should be able to look up a file, get its file handle, and keep using that handle across a server crash
- If the NFS server crashes, we should be able to reboot it and still complete a read!
- The client keeps reissuing the read, because we are assuming at-least-once RPC; eventually the server comes back up and the client gets a response
- This process only works if we have a "stateless" server: the server keeps no essential client state in volatile memory, so a reboot loses nothing a client depends on
- A consequence: when the client does a write, it has to wait for a response before continuing
- The NFS server can't respond to a write request until the data hits disk; it can't just buffer the write in RAM, since a crash would silently lose it
- Thus NFS will be slow for writes
- To fix this problem we "cheat"
- Put flash on the server to hold pending write requests in stable storage, so it can respond before the data reaches the disk itself
- Let writes return without really waiting for the server's response (if a write ultimately fails, a later close() will fail, since close waits for pending writes)
- In general, multiple clients will not see a consistent state: NFS does not have read/write consistency
- It does, however, have close-to-open consistency: once one client's close() succeeds, a client that opens the file afterward sees the written data
Next time:
- What does NFS performance look like
- Basically, NFS can run faster than a local disk
- Second issue: security