CS 111
Lecture 16 (5/29/12)
Abimael Arevalo
Media Faults
Disk fails: only partly solvable via logging.
If a sector that has not changed since the journal was started fails, its data cannot be reconstructed using the journal.
RAID (Redundant Array of Independent Disks)
Simulate a large drive with a lot of little drives.
Save money and gain reliability.
RAID levels a la Berkeley
RAID 0: no redundancy, just a simulated big disk.
Concatenation: performance of the virtual disk is roughly that of a physical disk.
Striping: Divide A, B, C, and D into pieces and place the pieces across the drives. The drives can then run in parallel to retrieve the data, so with four drives the virtual disk's throughput is roughly four times that of a physical disk.
Growing is easier in concatenation than in striping.
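A minimal sketch of the two RAID 0 layouts, to make the difference concrete. The function names and the block-level mapping are assumptions for illustration, not something given in lecture.

```python
def concat_map(block, blocks_per_drive):
    """Concatenation: fill drive 0 completely, then drive 1, and so on."""
    return block // blocks_per_drive, block % blocks_per_drive

def stripe_map(block, num_drives):
    """Striping: consecutive virtual blocks rotate across the drives."""
    return block % num_drives, block // num_drives

# Four consecutive virtual blocks, on four drives of 100 blocks each.
run = range(8, 12)
concat_drives = {concat_map(b, 100)[0] for b in run}  # one drive does all the work
stripe_drives = {stripe_map(b, 4)[0] for b in run}    # every drive takes one block
```

Under this mapping, adding a drive to a concatenated array leaves every existing block where it is, while changing num_drives in a striped array moves most blocks. That is one way to see why growing is easier with concatenation.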
RAID 1: Mirroring
Write to both drives.
Read from either (can pick the closest disk head).
ASSUMPTION: reads can detect faults.
RAID 2,3,4,5,6,7,...
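RAID 4 (data drives A, B, C, D plus a parity drive E): reads are like RAID 0 concatenation and have worse read performance than RAID 0 striping.
Writes are like RAID 1 in that two drives get written, the data drive and the parity drive E (need to read drive E before writing).
If C fails: C = A^B^D^E ('^' = XOR)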
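The XOR recovery rule can be checked with a short sketch. The four-byte "drives" and the helper name xor_blocks are made up for illustration; E is taken to be the XOR of the four data drives.

```python
from functools import reduce

def xor_blocks(*blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

a, b, c, d = b"AAAA", b"BBBB", b"CCCC", b"DDDD"
e = xor_blocks(a, b, c, d)          # contents of the parity drive E

rebuilt_c = xor_blocks(a, b, d, e)  # drive C lost: XOR the four survivors

# A small write to C: the new parity needs only old C, old E, and new C.
new_c = b"cccc"
new_e = xor_blocks(e, c, new_c)
```

The last two lines show why a write has to read E first: the new parity is computed from the old parity rather than by rereading every data drive.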
Disk Drive Reliability
Mean time to failure is (typically) 300,000 hours (34 years), but in reality, drives get replaced every 5 years.
Probability distribution function for single disk failure.
Probability distribution function for RAID 4 (never replace drives).
Probability distribution function for RAID 4 (assuming failed disks are replaced).
Disk fails.
(60 minutes later) operator replaces it.
(8 hours later) rebuilding phase; its duration depends on drive size.
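A rough back-of-the-envelope sketch of why replacing failed disks matters. It assumes drive failures are independent with a constant failure rate (an exponential model), a five-drive RAID 4, and a 9-hour exposure window (1 hour to swap plus 8 hours to rebuild). All three are simplifying assumptions, not figures from lecture beyond the MTTF and the timeline.

```python
import math

MTTF_HOURS = 300_000   # per-drive figure from the notes
REPAIR_HOURS = 1 + 8   # assumed window: operator swap plus rebuild
SURVIVORS = 4          # assumed five-drive RAID 4 with one drive already failed

def p_fail_within(hours, drives=1, mttf=MTTF_HOURS):
    """Chance that at least one of `drives` fails within `hours` (exponential model)."""
    return 1 - math.exp(-drives * hours / mttf)

# Data is lost only if a second drive dies before the rebuild finishes.
p_second_failure = p_fail_within(REPAIR_HOURS, SURVIVORS)

# For comparison: one unprotected drive over a 5-year service life.
p_single_5_years = p_fail_within(5 * 365 * 24)
```

Under these assumptions the second-failure probability per incident is on the order of one in ten thousand, while a lone drive has a chance of failing within five years that is over a thousand times larger.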
RAID schemes can be nested.
Q: Does RAID make backups obsolete?
A: No, we still need backups for user errors.
Distributed Systems
RPC (Remote Procedure Calls) vs. System Calls and Function Calls
Caller sees:
x = fft (buf, n);
What actually happens (in the stub):
send (buf, n); // to server
// Wait for response
Caller and callee do not share address space. There is no call by reference (at least, not efficiently).
Caller and callee may run on different architectures (ARM vs. SPARC, or little- vs. big-endian).
Requires conversion of data to and from an agreed-upon wire format (marshalling).
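A minimal sketch of the conversion step, using Python's struct module with network (big-endian) byte order. The message layout (a 32-bit count followed by 64-bit floats) and the function names are assumptions chosen for illustration; real RPC systems define their own formats.

```python
import struct

def marshal_fft_request(samples):
    """Pack a count and the samples as a big-endian 32-bit int plus 64-bit floats."""
    return struct.pack(f"!I{len(samples)}d", len(samples), *samples)

def unmarshal_fft_request(message):
    """Inverse of marshal_fft_request; the result does not depend on host byte order."""
    (n,) = struct.unpack_from("!I", message, 0)
    return list(struct.unpack_from(f"!{n}d", message, 4))

wire = marshal_fft_request([1.0, 0.5, -2.0])
decoded = unmarshal_fft_request(wire)
```

Because both sides agree on the "!" layout, a little-endian caller and a big-endian callee see the same values.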
RPC has different failure modes
PRO: Callee cannot trash caller's memory and vice versa (hard modularity).
CON: Messages get lost.
CON: Messages get corrupted.
CON: Messages get duplicated.
CON: The network can go down or be slow.
CON: The server can go down or be slow.
What should a stub/wrapper do?
If a message is corrupted: resend.
If there is no response, the possibilities are:
Keep trying - at least once RPC (suitable for idempotent operations).
Give up, return error - at most once RPC (suitable for transactional operations).
Exactly once RPC (Holy Grail of RPC).
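The first two policies can be sketched against a toy server that executes every request but loses some replies. LossyServer and both stub functions are hypothetical names for illustration, and a real stub would use timeouts on a network socket rather than an exception.

```python
class LossyServer:
    """Hypothetical server whose first `drops` replies are lost in transit."""
    def __init__(self, drops):
        self.drops = drops
        self.executions = 0

    def handle(self, request):
        self.executions += 1          # the request itself arrives and runs
        if self.drops > 0:
            self.drops -= 1
            raise TimeoutError("reply lost")
        return f"done: {request}"

def at_least_once(server, request):
    """Keep retrying until a reply arrives; the request may run several times."""
    while True:
        try:
            return server.handle(request)
        except TimeoutError:
            continue

def at_most_once(server, request):
    """Send once; on silence, report an error instead of retrying."""
    try:
        return server.handle(request)
    except TimeoutError:
        return None                   # caller cannot tell whether it ran

retrying = LossyServer(drops=2)
reply = at_least_once(retrying, "read block 7")

single = LossyServer(drops=1)
outcome = at_most_once(single, "debit $10")
```

Here the read executes three times, which is harmless because it is idempotent. The debit executes once, but the caller only learns that something went wrong, which is why at-most-once suits operations that can be checked or rolled back afterwards.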
RPC examples:
HTTP client -> "GET /foo/bar.html HTTP/1.1\r\n"
Server response -> "HTTP/1.1 200 OK\r\n"
SOAP (Simple Object Access Protocol)
X - Remote screen display
Works even if the client and the server are on the same machine.
Use of higher level primitives (e.g. fillRectangle).
Performance Issues with RPC

Have higher level primitives.
Asynchronous RPC - better performance but can complicate caller
Cache in caller (for simple stuff)
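A small sketch of the asynchronous option, using a thread pool to keep several requests in flight at once. remote_square is a stand-in for a real RPC, with a sleep modelling the network round trip. The names and the 50 ms delay are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def remote_square(x):
    """Stand-in for an RPC: the sleep models network round-trip time."""
    time.sleep(0.05)
    return x * x

def synchronous(args):
    """One round trip at a time: total time grows with the number of calls."""
    return [remote_square(a) for a in args]

def asynchronous(args):
    """Issue every request first, then collect the replies in order."""
    with ThreadPoolExecutor(max_workers=max(1, len(args))) as pool:
        futures = [pool.submit(remote_square, a) for a in args]
        return [f.result() for f in futures]

results = asynchronous([1, 2, 3, 4])
```

The complication for the caller shows up in the futures: results arrive later than the calls are made, so error handling and ordering have to be dealt with explicitly.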
NFS (Network File System): File system built atop RPC
The NFS protocol is like the UNIX file system but on wheels.
LOOKUP (dirfh, name) // Request fh and attributes (size, owner, etc.)
fh = file handle, a unique id for a file within a file system
CREATE (dirfh, name, attr) // Returns file handle and attributes
REMOVE (dirfh, name) // Returns status
READ (fh, size, offset) // Returns data
WRITE (fh, size, offset, data) // Returns status
We want our NFS to be reliable even if the file server reboots.
"Stateless" server: the server keeps no important per-client state in RAM, so a reboot does not break its clients.
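A toy sketch of why statelessness survives reboots: every request names the file handle and offset it wants, so the server needs no memory of earlier requests. The class, the dictionary standing in for the disk, and the handle value 42 are all assumptions for illustration. This is not the real NFS wire protocol.

```python
class StatelessServer:
    """Toy server: durable data lives in `disk`; nothing per-client is kept in RAM."""
    def __init__(self, disk):
        self.disk = disk                     # maps file handle -> bytearray

    def read(self, fh, size, offset):
        return bytes(self.disk[fh][offset:offset + size])

    def write(self, fh, size, offset, data):
        buf = self.disk[fh]
        if len(buf) < offset:
            buf.extend(b"\0" * (offset - len(buf)))  # zero-fill any gap
        buf[offset:offset + size] = data[:size]
        return "OK"

disk = {42: bytearray(b"hello world")}       # 42 is a hypothetical file handle
server = StatelessServer(disk)
first = server.read(42, 5, 0)

server = StatelessServer(disk)               # "reboot": RAM state gone, disk intact
second = server.read(42, 5, 6)               # client carries on with the same handle
status = server.write(42, 5, 6, b"there")
```

There is no open-file table or current-offset field for the reboot to destroy, which is the property the READ and WRITE signatures above are designed around.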

Whenever the client does a write, it has to wait for a response before continuing.
NFS server can't respond to a write request until data hits disk.
NFS will be slow for writes because it forces writes to be synchronous.
To fix this problem we "cheat":
Use flash on the server to store pending write requests.
Writes don't really wait for the server to respond; if a write fails, a later "close" will fail.
Can use "fsync" (flush a file's data and metadata to disk) and "fdatasync" (flush the data, plus only the metadata needed to read it back) to make sure data is written. But these operations slow performance.
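A minimal sketch of how an application asks for durability explicitly. It uses Python's os.fsync and os.fdatasync wrappers around the system calls. The helper names and the file name are assumptions, and the fallback is there because fdatasync is not available on every platform.

```python
import os
import tempfile

def durable_write(path, data):
    """Write `data` and return only after the OS reports it has been flushed."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # push Python's user-space buffer to the kernel
        os.fsync(f.fileno())   # ask the kernel to flush data and metadata

def durable_write_data_only(path, data):
    """Same idea with fdatasync, which may skip metadata such as timestamps."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        sync = getattr(os, "fdatasync", os.fsync)  # fall back where fdatasync is missing
        sync(f.fileno())

path = os.path.join(tempfile.mkdtemp(), "journal.dat")
durable_write(path, b"committed")
```

Checking for an exception from these calls (and from close) matters on a networked file system, since that is where a previously deferred write failure would be reported.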
In general, most clients won't see a consistent state.
NFS by design doesn't have read/write consistency (for performance reasons).
It does have open/close consistency through fsync and fdatasync.