Lecture 16

Professor Paul Eggert, March 11 2012

Notes compiled by: Ryan Farjadi and Paul Martinez

Sun ZFS Storage 7320 Appliance (Release Scheduled: May 2012)

2 x Storage Controllers
2 x 10 Gb Ethernet Adapters
8 x 512 GB SSDs (read access)
8 x 73 GB SSDs (write access)
136 x 300 GB, 15k rpm hard drives
37 TB total exported capacity
32 file systems

SPECsfs2008_nfs.v3 (spec.org performance measurement)

Throughput (ops/sec)    Response (msec)
13316                   0.9
26650                   1.1
40031                   1.4
53505                   1.5
66877                   1.5
80791                   1.6
94472                   1.7
107873                  2.0
121160                  2.2
134140                  2.5
SFS Performance Graph

Problem: Out of Order Requests

write(fd, buf, 27);

write(fd, buf, 1000);

The second write failed, but the user doesn't know: with pipelining, write() returns before the server's response arrives.

write(fd, buf, 96);

2 Solutions

  1. Be slow: don’t pipeline; wait for each response before sending the next request.
  2. Be fast: pipeline and keep going; report errors on close (so close is now slow).
    Callers must check the result of close:
    	if (close(fd) != 0)
    		error();
    Otherwise the code won't catch NFS errors.
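
The close check above can be sketched as a complete routine. This is only an illustration: save_file and write_all are hypothetical helper names, not part of NFS or of the lecture, and the sketch assumes an ordinary POSIX environment.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Write all n bytes of buf to fd, retrying on short writes and EINTR.
   Returns 0 on success, -1 on failure (errno is set by write). */
static int write_all(int fd, const char *buf, size_t n)
{
    assert(buf != NULL);
    while (n > 0) {
        ssize_t w = write(fd, buf, n);
        if (w < 0) {
            if (errno == EINTR)
                continue;          /* interrupted before writing; try again */
            return -1;
        }
        buf += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Hypothetical helper: create path, write n bytes of data, and report any
   error, including one that only surfaces at close(). Returns 0 or -1. */
int save_file(const char *path, const char *data, size_t n)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write_all(fd, data, n) != 0) {
        int saved = errno;         /* keep the write error, not close's */
        close(fd);
        errno = saved;
        return -1;
    }
    /* With a pipelining client, an earlier write's failure may show up here. */
    if (close(fd) != 0)
        return -1;
    return 0;
}
```

A caller that ignores save_file's return value is in the same position as one that ignores close: a failed write goes unnoticed.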
    
    		

Issues with RPC

  1. + hard modularity
  2. - messages are delayed
  3. - messages can be lost
  4. - messages can be corrupted
    a. Use checksums: if the server detects a bad packet, it sends a “?” response asking for a retransmission.
    			
  5. - network might be down or slow
  6. - server might be down or slow
    a. If there is no response from the network or server:
    	- Try again and keep trying (at-least-once RPC); OK for idempotent operations (e.g., read or write)
    	- Return an error to the caller (at-most-once RPC); for “dangerous” operations
    	- Exactly-once RPC: what we would like, but hard to implement
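
The difference between the retry policies above can be sketched with a simulated server. Everything here is made up for illustration: rpc_set, rpc_add, server_value, and drops_remaining are hypothetical names, and the "network" is assumed to lose only replies (the server always executes the request).

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated server state and a counter of upcoming replies to lose. */
static int server_value;
static int drops_remaining;

/* Returns true if the reply reaches the client, false if it is lost. */
static bool reply_arrives(void)
{
    if (drops_remaining > 0) {
        drops_remaining--;
        return false;
    }
    return true;
}

/* Idempotent operation: repeating it leaves the same state. */
static bool rpc_set(int v)
{
    server_value = v;
    return reply_arrives();
}

/* Non-idempotent ("dangerous") operation: each repeat changes the state. */
static bool rpc_add(int delta)
{
    server_value += delta;
    return reply_arrives();
}

typedef bool (*rpc_fn)(int arg);

/* At-least-once: resend until a reply arrives or max_tries is used up.
   Returns the number of attempts made, or -1 if no reply ever arrived. */
int at_least_once(rpc_fn call, int arg, int max_tries)
{
    assert(max_tries > 0);
    for (int attempt = 1; attempt <= max_tries; attempt++)
        if (call(arg))
            return attempt;
    return -1;
}

/* At-most-once: send once; on a missing reply, report an error to the
   caller instead of retrying. Returns 0 on success, -1 on error. */
int at_most_once(rpc_fn call, int arg)
{
    return call(arg) ? 0 : -1;
}
```

With two lost replies, at_least_once(rpc_set, 7, 5) leaves the server holding 7 no matter how many times the request ran, while at_least_once(rpc_add, 1, 5) applies the increment three times. That is why retrying is only safe for idempotent operations.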
    			

NFS Protocol (RFCs, NFS v2, v3, v4)

	READ (fh, data)
		fh = file handle: an integer (the file's inode number in the actual file system) uniquely identifying a file
	WRITE (fh, data)
	LOOKUP (fh, name) -> fh + attributes
		fh = file handle of a directory
		name = string
	REMOVE (fh, name)
	CREATE (fh, name, attributes) -> fh
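
As a rough sketch of how a client might use LOOKUP, a path can be resolved one component at a time, starting from a directory's file handle. This is a toy in-memory model, not the real wire protocol: the table contents, nfs_lookup, and resolve are all hypothetical, and the attributes that LOOKUP also returns are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy directory entry: (directory handle, name) -> child handle.
   All handle values are made up for illustration. */
struct entry { int dir_fh; const char *name; int fh; };

static const struct entry table[] = {
    {1, "usr", 2},
    {2, "bin", 3},
    {3, "sh",  4},
};

/* Stand-in for a LOOKUP request: returns the child's file handle,
   or -1 if the directory has no entry with that name. */
static int nfs_lookup(int dir_fh, const char *name, size_t len)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (table[i].dir_fh == dir_fh
            && strlen(table[i].name) == len
            && memcmp(table[i].name, name, len) == 0)
            return table[i].fh;
    return -1;
}

/* Resolve a slash-separated path with one LOOKUP per component,
   starting from root_fh. Returns the final handle, or -1 on failure. */
int resolve(int root_fh, const char *path)
{
    assert(path != NULL);
    int fh = root_fh;
    while (*path != '\0' && fh != -1) {
        while (*path == '/')
            path++;                       /* skip separators */
        size_t len = strcspn(path, "/");  /* length of next component */
        if (len == 0)
            break;                        /* trailing slash or empty path */
        fh = nfs_lookup(fh, path, len);
        path += len;
    }
    return fh;
}
```

In this model, resolving "/usr/bin/sh" from handle 1 takes three lookups, one per name.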
	

Reliability

	Bad network: retry
	Bad client/server (operator power off)
	Bad disk (Media faults)
		Can we address via logging?
	

Redundant Array of Inexpensive (Independent) Disks (RAID)

	RAID 0
		Concatenation (or striping)
		Yields a virtual disk bigger than any single physical disk
	RAID 1
		Mirroring: reads are faster on average (can read from both disks at the same time)
	RAID 4
		Can restore a failed disk from the XOR (parity) disk
			B = A^C^D^(A^B^C^D)
			*Must notify the operator of a disk failure in a noticeable way, before a 2nd disk
			 fails and data is lost (not a little red light in the corner of the server room)
	Up to RAID 5 in original Berkeley paper
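
The RAID 4 reconstruction B = A^C^D^(A^B^C^D) can be checked with a short sketch. The sizes NDATA and BLOCK and the function names are illustrative assumptions; a real array works on whole disk blocks, but the XOR identity is the same.

```c
#include <assert.h>
#include <string.h>

enum { NDATA = 4, BLOCK = 8 };   /* illustrative: 4 data disks, 8-byte blocks */

/* Parity block = XOR of the corresponding block on every data disk. */
void compute_parity(unsigned char data[NDATA][BLOCK],
                    unsigned char parity[BLOCK])
{
    memset(parity, 0, BLOCK);
    for (int d = 0; d < NDATA; d++)
        for (int i = 0; i < BLOCK; i++)
            parity[i] ^= data[d][i];
}

/* Rebuild the block of one failed data disk by XORing the parity block
   with the blocks of all surviving data disks. */
void rebuild(unsigned char data[NDATA][BLOCK],
             const unsigned char parity[BLOCK],
             int failed, unsigned char out[BLOCK])
{
    assert(0 <= failed && failed < NDATA);
    memcpy(out, parity, BLOCK);
    for (int d = 0; d < NDATA; d++)
        if (d != failed)
            for (int i = 0; i < BLOCK; i++)
                out[i] ^= data[d][i];
}
```

This recovers only one lost disk per parity group; if a second disk fails before the first is replaced, the XOR no longer determines the missing data, which is why the operator must be notified promptly.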