Lecture 16: Robustness, parallelism, and NFS

David Yang

Potential Failures in Flash Drives

Flash storage drives can fail in several ways. A paper by Mai Zheng et al. investigated a number of failure points for drives:


Solutions to Drive Failures

Flash and spinning media alike are vulnerable to faults. A few things can be done to remedy this:

RAID

RAID stands for Redundant Array of Independent Disks. It is a method for combining disks to improve reliability, improve performance, or create larger effective drives. Different types of RAID include:

RAID can be implemented in either hardware or software. Hardware RAID has the benefit of being designed by the makers of the drive, which allows for very hardware-specific optimizations. On the other hand, physical RAID controllers can be expensive and may only work with certain brands of hard drive. In contrast, software RAID can be less efficient and less low-level than hardware RAID, but it is cheaper.
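To make the reliability benefit concrete, here is a minimal sketch of XOR parity, the idea behind parity-based RAID levels such as RAID 4 and RAID 5. The disk contents and the function names xor_blocks and rebuild are made up for illustration. A real software RAID layer works on disk stripes rather than Python byte strings.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data disks, each holding one 4-byte block.
data_disks = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]

# The parity disk stores the XOR of all data blocks.
parity_disk = xor_blocks(data_disks)

def rebuild(surviving_blocks, parity):
    """Recover the single missing block from the survivors plus the parity block."""
    return xor_blocks(surviving_blocks + [parity])

# Simulate losing disk 1 and rebuilding its contents from the others.
recovered = rebuild([data_disks[0], data_disks[2]], parity_disk)
```

Because XOR is its own inverse, any one lost block can be recomputed from the remaining blocks. Losing two disks at once is not recoverable under this scheme.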

The following graph illustrates the reliability of data storage under different conditions:

[Figure: chart showing failure rates for different environments]

The above graph illustrates three cases:


Potential Failures in Remote Procedure Calls (RPC)

Commands such as HTTP GET, which requests webpages, and X draw, which sends pixel draw requests to the X server, are remote procedure calls (RPCs). The client sends a request to an external server asking that a task be done, and the server fulfills that request. Sometimes the server fails to carry out an RPC, which can happen for a few reasons:

Errors from the above can manifest in a few ways. The call could time out without a reply, or hang forever. When this happens, the client needs a policy for handling the missing response.


Workarounds to Remote Procedure Call Failures

When a remote procedure call fails, a number of policies can be used to remedy it. The simplest is to resend the procedure call whenever it fails. Doing this indiscriminately may result in unexpected behavior, however; a strict policy for dealing with errors leads to better behavior for your specific application. For applications where duplicate messages are harmless, such as a display server like X, where it is fine to ask twice for a red pixel at a certain location, it is acceptable to resend whenever an error occurs. This is called at-least-once-RPC. In a bank, it is not acceptable to send a withdrawal request twice, as a "failed" request could very well be caused by the bank server being slow, or by the transaction confirmation being lost. In these cases, the at-most-once-RPC policy is employed: the procedure call is only ever sent once, because you don't want to withdraw twice as much from a user's bank account. Finally, there is exactly-once-RPC, for applications where it is essential that a procedure call is executed exactly once, neither skipped nor repeated. Exactly-once-RPC can be expensive to implement.
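The difference between the first two policies can be sketched in a few lines. This is a toy, in-process simulation rather than a real RPC library: the Bank class, its lost-reply behavior, and the helper names at_least_once and at_most_once are all assumptions made for illustration.

```python
class Bank:
    """Hypothetical server: it performs each withdrawal, but its first reply is lost."""
    def __init__(self):
        self.balance = 100
        self.replies_to_lose = 1

    def withdraw(self, amount):
        self.balance -= amount              # the request itself succeeds
        if self.replies_to_lose > 0:
            self.replies_to_lose -= 1
            raise TimeoutError("reply lost in transit")
        return self.balance

def at_least_once(call, max_tries=5):
    """Resend until a reply arrives; the server may run the call several times."""
    for _ in range(max_tries):
        try:
            return call()
        except TimeoutError:
            continue
    raise TimeoutError("no reply after %d tries" % max_tries)

def at_most_once(call):
    """Send once; on a timeout, report failure instead of resending."""
    try:
        return call()
    except TimeoutError:
        return None                         # caller cannot tell whether it ran

retry_bank = Bank()
at_least_once(lambda: retry_bank.withdraw(10))   # the withdrawal runs twice

once_bank = Bank()
result = at_most_once(lambda: once_bank.withdraw(10))   # runs once, reply unknown
```

With the retrying policy the account ends up debited twice, which is exactly the problem described for the bank. With the single-send policy the account is debited once, but the client is left without a confirmation.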

A couple of other measures can be taken to improve the efficiency of RPCs. A higher-level API can reduce the number of calls required to perform a task. In the case of the X server, we can use calls to draw full windows and shapes instead of drawing pixels one by one. Programs can also use pipelining to improve the speed of RPCs. Instead of sending a call and waiting on a response, the client program can send multiple calls in quick succession for the server to work on at once. This improves efficiency because the client program doesn't waste time waiting for a response between calls, but it must also be able to handle error messages for RPCs that were sent several requests ago, which means it has to keep track of them somehow. A client program can also cache the server state, using what it knows about the calls it has already sent to build a local model of what is remotely stored, so that the client can make calls ahead of time based on those assumptions. This is a gain in performance, but inconsistencies can arise between what the server remembers and what the client caches if another client program attempts to modify the same data, or if the server performs hidden operations on data sent by the client. Caching the server state also requires much more complex client code.
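The bookkeeping that pipelining requires can be sketched as follows. This is an in-process toy with no real network: PipelinedClient, ToyServer, and the rule that negative coordinates are rejected are all hypothetical, chosen only to show how a late error is matched back to the request that caused it.

```python
from collections import deque

class ToyServer:
    """Hypothetical server that queues draw requests and rejects negative coordinates."""
    def __init__(self):
        self.queue = deque()

    def submit(self, rid, request):
        self.queue.append((rid, request))   # accepted immediately, no reply yet

    def drain_replies(self):
        while self.queue:
            rid, (x, y) = self.queue.popleft()
            yield rid, (x >= 0 and y >= 0)  # (request id, success flag)

class PipelinedClient:
    def __init__(self, server):
        self.server = server
        self.next_id = 0
        self.pending = {}   # request id -> request, kept until its reply arrives

    def send(self, request):
        rid = self.next_id
        self.next_id += 1
        self.pending[rid] = request
        self.server.submit(rid, request)    # do not wait for the reply
        return rid

    def collect(self):
        """Drain replies and pair each failure with the request that caused it."""
        failures = []
        for rid, ok in self.server.drain_replies():
            request = self.pending.pop(rid)
            if not ok:
                failures.append((rid, request))
        return failures

client = PipelinedClient(ToyServer())
for point in [(1, 1), (-5, 2), (3, 4)]:
    client.send(point)                      # three calls sent back to back
failures = client.collect()                 # the error for request 1 shows up only now
```

Tagging every request with an id is one common way to keep track of outstanding calls. The pending table is the extra client state that the paragraph above warns about.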


NFS

NFS stands for Network File System. It is a system for treating remote servers as storage drives, so that administrators can back up a single server's worth of drives instead of having to maintain drives on hundreds of workstations in a building. To a local program, files on an NFS drive are accessed in the same way as on a regular UNIX file system. The client's file mounter takes a regular file call such as open("/home/me/file", O_RDONLY) and turns it into a form that the NFS server can recognise. The following are a few sample NFS calls:

lookup(dirfh, name) - The NFS server looks up name in the directory indicated by dirfh and returns the file's file handle.

remove(dirfh, name) - The NFS server looks up name in the directory indicated by dirfh and deletes the file, returning a success status.

create(dirfh, name, attr) - The server creates a file in the directory indicated by dirfh, with the attributes indicated by attr.

read(fh, offset, count) - The server finds the file indicated by fh (the file handle), starts at the specified offset within the file, and reads count bytes, returning the data read.

A more complete list can be found here. dirfh is the file handle of the directory that contains name, such as the client's current working directory. A file handle is composed of the filesystem number and the inode number appended to each other, and is used to identify files on the NFS server. NFS is a stateless protocol, which means that, unlike a local UNIX file system, the server does not remember what directory you are in, does not lock files, and does not keep track of things like the file pointer for clients. Instead, the client must keep track of everything. When a program runs open() on a file, the client saves the returned file handle locally and uses it for later operations on that file. NFS uses file handles instead of file names to reference files, which avoids race conditions between multiple clients; consider the following:
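Before looking at that race, here is a minimal sketch of what "the client must keep track of everything" means for reads. It is an in-memory stand-in, not the real NFS wire protocol: StatelessServer and ClientFile are hypothetical names, and fh is modelled as a (filesystem number, inode number) tuple rather than the two numbers appended together.

```python
class StatelessServer:
    """Hypothetical stand-in for an NFS server; it keeps no per-client state."""
    def __init__(self):
        # (filesystem number, inode number) -> file contents
        self.files = {(1, 42): b"hello, stateless world"}
        self.names = {"file": (1, 42)}

    def lookup(self, dirfh, name):
        return self.names[name]     # dirfh is ignored: this toy has one directory

    def read(self, fh, offset, count):
        # Every request carries all the information needed to serve it.
        return self.files[fh][offset:offset + count]

class ClientFile:
    """Client-side open file: the fh and the file pointer both live here."""
    def __init__(self, server, dirfh, name):
        self.server = server
        self.fh = server.lookup(dirfh, name)
        self.offset = 0

    def read(self, count):
        data = self.server.read(self.fh, self.offset, count)
        self.offset += len(data)    # the client advances its own file pointer
        return data

f = ClientFile(StatelessServer(), dirfh=(1, 2), name="file")
first = f.read(5)
second = f.read(7)
```

Since the server remembers nothing between calls, it can restart between the two reads and the second read still works, because the client resends fh and offset every time.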

Client 1: rename(foo, bar), rename(baz, foo)

Client 2: write(foo, buffer, n, data)

In the example above, client 1 renames file foo to bar, then renames file baz to foo. In the meantime, client 2 sends a request to write data to file foo. If NFS referred to files by their names, the write would behave differently depending on how many of client 1's requests had already been processed. If client 1's first rename has not run yet, the write will go to the correct file. If only the first rename has been executed, no file named foo exists, so the write will fail and return an error. If both renames have already been processed, the write will go to the file formerly known as baz. Because of this, NFS identifies files by numerical file handles (the fh values in the calls above) based on inode numbers, which do not change when filenames change. If one client renames a file while another client is working on it, the second client does not accidentally destroy data or cause inconsistencies.
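The scenario can be replayed against a toy server to see the difference. Everything here is a hypothetical simulation: RenameDemoServer, its name table, and the two write methods are illustrative, and fh is reduced to a bare inode number.

```python
class RenameDemoServer:
    """Hypothetical server: names map to inode numbers, inode numbers map to data."""
    def __init__(self):
        self.names = {"foo": 1, "baz": 2}
        self.data = {1: b"", 2: b""}

    def rename(self, old, new):
        self.names[new] = self.names.pop(old)

    def write_by_name(self, name, payload):
        self.data[self.names[name]] += payload   # raises KeyError if the name is gone

    def write_by_fh(self, fh, payload):
        self.data[fh] += payload

# Writing by fh: client 2 resolved foo before client 1's renames ran.
server = RenameDemoServer()
fh = server.names["foo"]
server.rename("foo", "bar")                      # client 1
server.rename("baz", "foo")                      # client 1
server.write_by_fh(fh, b"client 2 data")         # still reaches the original file

# Writing by name: the same write lands in the file formerly called baz.
by_name = RenameDemoServer()
by_name.rename("foo", "bar")
by_name.rename("baz", "foo")
by_name.write_by_name("foo", b"client 2 data")
```

In the first run the data ends up in the file now called bar, which is the file client 2 originally meant. In the second run it ends up in the wrong file, without any error being reported.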