RPCs and NFS
Chris Dickens, Matt Sperry, and Andre Song
March 10, 2008
Instead of splitting a single machine into pieces to get modularity, we can take a more natural, hardware-influenced approach suggested by the underlying network: put the caller and the callee on different hosts. This is attractive because the hard modularity comes for free from the physical separation.
1 Remote Procedure Calls
Remote procedure calls are to distributed systems what system calls are to a single system. In short, an RPC resembles a syscall, except that the request goes to another host rather than to the kernel.
1.1 Properties
- Caller and callee don't share address space (this is good because there is no extra work to achieve hard modularity)
- No call by reference (because callee can't dereference pointers)
- Performance is an issue: for example, copying large objects
- Callee and caller could have different architecture
Big Endian: Most significant byte first
Little Endian: Least significant byte first
- Marshalling generates a serial representation of a complicated data structure so it can be sent over the network (also known as pickling or serialization)
- Unmarshalling reconstructs the complicated data structure from its serial form
- Stubs: small procedures on the caller's side that marshal the arguments, ship them to the server, receive the marshalled response, unmarshal it, and return the result (a sketch follows this list)
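To make marshalling, endianness, and stubs concrete, here is a minimal sketch in Python (not part of the original notes; the wire format and the draw_pixel-style arguments are invented for illustration). It packs three integer arguments into a big-endian byte string, exactly the kind of serial representation a client stub would ship across the network, and unpacks them on the other side.

    import struct

    # Hypothetical wire format: three 32-bit unsigned integers in big-endian
    # (network) byte order.  ">" selects big endian; "<" would select little
    # endian, i.e. least significant byte first.
    PIXEL_FMT = ">III"

    def marshal_pixel(x, y, color):
        """Client-stub work: produce a serial representation of the arguments."""
        return struct.pack(PIXEL_FMT, x, y, color)

    def unmarshal_pixel(data):
        """Server-stub work: rebuild the arguments from the serial form."""
        return struct.unpack(PIXEL_FMT, data)

    msg = marshal_pixel(10, 20, 0xFF0000)
    print(msg.hex())              # most significant byte of each field comes first
    print(unmarshal_pixel(msg))   # (10, 20, 16711680)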
1.2 Example: HTTP
The client sends a request string to the server, and the server replies with a header followed by the requested data.

Client:
    GET / HTTP/1.0\r\n

Server:
    HTTP/1.1 200 OK\r\n
    Content-type: text/html\r\n
    Content-length: 10243\r\n
    \r\n
    <HTML body follows>
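The exchange above can be reproduced with a few lines of socket code. This is only a sketch: the host name is a placeholder, short reads and errors are ignored, and a real program should use an HTTP library.

    import socket

    HOST = "example.com"   # placeholder host, not from the notes
    request = b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n"

    with socket.create_connection((HOST, 80)) as s:
        s.sendall(request)                 # client sends one request string...
        reply = b""
        while True:                        # ...and reads header + body back
            chunk = s.recv(4096)
            if not chunk:                  # HTTP/1.0 server closes when done
                break
            reply += chunk

    header, _, body = reply.partition(b"\r\n\r\n")
    print(header.decode())                 # status line and Content-* headers
    print(len(body), "body bytes")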
1.3 Example: X Windows
Clients: Applications
Server: Your Desktop
Writing a pixel:
Client actions:
    send x
    send y
    send color
    read reply

Server actions:
    read x
    read y
    read color
    draw pixel
    write reply
Note that this exchange is usually wrapped in stubs, and that it is far too inefficient as written: every pixel costs a full network round trip. We return to this performance problem in Section 2.2. A sketch of such a stub pair follows.
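A rough sketch of what such a stub pair looks like (the wire format here is invented, not the real X protocol): the client stub packs x, y, and color, the server unpacks them, "draws" the pixel, and writes a one-byte reply.

    import socket, struct, threading

    REQ_FMT = ">III"          # x, y, color -- invented format, not real X11

    def server(sock):
        data = sock.recv(struct.calcsize(REQ_FMT))       # read x, y, color
        x, y, color = struct.unpack(REQ_FMT, data)
        print(f"server: draw pixel ({x},{y}) color {color:#06x}")
        sock.sendall(b"\x00")                            # write reply (status byte)

    def draw_pixel_stub(sock, x, y, color):
        sock.sendall(struct.pack(REQ_FMT, x, y, color))  # send x, y, color
        return sock.recv(1)                              # read reply

    c, s = socket.socketpair()                           # stand-in for the network
    t = threading.Thread(target=server, args=(s,))
    t.start()
    print("client got reply:", draw_pixel_stub(c, 5, 7, 0x00FF00))
    t.join()
    c.close(); s.close()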
2 RPC Issues
2.1 RPC Failure Modes
RPC's failure modes differ from those of the other systems we have discussed in this course.
- Advantages
- The callee/server cannot corrupt the caller's/client's data structures, because all data is copied from the client to the server rather than passed by reference. Passing by reference would be impossible anyway: the client and server are different systems that share no hardware resources, so one cannot dereference a pointer created by the other.
- Disadvantages
- Messages can get lost and/or corrupted during transmission from the client to the server.
- The network over which the systems communicate might be slow or down; from the client's point of view, while it waits for a response, the two look the same. The only way the client learns that a message actually reached the server is by receiving a response, so with a slow or dead network the client may abandon the call.
- The server may be slow or down.
To address the above disadvantages, which are really correctness issues, the following procedures can be implemented:
1. The client and server can attach a checksum to each transmitted message to verify its integrity. The sender computes a checksum and sends it along with the message; the receiver computes its own checksum over the received message and compares the two. If the checksums do not match, the receiver can
a. Ignore the corrupted message, OR
b. Send a "re-request error" back to the sender to ask for the message again.
2. The client can timeout after a certain specified amount of time after which it will assume the message has been lost. When a timeout occurs, the client can
a. Try again
i. This is called "At-Least-Once RPC" because it guarantees the operation will be performed at least once. It may happen more than once if, for example, the server performs the operation and sends a response, but the response is lost; the client, not knowing the operation has already been done, tries again, and this can repeat until the client finally receives a response from the server.
ii. Such a method is suitable for idempotent operations where performing some operation either once or multiple times will not affect the desired outcome of the operation.
b. Give up and report an error to the caller
i. This is called the "At-Most-Once RPC" because it guarantees the operation will only be performed at most one time. If it so happens that the server performs the operation and the response from the server is lost, the same operation will not be done more than once.
Ideally, what we want is an "Exactly-Once RPC," but achieving this is theoretically impossible if messages can be lost or corrupted in transit. A minimal sketch combining a checksum with a timeout-and-retry loop follows.
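The sketch below puts these two ideas together: a checksum to detect corruption and a timeout-and-retry loop that gives at-least-once behavior. It assumes a connected datagram-style socket; the framing and helper names are invented, not a real RPC library.

    import socket, struct, zlib

    def frame(payload):
        """Prepend a CRC32 checksum so the receiver can detect corruption."""
        return struct.pack(">I", zlib.crc32(payload)) + payload

    def verify(msg):
        """Return the payload if the checksum matches, else None (ignore it)."""
        if len(msg) < 4:
            return None
        checksum, payload = struct.unpack(">I", msg[:4])[0], msg[4:]
        return payload if zlib.crc32(payload) == checksum else None

    def call_at_least_once(sock, request, timeout=1.0, retries=5):
        """Resend on timeout: at-least-once semantics, safe only for
        idempotent operations.  Giving up after the retries and raising
        an error is the at-most-once style of handling the failure."""
        sock.settimeout(timeout)
        for _ in range(retries):
            sock.send(frame(request))
            try:
                reply = verify(sock.recv(4096))
                if reply is not None:          # drop corrupted replies silently
                    return reply
            except socket.timeout:
                continue                       # lost or slow: try again
        raise IOError("RPC failed after retries")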
2.2 RPC Performance Problems
Consider the following diagram where a single client is sending requests to a single server synchronously.
In the above diagram, there is substantial overhead in the client/server communication: the amount of "real work" the server does for each request is much smaller than the total time needed for the client to send the request and receive the response. This leads to very low utilization when there are many repeated operations that the client already knows it wants to perform (e.g. reading many blocks from a disk, or coloring many pixels on the display).
To deal with this performance issue, we can
1. Add new function calls for common patterns
2. Have the client and server communicate asynchronously. That is, have the client send multiple requests without waiting for a response from the server, as in the following diagram.
a. This method gives much better utilization of the server and lower overall latency. A real-world example is HTTP pipelining (introduced with the HTTP/1.1 extension). A sketch contrasting the two approaches follows this list.
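The sketch below contrasts the two approaches using a fake transport with a fixed round-trip latency so it runs on its own; the FakeRPC class and its methods are invented for the demonstration, not a real RPC interface.

    import time

    class FakeRPC:
        """Stand-in transport with a fixed round-trip latency."""
        LATENCY = 0.05                      # seconds per round trip

        def call(self, op, arg):            # synchronous: pay the latency each time
            time.sleep(self.LATENCY)
            return (op, arg)

        def send(self, op, arg):            # asynchronous: just queue the request
            return (op, arg)

        def receive(self, request):         # replies then arrive back to back
            return request

    def read_blocks_sync(rpc, blocks):
        return [rpc.call("READ", b) for b in blocks]      # one round trip per block

    def read_blocks_pipelined(rpc, blocks):
        time.sleep(FakeRPC.LATENCY)                       # roughly one round trip total
        requests = [rpc.send("READ", b) for b in blocks]  # send everything first
        return [rpc.receive(r) for r in requests]         # then drain the replies

    rpc = FakeRPC()
    for fn in (read_blocks_sync, read_blocks_pipelined):
        start = time.time()
        fn(rpc, range(20))
        print(fn.__name__, round(time.time() - start, 2), "seconds")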
With asynchronous communication, what happens if an early request fails, either because of a transmission failure (lost or corrupted data) or because the server rejects the request? We can take a couple of different courses of action.
1. Ignore the later responses from the server.
2. Cancel operations in progress.
a. This can be quite complicated to implement as both the client and server need to be equipped to handle this.
Something to consider with asynchronous communication is that failure reports become less accurate. Although the client sends its requests in order, there is no guarantee that the server's responses come back in order, so a failure may not be tied cleanly to the request that caused it. One way to address this is to have the next similar RPC after the failed one report the failure as well; the client then knows that some earlier request failed and must decide how to handle the situation.
Other ways to improve the performance and utilization of RPCs are caching and prefetching. If multiple similar requests are made close together, the common answers can be cached; a sketch of this follows. Caching speeds things up considerably but has its limitations: it is only correct if the data involved is stable and not changing, and it increases the complexity of the system, consumes memory for the cache, and raises cache-coherence issues. In the same spirit, if multiple similar requests are made close together, the answers can be prefetched before they are even asked for. Modern computers use prefetching in a variety of ways to improve performance.
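A sketch of the caching idea (the lookup operation and its arguments are stand-ins, not a real NFS client API): functools.lru_cache gives the "answer repeated requests from the cache" behavior, and it is only correct while the data on the server stays stable.

    import functools

    def remote_lookup(dirfh, name):
        """Stand-in for the real RPC so the sketch runs on its own."""
        print("RPC sent for", dirfh, name)
        return ("fh", dirfh, name)

    @functools.lru_cache(maxsize=1024)
    def cached_lookup(dirfh, name):
        # Cached: repeated identical requests never reach the network.
        # Goes stale if the server-side data changes, which is why caching
        # needs a coherence story.
        return remote_lookup(dirfh, name)

    cached_lookup("rootfh", "etc")   # sends the RPC
    cached_lookup("rootfh", "etc")   # answered from the cache, no RPC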
3 Network File System
A network file system allows an operating system to write to and read from a network resource as if it were a disk directly connected to the computer.
The NFS client plugs in beneath the Virtual File System (VFS) layer and appears to the operating system like any other file system.
3.1 The NFS Protocol
The Network File System protocol contains several I/O operations:
LOOKUP(dirfh, name)
CREATE(dirfh, name, attrs)
MKDIR
REMOVE
RMDIR
RENAME
READ
WRITE
It is no coincidence that these correspond one-to-one with Unix system calls. The NFS client converts those system calls into RPCs, which it uses to communicate with the server; on the server side, the received RPCs are converted into operations on the server's local file system and hardware. A sketch of this conversion follows.
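As an illustration of that conversion (a sketch only; nfs_lookup is a stand-in for the LOOKUP RPC, not a real client API), an open() of "/usr/bin/ls" on an NFS mount becomes a chain of LOOKUP calls, one per path component.

    def nfs_lookup(dirfh, name):
        """Stand-in for the LOOKUP(dirfh, name) RPC: returns the child's
        file handle.  A real client would send this over the network."""
        return f"{dirfh}/{name}"        # fake handle, just for the demo

    def open_over_nfs(root_fh, path):
        """How an open() system call turns into NFS RPCs: walk the path one
        component at a time with LOOKUP, ending with the file's handle."""
        fh = root_fh
        for component in path.strip("/").split("/"):
            fh = nfs_lookup(fh, component)
        return fh                       # READ/WRITE RPCs then use this handle

    print(open_over_nfs("rootfh", "/usr/bin/ls"))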
3.2 NFS File Handles
NFS file handles uniquely identify a file on a server so that the client operating system can read from and write to it. Because network connections are tenuous, a file handle must keep referring to the same file across losses of connectivity and even reboots of the server.
NFS file handles take the following form: fh = device + inode# + inode_serial#. The device field identifies the device (file system) on the server that holds the file, the inode number identifies the file on that device, and the serial number exists to prevent a handle from being confused with a new file that reuses the same inode. A sketch of this structure follows.
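A sketch of such a handle as a plain record (the field values here are invented; a real NFS file handle is an opaque byte string whose internal layout only the server interprets):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class FileHandle:
        device: int    # which file system / device on the server
        inode: int     # which file on that device
        serial: int    # inode generation number, bumped when the inode is
                       # reused, so a handle to a deleted file cannot silently
                       # start naming a new file that recycled the inode

    fh = FileHandle(device=0x801, inode=123456, serial=7)
    print(fh)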
In order to maintain reliability, the goal of NFS is a stateless server: the server keeps no per-client state between requests, so every request carries all the information needed to process it. Several behaviors follow from this goal. If the server reboots, no client state is lost, the data remains consistent, and file handles remain valid. If the server stops responding to calls, the client simply blocks and retries until it comes back, so the result looks like a slow server rather than a broken one. And if a client freezes, the server goes about its business, because it is simply not receiving calls from that client.
Interesting consistency questions arise on NFS. For example, consider the following pair of operations:

Client 1             | Client 2
fh = open("foo")     | rename("foo", "bar")
What will happen when Client 1 wants to access "foo"?
Renames are OK, because other clients can just keep using the file handle they obtained before; the handle names the file itself, not its pathname. Removes are trickier, because the stateless server does not know which clients have the file open: a read(fh) on a deleted file returns the error -ESTALE ("stale file handle"). A sketch of handling that error follows.
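A sketch of how a client might cope with that error (the exception class and the reopen step are assumptions for illustration, not part of the NFS specification):

    class StaleFileHandleError(Exception):
        """Raised when the server answers with ESTALE: the file the handle
        named no longer exists on the server."""

    def read_with_recovery(read_rpc, reopen, fh, offset, count):
        """Retry once with a fresh handle obtained by re-doing the LOOKUP;
        if the file is really gone the second attempt fails too and the
        error propagates to the caller."""
        try:
            return read_rpc(fh, offset, count)
        except StaleFileHandleError:
            return read_rpc(reopen(), offset, count)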
Consider a case in which network latency causes some clients to respond more slowly than others:
In this case, although Client A requested the contents of a file before they were changed, the server returns the contents after the write by Client B, because the network did not transmit Client A's request quickly enough.
In most cases there is little that can be done to solve these problems, because network delays are unpredictable. Short of logging all operations and then retroactively "inserting" a late operation into the timeline and recomputing the changes it would have caused, there is no good way to stop the anomaly from happening, so we may as well allow it.
Locking over NFS is tricky because read/write locks on a server compromise statelessness. NFSv4 has locks, but not NFSv3.
4 Acknowledgements
NFS diagram and timing diagram taken from the scribe notes of the Fall 2006 CS111 class, by Subhash Arja, Matthew Ho, Grant Jenks, and Samuel Kwok.