NFS is a protocol to share files over a network. A computer sends a request for a file to a server. The server then finds the file on its local disk and sends the contents back to the client.
nfs_open("name", ...)
FILE *f = fopen("name", ...)
Many parts of the NFS protocol look very similar to Unix system calls:
LOOKUP(dirfh, name) → fh + attrs
CREATE(dirfh, name) → fh + attrs
MKDIR(dirfh, name) → fh + attrs
REMOVE(dirfh, name) → status
RMDIR(dirfh, name) → status
READ(fh) → status
WRITE(fh) → status
In these examples, "fh" stands for "file handle", and is the underlying file identifier. It is similar in concept to a Unix file descriptor in the sense that it is a unique identifier, but it has one important property: it is persistent. If a client accidentally disconnects during a session and then reconnects, the file handle it previously used is still valid.
Because of these properties, the most natural way of representing an NFS file handle appears to be a Unix inode number, since inode numbers too are persistent and unique. There is one catch, however: an inode number is unique only within one physical filesystem, while an NFS server might export several different filesystems. Therefore, a better way to represent a file handle is as a (device, inode) pair, with a unique device ID for each physical filesystem and a unique inode number for each file within it. Exposing the filesystem's raw inode numbers does have some security implications, but the NFS driver usually operates at the kernel level.
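The (device, inode) pair can be sketched in C as follows. This is only an illustration, not the layout any real NFS implementation uses: the names struct nfs_fh, fh_from_path and fh_equal are made up for this example, and the handle is filled in from the standard stat() call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/stat.h>

/* Hypothetical handle layout: one ID per filesystem plus the inode within it. */
struct nfs_fh {
    uint64_t dev;  /* identifies the physical filesystem */
    uint64_t ino;  /* identifies the file within that filesystem */
};

/* Fill in a handle for a path using stat(2). Returns 0 on success, -1 on error. */
int fh_from_path(const char *path, struct nfs_fh *out)
{
    struct stat st;

    assert(path != NULL && out != NULL);
    if (stat(path, &st) != 0)
        return -1;
    out->dev = (uint64_t)st.st_dev;
    out->ino = (uint64_t)st.st_ino;
    return 0;
}

/* Two handles name the same file only if both components match. */
bool fh_equal(const struct nfs_fh *a, const struct nfs_fh *b)
{
    return a->dev == b->dev && a->ino == b->ino;
}
```

Two files on different filesystems may share an inode number, which is why fh_equal compares the device ID as well.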
Because NFS is implemented over a network, it brings up nontrivial issues of concurrency, since several users on different computers may be attempting to access the same file at once. Additionally, because networks are unreliable, it is not guaranteed that every client's request will be processed.
In the diagram above, client A first opens the file, receiving a file handle. If client B decides to rename that file while A has it open, then that is no problem, since renaming doesn't change the (device, inode) identifier. The surprising fact, however, is that if client C decides to remove the file while A has it open and is reading from it, then that's allowed as well. On the next read operation, A will receive a "stale file handle" error: -ESTALE. This is done for robustness, since the NFS server doesn't have to decide whether A is still connected or not before giving C permission to remove. If A were on an unreliable connection, this would be very difficult to determine. Because of this, the NFS server is stateless - it doesn't keep track of any client information in memory, and RAM is only used for caching. There are no locks to worry about.
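The rename and remove behavior described here can be modeled with a toy in-memory "server". This is a sketch under simplifying assumptions, not real NFS code: there is a single flat directory, the functions srv_create, srv_rename, srv_remove and srv_read are hypothetical, and operations take names directly rather than a directory handle. The point is that the server keeps no table of which clients hold which handles.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_FILES 8

struct fh { unsigned dev, ino; };

struct file_entry {
    bool in_use;
    struct fh fh;
    char name[32];
    char data[64];
};

/* The "disk" is the only state the server consults; nothing is kept per client. */
static struct file_entry disk[MAX_FILES];
static unsigned next_ino = 1;  /* inode numbers are never reused in this toy model */

static struct file_entry *find_by_fh(struct fh fh)
{
    for (size_t i = 0; i < MAX_FILES; i++)
        if (disk[i].in_use && disk[i].fh.dev == fh.dev && disk[i].fh.ino == fh.ino)
            return &disk[i];
    return NULL;
}

static struct file_entry *find_by_name(const char *name)
{
    for (size_t i = 0; i < MAX_FILES; i++)
        if (disk[i].in_use && strcmp(disk[i].name, name) == 0)
            return &disk[i];
    return NULL;
}

int srv_create(const char *name, const char *data, struct fh *out)
{
    assert(strlen(name) < sizeof disk[0].name);
    assert(strlen(data) < sizeof disk[0].data);
    for (size_t i = 0; i < MAX_FILES; i++) {
        if (!disk[i].in_use) {
            disk[i].in_use = true;
            disk[i].fh.dev = 1;
            disk[i].fh.ino = next_ino++;
            strcpy(disk[i].name, name);
            strcpy(disk[i].data, data);
            *out = disk[i].fh;
            return 0;
        }
    }
    return -ENOSPC;
}

/* Renaming changes the name only; the (device, inode) handle stays the same. */
int srv_rename(const char *from, const char *to)
{
    struct file_entry *f = find_by_name(from);
    if (f == NULL)
        return -ENOENT;
    assert(strlen(to) < sizeof f->name);
    strcpy(f->name, to);
    return 0;
}

/* Removal never checks whether some client still holds the handle. */
int srv_remove(const char *name)
{
    struct file_entry *f = find_by_name(name);
    if (f == NULL)
        return -ENOENT;
    f->in_use = false;
    return 0;
}

/* Copies the file contents into buf; returns the length, or -ESTALE for a dead handle. */
int srv_read(struct fh fh, char *buf, size_t len)
{
    struct file_entry *f = find_by_fh(fh);
    size_t n;

    assert(len > 0);
    if (f == NULL)
        return -ESTALE;
    n = strlen(f->data);
    if (n > len - 1)
        n = len - 1;
    memcpy(buf, f->data, n);
    buf[n] = '\0';
    return (int)n;
}
```

In this model, a handle obtained from srv_create keeps working after srv_rename, while any srv_read issued after srv_remove returns -ESTALE.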
The system, being based on RPC, is much faster if requests and responses can arrive out of order without any handshaking involved. Additionally, there is usually extensive caching involved, both on the client and on the server. For both of these reasons, read/write consistency is not guaranteed: the result of a read() call may depend on the time and on who is issuing it. Close-to-open consistency does work, but it is slow: all relevant buffers need to be flushed to disk.
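Close-to-open consistency can be sketched as a client cache that flushes on close and revalidates on open. This is a minimal model, not how any particular NFS client is written: the structs and the cl_open, cl_write, cl_read and cl_close functions are hypothetical, and a simple version counter stands in for the file attributes a real client would compare.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Server-side copy of one file; version is bumped on every flushed write. */
struct server_file {
    char data[64];
    unsigned version;
};

/* One client's cached copy of that file. */
struct client_cache {
    char data[64];
    unsigned version;
    bool valid;
    bool dirty;
};

/* Open: revalidate the cache and refetch if the server's copy has changed. */
void cl_open(struct client_cache *c, const struct server_file *s)
{
    if (!c->valid || c->version != s->version) {
        memcpy(c->data, s->data, sizeof c->data);
        c->version = s->version;
        c->valid = true;
    }
    c->dirty = false;
}

/* Write: touches only the local cache, so other clients cannot see it yet. */
void cl_write(struct client_cache *c, const char *text)
{
    assert(strlen(text) < sizeof c->data);
    strcpy(c->data, text);
    c->dirty = true;
}

/* Read: served from the local cache, which may be out of date. */
const char *cl_read(const struct client_cache *c)
{
    return c->data;
}

/* Close: flush dirty data to the server so that the next open elsewhere sees it. */
void cl_close(struct client_cache *c, struct server_file *s)
{
    if (c->dirty) {
        memcpy(s->data, c->data, sizeof s->data);
        s->version++;
        c->version = s->version;
        c->dirty = false;
    }
}
```

With two caches A and B opened on the same file, a write by A is invisible to B's reads both before and after A closes; B sees the new data only once it closes and reopens the file.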
Because an NFS server is stateless, write caching is tricky. Ordinary RAM doesn't work for this, since it introduces state - if the server crashes while data is cached in RAM, then that data is lost. One solution is to use non-volatile RAM (NVRAM), which keeps its contents across a power failure.
Because protocols such as NFS operate across different computers, some authentication scheme is needed. The traditional approach was to make sure that every user has the same user ID on all clients and that all clients are trusted. The modern approach is to use an authentication scheme such as Kerberos, which performs user ID remapping.
In the real world, security defends against attacks via force and fraud. The main forms of attack via fraud are against: