Scribe notes brought to you by Alina Chen & Matthew DeCoste

The Big Idea of Unix: the File Model

Table of Contents

What can go wrong with reading and writing files
Creating temporary files
Locking Files
Pipes a | b
Interrupts and Signals
Summary

File: an array or stream of bytes
Files are represented by file descriptors.
You can read from it (e.g., read(fd,buf,sizeofbuf))
Apparently read isn't used enough, but let's move on.

Pros: Cons:
  • simple
  • not everything is conveniently modeled as a file
  • orthogonal: doesn't interfere with other structures
  • e.g., keyboard:
    • read(kb,buf,sizeofbuf) makes sense - read bytes from what was typed & put into buffer
    • lseek(kb,10000,SEEK_SET) doesn't make sense - position the offset of the keyboard 10000 bytes?
  • As a result, there are 2 kinds of files:

    Stream-oriented (e.g., I/O, keyboard) Random access: (e.g, disks)
  • spontaneous data generation
  • request/response approach
  • infinite input
  • local storage
  • lseek doesn't work here
  • finite capacity
  • The read function works on either type of file.

    What can go wrong with reading and writing files

  • Writing to a closed file descriptor:
  • Example code What happens
    close(1);
    int i = write(1,"x",1);
    if (i < 0)
    print(strerror(errno));
  • write fails, returns -1 to i
  • strerror(errno); stores details

  • Reading from a closed file descriptor (closing, opening, then reading):
  • Example code What happens
    int fd = open("foo", O_RDONLY);
    close(fd);
    int fd1 = open("bar", O_RDONLY);
    read(fd, buf, sizeofbuf);
  • read is actually reading from bar
  • this is known as aliasing

  • File descriptor leaks:
  • Example code What happens
    1  for (i=0; i<N; i++)
    2  {
    3      int fd = open(file[i], O_RDONLY);
    4      if (fd < 0)
    5      {
    6          error();
    7          read_and_copy(fd);
    8      }
    9  }
  • file descriptors were never closed
  • your process will run out of resources/memory

  • Removable devices (USB flash drive):
  • Example code What happens
    int fd = open("/dev/usb/flash01", O_RDONLY);
    physically remove flash drive at this point in execution
    read(fd, buf, sizeofbuf);
  • read returns -1 with special errno (about lack of flash drive)

  • Accessing unlinked file (not visible in file system directory):
  • Example code What happens
    int fd = open("/tmp/foo", O_RDONLY);
    unlink("/tmp/foo");
    read(fd, buf, sizeofbuf);
  • this succeeds: the kernel still has the file open,
    even though it was removed from the file system
  • the file is still on disk, so it is still possible to read from it

  • Creating temporary files

    NOTE: multithreading will mess this all up so we're assuming this is a single-threaded process.

    Let's try using a system call...ideally it would work like this:

    int fd = mktempfile(); //file won't be visible to file system/directory
    ...
    close(fd); //reclaim memory

    ...Unfortunately, this system call doesn't actually exist.

    We can try using a library function instead, but it won't be safe.
    Here's a first version of the code:

    1  int mktmpfile(void)
    2  {
    3      int fd = open("/tmp/foo", O_RDWR | O_CREATE | O_TRUNC, 0600); //trunc replaces old files with this file name; 0600 is "-rw-r-----"
    4      if (fd >= 0) //as opposed to 0666 ("-rw-rw-rw-") due to security issues
    5          unlink("/tmp/foo");
    6      return fd;
    7  }

    The unsafe features are the O_TRUNC call (because it deletes any possible previous /tmp/foo file and the fact /tmp/foo does appear in the directory for a brief moment. This can cause a race condition with multiple function calls since the file would already be in existence during one of the calls.

    Knowing these issues, here's a safer version of the code:

    1  int mktempfile(void)
    2  {
    3      struct stat st;
    4      if (stat("/tmp/foo", &st) == 0)
    5      {
    6          errno = EEXIST; //EEXIST means the file already exists
    7          return -1;
    8      }
    9      char foo[sizeof "/tmp/foo" + sizeof(pid_t)*8];
    10     sprintf(foo, "/tmp/foo%lld", (long long) getpid()); //attaches pid to name of temporary file
    11     int fd = open ("/tmp/foo", O_RDWR | O_CREATE | O_TRUNC, 0600);
    12     if (0 <= fd) //code in the real world will leave this line out to optimize speed
    13         unlink("foo");
    14     return fd;
    15 }

    The race condition still exists between the first condition checking and the rest of the code, but it is narrower than before. The code also attaches the pid to the name of the file, which means multiple processes can call this function. Also, processes can create more than 1 temporary file even though they have the same pid because the name is unlinked after each file is created.
    However, there is a problem, from the point of operations. Since the files are "invisible," we cannot see where the disk space is allocated to after unlinking. This means you don't know how much disk space you have left.

    How is this relevant? The scenario given in class was something like this: You're writing a research paper while this function is being called multiple times in the background. When you finally click the Save button, a popup appears stating "No more disk space," although your directory hasn't changed from when you first started this paper. This occurs because the invisible files exist in your directory, and thus are taking up space, even if you can't see them.

    Well, we can think of a solution to that. We remove the unlink code, so now all temporary files are visible.

    Proceed to remove lines 12-13:

        if (0 <= fd)
            unlink("foo");
    from the previous code.

    Well now, what wasn't a problem before is now a problem! A process now cannot have multiple files, because they will have the same name.

    Let's try fixing this...how about a random name generator?

    1  int mktempfile(char *name, size_t size)
    2  {
    3      struct stat st;
    4      do {
    5          generate_random_file_name(name, size); // doesn't exist, but you can create it pretty easily
    6      } while(stat(name, &st) == 0);
    7      int fd = open(name, O_CREAT | O_RDWR | O_TRUNC, 0600);
    8      return fd;
    9  }

    Welp. Now we have the same race condition as our first attempt...again. This time they appear during the while loop condition and open function call (line 6-7).

    Our current set of primitives can't solve this. There's this new flag called O_EXCL. Basically, if the file exists, the call will fail.

    We'll also replace the while loop with a different one so that we can avoid this failed call scenario.

    1  int mktempfile(char *name, size_t size)
    2  {
    3      struct stat st;
    4      int fd;
    5      do {
    6          generate_random_file_name(name, size);
    7      } while((fd = open(name, O_CREAT | O_RDWR | O_EXCL, 0600)) < 0 && errno == EEXIST); // can also check other errnos for this loop
    8      return fd; // we can unlink later
    9  }

    As you can see, it's difficult to find a robust solution, due to the waterbed effect - when you fix 1 problem, another appears in a different area.

    There might be another way to solve this, but it's tricky and not generally used for this type of problem. In any case, this system is called locks. Here's a brief section about it.

    Locking files

    Locking primitives exist in UNIX/POSIX, but are advisory and rarely used. Shells don't use these. Databases do. If you control almost everything, you can use this.

    Use fcntl(fd, cmd, p), where cmd is replaced by:

    The placeholder p is for: There's more locks. These are called leases, where the process holding the lease (use F_SETLEASE(arg) to set it) is notified when a file tries to open or change the file size (truncate). The arg can be one of the following: The rest are on the man page.

    Pipes a | b

    "It-sa me! Mario!"

    Pipes are bounded buffers. If they were unbounded, they could potentially take up the whole disk space.

    Pros: Cons:
  • a and b run in parallel
  • no checkpoints or restarts
  • no space required to hold temporary data
  • multiple read-ins are a pain
  • better caching: doesn't need to go into disk (by redirection) and then be pulled off later
  • b can't use lseek
  • Here's what they might look like in a diagram:

    Another thing to note: redirection (< >) precedence is higher than pipe precedence.

    What can go wrong when using a pipe

    Scenario: What happens:
    b reads an empty pipe
  • b waits
  • a tries to write to full pipe
  • a waits
  • a writes, but b closed its end of the pipe
  • any of the following is possible:
    • write returns -1, errno is set to ESPIPE
    • default: a gets a SIGPIPE signal: works for lazy programs
    • non-default: some applications will use up CPU if written in a loop (e.g., while(...) printf("error"); - printf keeps failing
  • b reads, but a closed its end of the pipe
  • any of the following is possible:
    • read returns 0 (EOF)
    • this is normal behavior - it's how the closed read signifies that read is done
  • a is done generating output, but forgets to close its end of the pipe
  • read hangs forever
  • There are 3 configurations we can consider a pipe a|b is implemented in the shell. They are described by the tree diagrams below. Each node is the parent of the one below it.

    Pipes are implemented something like this, which rules out the first possibility.

    1  int fd[2];
    2  int ap = fork();
    3  pipe(fd);
    4  int bp = fork();

    Also, we want the exit status of the pipe to be set to the exit status of b. Doing this with the 2nd possible configuration would be too much work, so we use the 3rd configuration. The rewritten code looks like this:

    1  int fd[2];
    2  int bp = fork();
    3  if (bp == 0) //child
    4  {
    5      pipe(fd);
    6      int ap = fork();
    7      if (ap == 0) //grandchild
    8          ...run a...; //clone fd[1] into STDOUT (1)
    9      else
    10         ...run b...; //clone fd[0] into STDIN (0)

    Use dup2(i,j) to clone. Remember to close your pipes after you're done using them.
    For example, if a exits and b reads(0,...), then the program can hang if b forgets to close fd[1].

    Interrupts and Signals

    Signals are ways to force code to respond to run-time situations.

    Here's an example: you're up late one night, writing some last-second code, when the power goes out. Luckily, your desktop is attached to a UPS, so it hasn't completely lost power yet. Unfortunately, your electron-guzzler of a PC will drain the UPS in a matter of minutes. Ideally, this is where your computer comes in to save the day. The kernel needs to take a snapshot of RAM and store it on disk (for recovery after power is restored). When the power does eventually return, you would hope the kernel would restore everything to its rightful place, and that your various programs (and precious code) will return in exactly the same state as before.

    So, will it? Well, not exactly. Obviously, network conditions and connections will not be the same upon revival, which could mess up some programs (cough EA games cough). There are also security implications if too much sensitive information is restored onto the screen in a new, insecure context. However, it's not the kernel's responsibility to address each and every possible consequence of this necessary action. At some point, user processes must be informed of the issue and allowed to take appropriate action.

    So how do you alert user processes?

    Essentially the question is how to place new code into the already-existing user code to make it handle a new situation. There are two ways to do this:

    Pulling Solution (continuously reading from status file) Pushing Solution (receiving, handling signals)
  • Continuously reads from file. May run code depending on load
  • Handles signals sent to it by kernel with signal_handler(int sig)
  • Very inefficient: continuous loads by programs interested in status
  • Efficient: only runs extra code when told to
  • Pain to program repetitive loads
  • Pain to keep track of possible race conditions
  • Eats up CPU time
  • Less taxing on CPU
  • Signals are essentially like the kernel inserted the pulling (polling) check for you, but only when the value is actually of use.

    Summary

    › A file is an array or stream of bytes. They are represented by integer identifiers called file descriptors. Files can be either stream-oriented (like receiving input from the keyboard) and random access (like those read off disk).
    › Creating a temporary file is a process that must be handled with care to account for a variety of possible errors.
    › Files can be locked, but it is purely a suggestive system that is not really enforced.
    › Pipes are bounded buffers that allow you to redirect output from one process into the input of another.
    › Interrupts are used to alert code to a change that urgently requires its attention.