Lecture 6 - OS Organization Revisited

By Richard Sun and Michael Li

Processes and Files

Processes and files are both "fakes": abstractions that the operating system builds on top of the real hardware. A process appears to have a real machine to itself, but the operating system retains control of it.

The major resources needed to implement a "pretend" computer for a process with files are the ALU, the registers, memory, and I/O.

The ALU, registers, and memory behave much like those of the real machine, because the process uses them directly through ordinary machine instructions. I/O, on the other hand, is abstracted behind the system call interface.

Process Table

The kernel has to keep track of processes so that it can allocate CPU time and memory to them while letting each process believe that it has a real machine.

The kernel memory contains a process table, which is an array of process descriptors.

A process descriptor stores the following information:

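The process descriptor does not save the state of the ALU. The kernel only switches processes between machine instructions, so any calculation in progress inside the ALU is either completed or simply thrown away and redone when the process resumes.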
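As an illustration only, here is a hypothetical process descriptor and process table in C. The field names, the sizes NPROC and NOFILE, and the helper alloc_proc() are assumptions made for this sketch, not the layout of any real kernel (Linux's equivalent structure is far larger). Note that there is no field for ALU state.

```c
#include <stdint.h>

#define NPROC 64   // hypothetical number of process table slots
#define NOFILE 16  // hypothetical per-process limit on open files

enum proc_state { UNUSED, RUNNABLE, RUNNING, BLOCKED };

struct file;  // the kernel's open-file object; details omitted

// Hypothetical process descriptor: fields are illustrative only.
struct proc_desc {
    enum proc_state state;
    int pid;
    uint64_t saved_regs[16];       // registers saved at the last context switch
    uint64_t saved_pc;             // where to resume this process
    uintptr_t mem_base, mem_size;  // the process's region of memory
    struct file *files[NOFILE];    // file descriptor n indexes this array
    // no ALU state: processes are only switched between instructions
};

// The process table: an array of descriptors in kernel memory.
// Static storage is zero-initialized, so every slot starts as UNUSED.
static struct proc_desc proc_table[NPROC];

// Claim the first unused slot; return its index, or -1 if the table is full.
int alloc_proc(int pid) {
    for (int i = 0; i < NPROC; i++) {
        if (proc_table[i].state == UNUSED) {
            proc_table[i].state = RUNNABLE;
            proc_table[i].pid = pid;
            return i;
        }
    }
    return -1;
}
```

For example, the first two calls alloc_proc(100) and alloc_proc(101) on an empty table claim slots 0 and 1.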

Handles for file descriptors

The OS needs a way to remember which files a process has open.

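In Linux/Unix, a small integer is used as the handle for an open file; it indexes a per-process table of open files that the kernel maintains. The advantage of this approach is that it adds a layer of indirection: applications never touch the kernel's data structures directly, so the OS keeps more control over the file descriptors and is free to change or optimize how they are implemented.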
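A minimal sketch of the integer-handle idea, assuming a POSIX system: a descriptor is just a small integer, and POSIX requires open() to return the lowest-numbered descriptor not currently open, so closing a descriptor and reopening hands back the same integer. The helper name fd_reused() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

// Open `path`, close it, and open it again. In a single-threaded program the
// second open() reuses the slot the first one freed.
// Returns 1 if the same integer came back, 0 if not, -1 on error.
int fd_reused(const char *path) {
    int first = open(path, O_RDONLY);
    if (first == -1)
        return -1;
    close(first);                    // slot `first` in the fd table is free again
    int second = open(path, O_RDONLY);
    if (second == -1)
        return -1;
    close(second);
    return second == first;
}
```

Calling fd_reused("/dev/null") should return 1: the integer is only an index, and the kernel decides what it refers to.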

In other systems, a pointer (e.g. struct filedes*) might be used as a handle. This approach can perform better, gives programmers direct access to the file descriptor, and lets the compiler enforce type checking.

However, using a pointer is less portable because the file descriptor data structure may be different depending on the system. This approach is also not orthogonal because the implementation of file descriptors influences how applications have to be written.
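The type-checking trade-off can be seen from user space, assuming a POSIX system. Because a descriptor is a plain int, the compiler accepts any integer, and a stale descriptor is only caught at run time (write() fails with EBADF). stdio's FILE * is a user-level pointer handle, used here as an analogy to the struct filedes* idea rather than as the kernel design the lecture describes; the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

// Integer handle: write() accepts any int, so using a descriptor after
// close() compiles cleanly and is only rejected at run time.
// Returns the errno from writing to a closed descriptor (expected EBADF).
int write_to_closed_fd(void) {
    int fd = open("/dev/null", O_WRONLY);
    if (fd == -1)
        return -1;
    close(fd);                       // fd is now a stale integer
    errno = 0;
    if (write(fd, "x", 1) == -1)
        return errno;
    return 0;
}

// Pointer handle (by analogy): stdio functions take a FILE *, so passing a
// plain int in its place draws a compiler diagnostic. fileno() reveals the
// integer descriptor that the FILE * wraps.
int fileno_of_stdout(void) {
    return fileno(stdout);
}
```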

Pipes

Pipes are a way to send data between processes. A pipe has two file descriptors: one for writing to the pipe and one for reading from it. The data written to a pipe is stored in a bounded buffer and is deleted from the buffer when it is read.
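A minimal sketch of a pipe used within a single process, assuming POSIX pipe(): fds[0] is the read end, fds[1] is the write end, and reading consumes the data from the buffer. A short message fits in the pipe's buffer, so one process can write and then read without blocking. pipe_roundtrip() is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <unistd.h>

// Write `msg` into a new pipe and read it back into `buf` (NUL-terminated).
// Returns the number of bytes read, or -1 on error.
ssize_t pipe_roundtrip(const char *msg, char *buf, size_t bufsize) {
    int fds[2];
    if (pipe(fds) == -1)
        return -1;
    size_t len = strlen(msg);
    if (write(fds[1], msg, len) != (ssize_t) len) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);                               // done writing
    ssize_t n = read(fds[0], buf, bufsize - 1);  // removes the data from the buffer
    close(fds[0]);
    if (n >= 0)
        buf[n] = '\0';
    return n;
}
```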

What can go wrong here?

Infinite waiting problem

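A pipe's behavior depends on whether any readers or writers remain. A read() on an empty pipe blocks until data arrives, and it returns 0 (end of file) only once no process holds the write end open; likewise, writing to a pipe that no process can read raises SIGPIPE. So if some process holds a copy of a pipe's file descriptor but never uses it, the other processes may wait forever. To prevent this, a parent process and its children must each close the pipe ends they are not using.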
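The following sketch, assuming POSIX fork() and pipe(), shows this closing discipline: the child writes into the pipe and the parent reads until end of file. The helper name read_all_from_child() is hypothetical. The parent's close(fds[1]) is what allows its final read() to return 0.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Child writes "hello" into a pipe; parent reads until end of file.
// Returns the number of bytes the parent read, or -1 on error.
long read_all_from_child(void) {
    int fds[2];
    if (pipe(fds) == -1)
        return -1;
    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                  // child: the writer
        close(fds[0]);               // the child never reads
        if (write(fds[1], "hello", 5) != 5)
            _exit(1);
        close(fds[1]);
        _exit(0);
    }
    // Parent: the reader. If this close were skipped, the parent itself would
    // still hold a write end, read() would never return 0, and the loop below
    // would block forever once the child's data was consumed.
    close(fds[1]);
    long total = 0;
    char buf[64];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        total += n;
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n < 0 ? -1 : total;
}
```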

Named pipes

In Linux, there are also named pipes (FIFOs), which are pipes that have a name in the file system. Using named pipes, processes can communicate with other processes that are not their parent or child.

$ mkfifo /tmp/pipe          # make a file that is a pipe
$ cat /tmp/pipe > out &     # cat hangs, trying to read from the pipe
$ echo "Hello" > /tmp/pipe  # writes to the pipe and unblocks cat
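The same session can be sketched in C with mkfifo(), assuming a POSIX system. The FIFO path /tmp/fifo_demo_<pid> and the helper fifo_demo() are made up for this example; the child stands in for echo and the parent for cat. The two processes share no pipe descriptors from a pipe() call; they meet only through the name in the file system.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Copies what was read from the FIFO into buf (NUL-terminated).
// Returns the number of bytes read, or -1 on error.
ssize_t fifo_demo(char *buf, size_t bufsize) {
    char path[64];
    snprintf(path, sizeof path, "/tmp/fifo_demo_%ld", (long) getpid());
    if (mkfifo(path, 0600) == -1)    // make a file that is a pipe
        return -1;
    pid_t pid = fork();
    if (pid == -1) {
        unlink(path);
        return -1;
    }
    if (pid == 0) {                  // like: echo "Hello" > /tmp/pipe
        int w = open(path, O_WRONLY);
        if (w == -1 || write(w, "Hello\n", 6) != 6)
            _exit(1);
        close(w);
        _exit(0);
    }
    int r = open(path, O_RDONLY);    // like cat: blocks until a writer opens
    ssize_t n = -1;
    if (r != -1) {
        n = read(r, buf, bufsize - 1);
        close(r);
    }
    waitpid(pid, NULL, 0);
    unlink(path);                    // remove the FIFO's name
    if (n >= 0)
        buf[n] = '\0';
    return n;
}
```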

More things that could go wrong

Orthogonality

Running (rm bigfile; grep interesting) < bigfile still produces output if bigfile contains 'interesting'. The shell opens bigfile as the subshell's standard input before rm runs, and a file descriptor refers to the file itself rather than to its name: a file is orthogonal to its name. rm only removes the name; the data stays on disk because grep still reads it through an open file descriptor. The OS reclaims a file only once it has no names left and no file descriptors refer to it, just as a pipe's buffer goes away once all of its descriptors are closed. In short, the OS keeps files around until no more readers are interested.
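At the system-call level the same effect looks like this. The sketch assumes a POSIX system and uses mkstemp() to create a scratch file from a made-up /tmp/bigfileXXXXXX template; read_after_unlink() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

// Returns 1 if data is still readable through an fd after the file's only
// name has been removed, 0 if not, -1 on error.
int read_after_unlink(void) {
    char path[] = "/tmp/bigfileXXXXXX";   // template; mkstemp fills in the Xs
    int fd = mkstemp(path);               // creates and opens a new file
    if (fd == -1)
        return -1;
    const char *text = "something interesting\n";
    size_t len = strlen(text);
    ssize_t written = write(fd, text, len);
    unlink(path);                         // like rm: the name is gone...
    if (written != (ssize_t) len) {
        close(fd);
        return -1;
    }
    char buf[64];
    if (lseek(fd, 0, SEEK_SET) == -1) {   // ...but the open file is not
        close(fd);
        return -1;
    }
    ssize_t n = read(fd, buf, sizeof buf - 1);
    close(fd);                            // last reference: storage can be reclaimed
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return strstr(buf, "interesting") != NULL;
}
```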

Signals

Why do we use signals when they are so much trouble?

Receiving signals

To catch a specific signal, we call the signal function: sighandler_t signal(int signum, sighandler_t handler);. signum is the signal number, and handler is the function that will run when that signal is caught. sighandler_t is defined as typedef void (*sighandler_t)(int);, a pointer to a function that takes an int (the signal number) and returns void; this typedef is a GNU extension, and POSIX spells out the same type in full. signal() installs handler as the new signal handler and returns the old one.
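A minimal sketch of the signal() interface, using SIGUSR1 and raise() so that the program signals itself. The names on_usr1() and signal_demo() are hypothetical. The handler only sets a volatile sig_atomic_t flag, the one kind of static variable ISO C allows a handler to write safely, and the old handler returned by signal() is put back afterward.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>

// Set by the handler, read by normal code.
static volatile sig_atomic_t got_signal = 0;

static void on_usr1(int signum) {
    got_signal = signum;             // record which signal arrived
}

// Returns 1 if the handler ran, 0 if not, -1 if it could not be installed.
int signal_demo(void) {
    got_signal = 0;
    void (*old)(int) = signal(SIGUSR1, on_usr1);  // returns the previous handler
    if (old == SIG_ERR)
        return -1;
    raise(SIGUSR1);                  // ISO C: raise() returns only after the handler does
    int ran = (got_signal == SIGUSR1);
    signal(SIGUSR1, old);            // put the previous handler back
    return ran;
}
```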

This new signal handler can be run at any time in the middle of the rest of the program. For example, if your code looks like:

signal(29, handlerFunc);
x = y + 1;
z = w + x;

The signal handler can even run in the middle of adding w and x: a signal can arrive between any two machine instructions, for example between the instruction that loads x and the instruction that adds it to w, and the handler then runs at that point. This means the signal handler can modify variables while the main program is partway through using them, creating race conditions.

Signal handler example

We can write gzip, a program that compresses a file, with a signal handler so that if the user interrupts the program, it deletes the partially written compressed file. For example, $ gzip foo creates foo.gz and then removes foo; if the program is interrupted, foo.gz should be deleted.

With the following code, foo.gz will remain if the program is interrupted:

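#include <fcntl.h>   // open(), O_* flags
#include <unistd.h>  // close(), unlink()

int fd = open("foo", O_RDONLY);
int fo = open("foo.gz", O_WRONLY | O_CREAT, 0644); // O_CREAT requires a mode argument
while (compress(fd, fo)) // compress() is not shown; assume it returns 0 once all input is consumed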
    continue;
close(fd);
close(fo);
unlink("foo"); // delete foo at the end

We can attempt to add a signal handler like so:

int fd = open("foo", O_RDONLY);
signal(SIGINT, cleanup);
int fo = open("foo.gz", O_WRONLY | O_CREAT, 0644);
while (compress(fd, fo))
    continue;
close(fd);
close(fo);
unlink("foo"); // delete foo at the end

...

static void cleanup(int sig) {
    unlink("foo.gz"); // delete foo.gz during cleanup
    _exit(1);         // _exit() rather than exit(): _exit() is async-signal-safe
}

However, calling signal() on the second line introduces a race condition: if SIGINT arrives after the handler is installed but before foo.gz is opened on the third line, cleanup() tries to delete foo.gz before this run has even created it (and if an older foo.gz already exists, cleanup() deletes that file). So we should move the call to signal() to the third line, after foo.gz has been opened:

int fd = open("foo", O_RDONLY);
int fo = open("foo.gz", O_WRONLY | O_CREAT, 0644);
signal(SIGINT, cleanup);
while (compress(fd, fo))
    continue;
close(fd);
close(fo);
unlink("foo"); // delete foo at the end

...

static void cleanup(int sig) {
    unlink("foo.gz"); // delete foo.gz during cleanup
    _exit(1);
}

This is better, but the user can still interrupt the program after it has finished writing foo.gz, in which case cleanup() deletes a complete foo.gz, which isn't what we want. We can solve this by restoring SIGINT's default behavior once both files are closed: passing SIG_DFL as the handler argument to signal() re-enables the signal's default action, which for SIGINT is to terminate the process.

int fd = open("foo", O_RDONLY);
int fo = open("foo.gz", O_WRONLY | O_CREAT, 0644);
signal(SIGINT, cleanup);
while (compress(fd, fo))
    continue;
close(fd);
close(fo);
signal(SIGINT, SIG_DFL);
unlink("foo"); // delete foo at the end

...

static void cleanup(int sig) {
    unlink("foo.gz"); // delete foo.gz during cleanup
    _exit(1);
}

There's one last problem, though: if SIGINT arrives right before unlink("foo"), the default action kills the process without calling cleanup(), so both foo and foo.gz are left behind. The solution to this issue is presented in the next lecture.