CS 111

Lecture 6 Scribe Notes

Signals, Signal Processing, and Threads

by Thomas Lutton & Nicholas Yee

10/22/14

What can go wrong?

(cont'd discussion of the hazards of file descriptors from the end of lecture 5)

File descriptors:

Many race conditions can originate from how we use file descriptors. Below are a few examples of error-causing behavior and the accompanying result.

close(-1) [some time passes] close(39) == -1, errno == EBADF

In this example, close is called on file descriptors that are not open. close takes a file descriptor, not a process id: -1 is never a valid descriptor, and 39 is invalid if it was never opened or has already been closed. In either case close returns -1 and sets errno to EBADF.
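A minimal sketch of the EBADF behavior, assuming a POSIX system where /dev/null exists. The helper names close_errno and double_close_errno are made up for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Closes fd and returns 0 on success, or the errno value on failure. */
int close_errno(int fd) {
    errno = 0;
    return close(fd) == 0 ? 0 : errno;
}

/* Opens /dev/null, closes it twice, and returns the errno reported by
   the second close (or -1 if the setup itself failed). */
int double_close_errno(void) {
    int fd = open("/dev/null", O_RDONLY);
    if (fd < 0)
        return -1;
    if (close(fd) != 0)
        return -1;
    return close_errno(fd);   /* fd is no longer open here */
}
```

In a larger program the second close is more dangerous than this sketch suggests: if another open has reused the same descriptor number in between, the stray close succeeds and closes the wrong file.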

fd = open(...) read(fd)... fd leak

This example forgets an important part of file reading: closing the file when you're done with it. By not closing the file, there is an fd leak. File descriptor leaks are bad because each process has a limited number of file descriptors. Like memory, if you allow it to leak, you could eventually run out.
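One way to avoid the leak is to close the descriptor on every path out of the function that opened it. The sketch below assumes a POSIX system; read_prefix is a hypothetical helper, not something from the lecture.

```c
#include <fcntl.h>
#include <unistd.h>

/* Reads up to 16 bytes from path and returns how many were read,
   or -1 on error. The descriptor never outlives the call. */
ssize_t read_prefix(const char *path) {
    char buf[16];
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, sizeof buf);
    close(fd);   /* without this, every call would leak one descriptor */
    return n;
}
```

Calling read_prefix thousands of times in a loop keeps working; with the close removed, open would eventually start failing once the per-process descriptor limit is reached.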

fd = open("/dev/usb/dr1"...) read(fd, ...) *unplug* read(fd, ...) // return -1

As you can see from the path within the above open call, we're opening a USB device. What happens if the USB key is unplugged and then you try to read from it again? read will return -1, and your program may not have implemented a graceful way to recover from this situation.

There are many more errors that can be caused by improper usage of file descriptors. It is important to remember that file descriptors, much like memory, are a limited and tricky resource and certain procedures must be followed in order to write correct and safe code.

Race Conditions:

(cat a & cat b) > outfile

In this example, the order that the cats will be run is unclear. Any of a variety of possible outputs could occur (as seen below). It is possible that only one cat will write to the file or that both will attempt to write at the same time and cause interleaving. Both of these outcomes are unacceptable. Another possible output is that the writes for each cat will be run to completion, but there is no guarantee which will occur first.

This gives us many possible outputs:

    a\n    (bad: output discarded)
    b\n    (bad: output discarded)
    a\nb\n (ok)
    b\na\n (ok)
    ab\n\n (bad: interleaved)

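It should be noted that "small writes" (i.e. <= 2048 bytes in these notes; POSIX itself only guarantees atomicity for pipe writes of at most PIPE_BUF bytes, which is at least 512) are done atomically, while writes with a large output have the potential to be interleaved. Interleaving would not happen in our simple example, but could become a possibility in other applications.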
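The atomicity of small writes can be observed with a pipe. The sketch below is an illustration under assumptions: two child processes each write fixed-size records well below the POSIX minimum PIPE_BUF of 512 bytes, and the parent counts records that arrive mixed. The record size, record count, and function names are arbitrary choices.

```c
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

enum { RECORD = 64, COUNT = 100 };   /* illustrative sizes; RECORD < 512 */

/* Child body: write COUNT records, each made of RECORD copies of c. */
static void writer(int fd, char c) {
    char rec[RECORD];
    memset(rec, c, sizeof rec);
    for (int i = 0; i < COUNT; i++)
        if (write(fd, rec, sizeof rec) != (ssize_t)sizeof rec)
            _exit(1);
    _exit(0);
}

/* Returns how many records arrived torn (short or mixed), or -1 on error. */
int count_torn_records(void) {
    int p[2];
    if (pipe(p) < 0)
        return -1;
    pid_t a = fork();
    if (a < 0) return -1;
    if (a == 0) { close(p[0]); writer(p[1], 'a'); }
    pid_t b = fork();
    if (b < 0) return -1;
    if (b == 0) { close(p[0]); writer(p[1], 'b'); }
    close(p[1]);                      /* the parent only reads */

    int torn = 0;
    char rec[RECORD];
    ssize_t n;
    while ((n = read(p[0], rec, sizeof rec)) > 0) {
        if (n != RECORD) { torn++; continue; }
        for (int i = 1; i < RECORD; i++)
            if (rec[i] != rec[0]) { torn++; break; }
    }
    close(p[0]);
    waitpid(a, NULL, 0);
    waitpid(b, NULL, 0);
    return n < 0 ? -1 : torn;
}
```

The order of 'a' and 'b' records still varies from run to run; atomicity only promises that no single record is split.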

(cat > a & cat > b) < infile

This example is the same as the above example, but in the opposite direction. Since the two cat calls share an input file, there is a race condition as to which cat gets which part of the input. It is possible that each cat gets a portion, which would split the input unacceptably across the two files a and b. It is also possible that one file (e.g. a) will get the entirety of infile. The only guarantee is that the sizes add up:

    |a| + |b| = |infile|

(cat a & cat b) | (cat > c & cat > d)

This creates even more race conditions, since it combines both of the above examples and its behavior depends on timing at each end of the pipe. The subshell on the left side of the pipe has a race condition because there is no guarantee which cat will write to the pipe first. The cats in the subshell on the right side of the pipe fight each other to pull bytes from the read end of the pipe. Timing is innately variable, which makes race conditions very difficult to debug. The best way to debug a race condition is to not create one. In this lecture we will examine ways of writing code that avoids race conditions.

Task: rotate a log file

Our goal is to keep a log of all of Apache's activity. In order to do this, we must keep two files, log and oldlog. log contains all of the information for the current day while oldlog contains a copy of the log file from the previous day.

log <= apache writes to this endlessly
oldlog <= yesterday's log

The tricky part (most prone to error) occurs when log must be transitioned to oldlog at midnight.

    $ mv log oldlog
    $ >log

However, we need some way of telling Apache to close/reopen its log file.

i.e.
    close(fd);
    fd = open("log", O_WRONLY ...);

Right now, our writes to the log look like this:

    write(fd, "good stuff\n", 11);
    write(fd, "more good stuff\n", 16);

Can you see the problem with this attempt? Hint: Race condition

With this naive writing to the log file, it is possible that the current time is midnight and the log file's location is being changed (meaning that subsequent writes will write into oldlog instead of log). We need to add some sort of check to make sure that the log is being written to the correct place. Here's the change:

    checklog();
    write(fd, "good stuff\n", 11);
    checklog();
    write(fd, "more good stuff\n", 16);

with checklog() defined as follows:

                        
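    struct stat st;

    // Reopen the log if it has been moved away or freshly truncated.
    // Returns 0 on success, -1 if the log could not be reopened.
    int checklog(void) {
        if (stat("log", &st) < 0 || st.st_size == 0) {
            close(fd);
            fd = open("log", ...);
            if (fd < 0)
                return -1;
        }
        return 0;
    }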

Right now Apache has to execute two system calls every time it wants to write to the log. This is a very inelegant way to solve the problem. This polling approach is also very slow.

Polling is when a process actively "checks" whether the resource that it wants to use is ready or not. The polling in our code is done by the checklog() calls. Polling in this manner wastes CPU cycles checking whether it's okay to execute code instead of actually executing it. This is what makes polling an inefficient and undesirable behavior for our program.

More trouble: What would happen if the power failed? We would have the same issue.

Signals

Let's start this section with an analogy.

signals : processes :: traps : hardware.

That's right, signals work for processes in the same way that traps work for hardware.

traps - after any machine instruction, the equivalent of INT 0x80 can occur.

signals - the kernel can take control and:

terminate the process
let the process continue as before
cause the program to call one of your functions (an asynchronous function call)

Using signals is a good way to avoid wasting CPU resources by polling.
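As a sketch of how a signal could replace the checklog() polling from the log rotation task: the rotating script sends a signal after moving the file, the handler only sets a flag, and the writer reopens the log when it sees the flag. Everything here is an assumption for illustration (the choice of SIGHUP, the names on_rotate, log_init and log_write, and the open flags); signal and handlers themselves are described later in these notes.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* volatile sig_atomic_t so the compiler cannot optimize the flag away. */
static volatile sig_atomic_t reopen_requested = 0;
static int log_fd = -1;

/* The handler does the minimum: remember that a rotation happened. */
static void on_rotate(int sig) {
    (void)sig;
    reopen_requested = 1;
}

/* Installs the handler and opens the log; returns 0 on success, -1 on error. */
int log_init(const char *path) {
    if (signal(SIGHUP, on_rotate) == SIG_ERR)
        return -1;
    log_fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    return log_fd < 0 ? -1 : 0;
}

/* Writes one message, reopening path first if a rotation was signalled. */
ssize_t log_write(const char *path, const char *msg) {
    if (reopen_requested) {
        reopen_requested = 0;
        close(log_fd);
        log_fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (log_fd < 0)
            return -1;
    }
    return write(log_fd, msg, strlen(msg));
}
```

The per-write cost drops from an extra system call (stat) to a read of one variable. A small window remains between the rename and the arrival of the signal, during which writes still land in the old file.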

Signal    Meaning                          Additional Info
SIGINT    interrupt (^C)                   <= uncooperative process
SIGHUP    hang up                          <= unexpected loss of resources
SIGSEGV   segmentation violation
SIGBUS    bus error
SIGFPE    floating point exception         almost impossible to see nowadays
SIGPIPE   writing to pipe w/ no readers
SIGKILL   immediately ends process
SIGALRM   alarm clock                      put a time limit on your own process
SIGXCPU   CPU quota exceeded
SIGXFSZ   file size limit exceeded

Brief aside on why it is very rare (borderline impossible) to see a SIGFPE:

programmers were embarrassed and frustrated by receiving SIGFPE
told hardware designers to make systems where floating point exceptions would not occur
hardware designers added handling such as INF-INF=NaN instead of triggering SIGFPE
still one way to get SIGFPE:

    int x = INT_MIN / (-1);

Some more information about kill:

Kill is very useful. If a process is behaving poorly, it can be killed in order to free the CPU to run meaningful processes. An example of how kill might be used is seen below.

                    
    int kill(pid_t pid, int sig); // send signal to process
        // can send only to own processes on lnxsrv

    pid_t p = fork();
    if (p > 0) {            // parent
        sleep(30);
        kill(p, SIGINT);    // interrupt the child after 30 seconds
        waitpid(p, ...);    // weird exit status > 127
    }

Beyond the application of just killing processes, kill can be used more generally as a means of communication between processes by passing different signals through the second parameter (as opposed to SIGINT).
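A self-contained sketch of the fork/kill/waitpid pattern, under the assumption that the child does nothing but wait. kill_child_with is a hypothetical helper; it reports which signal ended the child, which is the information a shell folds into the "> 127" exit status.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that only waits, sends it sig, and returns the signal
   that terminated the child (0 if it exited normally, -1 on error). */
int kill_child_with(int sig) {
    pid_t p = fork();
    if (p < 0)
        return -1;
    if (p == 0) {
        for (;;)
            pause();          /* child: sleep until a signal arrives */
    }
    if (kill(p, sig) < 0)     /* parent: send the signal */
        return -1;
    int status;
    if (waitpid(p, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

The sketch assumes the signal's disposition in the child is the default one (terminate); a child that inherited an ignored signal would never exit, and only SIGKILL is immune to that.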

How do you handle signals?

With a signal handler of course!

A signal handler is a function that is declared to run whenever a certain signal is received. You can anticipate events that may cause signals to be created and then write a function to execute some code when that certain signal is received in order to gracefully recover from the signal causing behavior.

Typedef of a Signal Handler

                    
    typedef void (*sighandler_t)(int); // pointer to function that takes int & returns void

    sighandler_t signal(int, sighandler_t); // int is sig num, returns old handler

The signal function takes a signal number and the new handler, and returns the old handler for that signal.
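The returned old handler makes it possible to install a handler temporarily and put the previous one back. A minimal sketch follows; handler_fn, noop and signal_returns_previous are illustrative names, and SIGUSR1 is an arbitrary choice of signal.

```c
#include <signal.h>

typedef void (*handler_fn)(int);   /* same shape as sighandler_t above */

static void noop(int sig) {
    (void)sig;
}

/* Installs noop for SIGUSR1, then restores whatever was there before.
   Returns 1 if the second call handed back the handler we installed. */
int signal_returns_previous(void) {
    handler_fn before = signal(SIGUSR1, noop);       /* previous handler */
    if (before == SIG_ERR)
        return 0;
    handler_fn replaced = signal(SIGUSR1, before);   /* put it back */
    return replaced == noop;
}
```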

example:
                    
    int main(void) {
        signal(SIGALRM, bing); // set up handler before action that provokes signal
        alarm(30);
        // really complicated main code
        return 0;
    }

How is our initial implementation of the bing sighandler?

                    
    void bing(int sig) {
        printf("BING! %d\n", sig);
        exit(27);
    }

There are some major problems with this handler. What if the signal arrives when printf is active? This is very dangerous and will result in undefined behavior.

signal could arrive while printf is being called. printf is not safe when called 'recursively'
don't call exit. Exit flushes I/O buffers... could drop core.

Solution? Don't call printf or exit! Stick to safe things... AKA asynchronous-safe functions (POSIX calls them async-signal-safe).

Asynchronous-safe functions are the only functions that should be called from within a signal handler. They can be interrupted at any point, and called again from the handler, without causing undefined or dangerous side effects.

A few asynchronous-safe functions:

write
read
close
_exit

All of the above functions are system calls: they go straight to the kernel and keep no user-space state that a handler could corrupt.

Don't use:

stdio (printf, fopen, fclose)
malloc, free
any function that looks at shared memory

Safe version of our handler:

    void bing(int sig) {
        write(1, "BING!\n", 6);
        _exit(27);
    }
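Putting the pieces together, here is a sketch of the alarm example with the safe handler. To keep the calling program alive, the demo runs in a forked child; run_alarm_demo is a hypothetical wrapper, and the pause loop stands in for the "really complicated main code".

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

static void bing(int sig) {
    (void)sig;
    ssize_t n = write(STDOUT_FILENO, "BING!\n", 6);   /* safe in a handler */
    (void)n;
    _exit(27);                                        /* no stdio flushing */
}

/* Runs the alarm demo in a child process and returns the child's exit
   status, or -1 on error or if the child did not exit normally. */
int run_alarm_demo(unsigned seconds) {
    pid_t p = fork();
    if (p < 0)
        return -1;
    if (p == 0) {
        signal(SIGALRM, bing);   /* install the handler before arming the alarm */
        alarm(seconds);
        for (;;)
            pause();             /* wait for the alarm to go off */
    }
    int status;
    if (waitpid(p, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```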

A couple of special handler values that can be passed to signal:

SIG_DFL <= restore the default action
SIG_IGN <= discard signals

Example of Signal Handling with gzip

Let's take a look at a dangerous situation that can be caused by sending a signal when the CPU is executing our sensitive gzip command.

gzip foo => foo.gz

                    
    gzip(...) {
        int in = open("foo", O_RDONLY);
        int out = open("foo.gz", O_WRONLY);
        magiczip(in, out);
        close(in);
        close(out);
        unlink("foo");
    }

What happens when you press ^C while gzip is executing? The default behavior would be to leave behind an incomplete foo.gz (bad).

If the process gets interrupted, we want either the original foo to be left intact or the completely zipped foo.gz to be produced. What we don't want is an incomplete output, or to lose all of our data because of a signal.

One fix would be to just ignore SIGINT signals. This way, the user can't possibly cause an interrupt while we are performing a sensitive operation.

                    
    gzip(...) {
        signal(SIGINT, SIG_IGN); // but then we can't use ^C
        int in = open("foo", O_RDONLY);
        int out = open("foo.gz", O_WRONLY);
        magiczip(in, out);
        close(in);
        close(out);
        unlink("foo");
    }

As referenced in the comment above, this implementation isn't quite right because it disables our usage of ^C entirely. If for some reason this instance of gzip was taking too long to execute or a bug somewhere else in the code caused an infinite loop, there would be no way to get back control of the CPU.

A better fix would be to create a cleanup function and run it whenever a SIGINT signal is received.

                    
    gzip(...) {
        signal(SIGINT, cleanup);
        int in = open("foo", O_RDONLY);
        int out = open("foo.gz", O_WRONLY);
        magiczip(in, out);
        close(in);
        close(out);
        unlink("foo");
    }

    void cleanup(int sig) {
        unlink("foo.gz");
        _exit(97);
    }
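A runnable sketch of the cleanup handler in action. Since magiczip is not defined in these notes, the child below just writes a few bytes and then sends itself SIGINT as a stand-in for the user pressing ^C. interrupted_gzip_status and partial_output are hypothetical names, and the open flags are an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

static const char *volatile partial_output;   /* set before the handler is installed */

static void cleanup(int sig) {
    (void)sig;
    unlink(partial_output);   /* remove the half-written output */
    _exit(97);
}

/* The child creates out_path and is interrupted "mid-compression".
   Returns the child's exit status, or -1 on error. */
int interrupted_gzip_status(const char *out_path) {
    pid_t p = fork();
    if (p < 0)
        return -1;
    if (p == 0) {
        partial_output = out_path;
        signal(SIGINT, cleanup);
        int out = open(out_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (out < 0)
            _exit(1);
        if (write(out, "partial", 7) != 7)
            _exit(1);
        raise(SIGINT);        /* stand-in for ^C */
        _exit(0);             /* not reached if the handler ran */
    }
    int status;
    if (waitpid(p, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

After the call the partial output file is gone and the child has exited with status 97, which is the behavior the handler is meant to give.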

However, this solution is also not perfect. What would happen if the signal arrived right after the unlink("foo") call, before gzip returns? cleanup would then remove foo.gz as well, and we would lose both foo and foo.gz. There has to be a more secure way.

A suggested solution to this problem was to set an integer flag, but a plain int flag would most likely be optimized out by the compiler (the flag would need to be declared volatile).

An even better solution is to use pthread_sigmask.

                    
    gzip(...) {
        signal(SIGINT, cleanup);
        int in = open("foo", O_RDONLY);
        int out = open("foo.gz", O_WRONLY);
        magiczip(in, out);
        close(in);
        close(out);
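        pthread_sigmask(SIG_BLOCK, ...);   // hold signals during the critical section
        unlink("foo");
        pthread_sigmask(SIG_SETMASK, ...); // restore the previous mask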
    }

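Using pthread_sigmask blocks signals during critical sections of code. A blocked signal is not lost: it stays pending until it is unblocked. When the critical code is done executing, the program restores the mask to its previous value, and any pending signal is delivered then.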

pthread_sigmask

                    
    int pthread_sigmask(int how,
                        sigset_t const * restrict set,
                        sigset_t * restrict oset);

Parameters:

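how: how set is applied to the current signal mask. SIG_BLOCK adds the signals in set to the mask, SIG_UNBLOCK removes them, and SIG_SETMASK replaces the mask with set.
set: pointer to the read-only signal set that is combined with the current mask, according to how, to form the new mask
oset: if non-null, pointer to where the old mask is stored, so that it can be restored later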

// while you're doing something important, you won't be bothered by a signal.

Note: The general rule of thumb for race conditions is to assume that you wrote it wrong. That is the case more often than not.

Threads

Threads are like lightweight processes. Why do we use them? Performance!

we want a lot of them
we want fast context-switching
threads are not insulated from each other (threads utilize shared memory) which gives faster communication

Pros and Cons:

+ performance
- simplicity (can be very complicated and lead to race conditions)
- reliability

Usually you can get two out of the three to work. Generally programmers will sacrifice simplicity in order to get performance and reliability.

Threads           Processes
pthread_create    fork (or posix_spawn)
pthread_join      waitpid
pthread_exit      _exit

Note: Threads share memory while processes are isolated.
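The difference can be seen in a small sketch: a write made by a thread is visible to its creator, while the same write made by a forked child is not. The function names are made up for illustration, and older systems may need -pthread when compiling.

```c
#include <pthread.h>
#include <sys/wait.h>
#include <unistd.h>

static int shared = 0;

static void *set_shared(void *arg) {
    shared = *(int *)arg;      /* same address space as the creator */
    return NULL;
}

/* Returns the value of shared seen by the caller after a thread sets it. */
int value_after_thread(int v) {
    pthread_t t;
    shared = 0;
    if (pthread_create(&t, NULL, set_shared, &v) != 0)
        return -1;
    if (pthread_join(t, NULL) != 0)
        return -1;
    return shared;
}

/* Same assignment, but made in a forked child process. */
int value_after_fork(int v) {
    shared = 0;
    pid_t p = fork();
    if (p < 0)
        return -1;
    if (p == 0) {
        shared = v;            /* changes only the child's private copy */
        _exit(0);
    }
    if (waitpid(p, NULL, 0) < 0)
        return -1;
    return shared;
}
```

The shared memory is what makes thread communication fast, and it is also why threads are exposed to the race conditions discussed earlier.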