CS 111
Lecture 6 Scribe Notes
Signals, Signal Processing, and Threads
by Thomas Lutton & Nicholas Yee
10/22/14

What can go wrong?
(cont'd discussion of the hazards of file descriptors from the end of lecture 5)

File descriptors:
Many race conditions can originate from how we utilize file descriptors. Below are a few examples of some error causing behaviour and the accompanying result.
There are many more errors that can be caused by improper usage of file descriptors. It is important to remember that file descriptors, much like memory, are a limited and tricky resource, and certain procedures must be followed in order to write correct and safe code.

Race Conditions:
    a\n        (bad: output discarded)
    b\n        (bad: output discarded)
    a\nb\n     (ok)
    b\na\n     (ok)
    ab\n\n     (bad: interleaved)

It should be noted that "small writes" (i.e. <= 2048 bytes) are done atomically, while writes with a large output have the potential to be interleaved. The exact limit is system-dependent: for pipes, POSIX only guarantees atomicity for writes of at most PIPE_BUF bytes, and PIPE_BUF may be as small as 512. Interleaving would not happen in our simple example, but could become a possibility in other applications.
|a| + |b| = |infile|
This would create a lot of race conditions, since the behavior of both of the above examples depends largely on timing. The subshell on the left side of the pipe has a race condition because there is no guarantee which cat will write to the input end of the pipe first. On the other hand, the cats in the subshell on the right side of the pipe fight each other to pull bytes from the output end of the pipe. Timing is innately variable, which makes race conditions very difficult to debug. The best way to debug a race condition is to not create one. In this lecture we will be examining ways of writing code that avoids race conditions.

Task: rotate a log file
Our goal is to keep a log of all of Apache's activity. In order to do this, we must keep two files, log and oldlog. log contains all of the information for the current day, while oldlog contains a copy of the log file from the previous day.

    log     <= apache writes to this endlessly
    oldlog  <= yesterday's log

The tricky part (most prone to error) occurs when log must be transitioned to oldlog at midnight.

    $ mv log oldlog
    $ >log

However, we need some way of telling Apache to close and reopen its log file, i.e.

    close(fd);
    fd = open("log", O_WRONLY ...);

Right now, our writes to the log look like this:

    write(fd, "good stuff\n", 11);
    write(fd, "more good stuff\n", 16);

Can you see the problem with this attempt? Hint: race condition.

With this naive writing to the log file, the log may be rotated at midnight while Apache still holds its old file descriptor, so subsequent writes go into oldlog instead of log. We need to add some sort of check to make sure that the log is being written to the correct place. Here's the change:

    checklog();
    write(fd, "good stuff\n", 11);
    checklog();
    write(fd, "more good stuff\n", 16);

with checklog() defined as follows:
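The definition of checklog() is not preserved in these notes. As a rough sketch of what such a polling check might look like, the code below assumes a global descriptor fd and remembers the inode of the file it has open; log_path, log_ino, and openlog_file are hypothetical names introduced here, not taken from the lecture.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

static const char *log_path = "log"; /* file name used in the notes */
static int fd = -1;                  /* descriptor the server writes to */
static ino_t log_ino;                /* hypothetical: inode behind fd */

/* Open (or create) the log for appending and remember its inode. */
int openlog_file(void)
{
    struct stat st;
    int newfd = open(log_path, O_WRONLY | O_APPEND | O_CREAT, 0644);
    if (newfd < 0 || fstat(newfd, &st) != 0) {
        if (newfd >= 0)
            close(newfd);
        return -1;
    }
    if (fd >= 0)
        close(fd);          /* drop the descriptor for the rotated file */
    fd = newfd;
    log_ino = st.st_ino;
    return 0;
}

/* Poll with one stat() per call; reopen only if "log" is missing
   or now names a different file than the one behind fd. */
void checklog(void)
{
    struct stat st;
    if (fd < 0 || stat(log_path, &st) != 0 || st.st_ino != log_ino)
        openlog_file();
}
```

Even with this check, a rotation can still slip in between checklog() and the write() that follows it, so a sketch like this narrows the race rather than removing it.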
Right now Apache has to execute two system calls every time it wants to write to the log. This is a very inelegant way to solve the problem. This polling approach is also very slow.

Polling is when a process actively "checks" whether the resource that it wants to use is busy or not. In our code, the polling consists of the checklog() calls. Polling in this manner wastes CPU cycles checking whether it's okay to execute code instead of actually executing it. This is what makes polling an inefficient and undesirable behavior for our program.

More trouble: What would happen if the power failed? We would have the same issue.

Signals
Let's start this section with an analogy.

    signals : processes :: traps : hardware

That's right, signals work for processes in the same way that traps work for hardware.

traps - after any machine instruction, the equivalent of INT 0x80 can occur.
signals - the kernel can take control and:
Brief aside on why it is very rare (borderline impossible) to see a SIGFPE:
    int x = INT_MIN / (-1);

Some more information about kill:
kill is very useful. If a process is behaving poorly, it can be killed in order to free the CPU to run meaningful processes. An example of how kill might be used is seen below.
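The original example is not preserved in these notes. A minimal sketch of the idea, assuming the signal's default action is to terminate the process: the parent starts a child that never finishes on its own, sends it a signal with kill(), and then inspects how the child ended. The helper name spawn_and_kill is hypothetical.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that never exits by itself,
   send it `sig`, and return the signal that terminated it
   (or -1 if something went wrong). */
int spawn_and_kill(int sig)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();            /* stand-in for a misbehaving process */
    }
    if (kill(pid, sig) != 0)    /* first argument: target pid, second: signal */
        return -1;
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Calling spawn_and_kill(SIGKILL) should report SIGKILL, since that signal can be neither caught nor ignored. For other signals the result depends on the child's signal disposition.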
Beyond the application of just killing processes, kill can be used more generally as a means of communication between processes by passing different signals through the second parameter (as opposed to SIGINT).
How do you handle signals?
With a signal handler, of course!

A signal handler is a function that is registered to run whenever a certain signal is received. You can anticipate events that may cause signals, and then write a function that runs when that signal is received in order to recover gracefully from the signal-causing behavior.

Typedef of a Signal Handler
A signal handler takes an int (the signal number). signal() takes a signal number and the new handler, and returns the old signal handler.
example:
How is our initial implementation of the bing sighandler?
There are some major problems with this handler. What if the signal arrives when printf is active? This is very dangerous and will result in undefined behavior.
Async-signal-safe functions are the only functions that should be called from within a signal handler. These functions can be interrupted at any time and can run asynchronously without causing undefined and dangerous side effects. A few async-signal-safe functions:
Don't use:
Safe version of our handler:

    void bing(int sig)
    {
        write(1, "BING!\n", 6);
        _exit(27);
    }

A couple more signals:
Example of Signal Handling with gzip
Let's take a look at a dangerous situation that can be caused by sending a signal when the CPU is executing our sensitive gzip command.

    gzip foo => foo.gz
What happens when you press ^C while gzip is executing? The default behavior would be to leave behind an incomplete foo.gz (bad). If the process gets interrupted, we want either the original foo to be restored or the completely zipped foo.gz to be produced. What we don't want is some interleaved output, or to lose all of our data because of a signal. One fix would be to just ignore SIGINT signals. This way, the user can't possibly cause an interrupt while we are performing a sensitive operation.
As referenced in the comment above, this implementation isn't quite right because it disables our usage of ^C entirely. If for some reason this instance of gzip were taking too long to execute, or a bug somewhere else in the code caused an infinite loop, there would be no way to get back control of the CPU.

A better fix would be to create a cleanup function and run it whenever a SIGINT signal is received.
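The lecture's code for this step is not preserved here. A minimal sketch of the cleanup idea, assuming the partial output is named foo.gz as in the example above; partial_output and begin_output are hypothetical names, and a real gzip does considerably more.

```c
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

static const char *partial_output = "foo.gz";  /* output name from the notes */

/* On SIGINT, remove the incomplete output and exit. Only
   async-signal-safe calls are used: unlink() and _exit(). */
void cleanup(int sig)
{
    (void)sig;
    unlink(partial_output);
    _exit(1);
}

/* Install the handler first, then create the output file, so a partial
   foo.gz never exists without a handler ready to remove it.
   Returns the output descriptor, or -1 on failure. */
int begin_output(void)
{
    if (signal(SIGINT, cleanup) == SIG_ERR)
        return -1;
    return open(partial_output, O_WRONLY | O_CREAT | O_TRUNC, 0644);
}
```

The order inside begin_output is a design choice: installing the handler after creating the file would leave a short window in which ^C leaves a stray foo.gz behind.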
However, this solution is also not perfect. What would happen if we exited during the unlink call? We could actually lose both foo and foo.gz. There has to be a more secure way.

A suggested solution to this problem was to set an integer flag, but that would most likely be optimized out by the compiler (unless the flag is declared volatile). An even better solution is to use pthread_sigmask.
pthread_sigmask blocks signals from being delivered during critical sections of code; a blocked signal stays pending instead. When the critical code is done executing, the program restores the mask to its previous value, and any pending signal is then delivered.

pthread_sigmask
Parameters:
Note: The general rule of thumb for race conditions is to assume that you wrote it wrong. That is the case more often than not.

Threads
Threads are like lightweight processes. Why do we use them? Performance!
    + performance
    - simplicity (can be very complicated and lead to race conditions)
    - reliability

Usually you can get two out of the three to work. Generally programmers will sacrifice simplicity in order to get performance and reliability.
Note: Threads share memory while processes are isolated.
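A small sketch can make that difference concrete. It assumes POSIX threads and fork(); counter, bump, bump_in_thread, and bump_in_child are names invented for the example. The same increment is performed once in a new thread and once in a child process.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 0;   /* one copy per process, shared by its threads */

void *bump(void *arg)
{
    (void)arg;
    counter++;            /* same address space as the creating thread */
    return NULL;
}

/* Increment counter in a new thread; the caller sees the change. */
int bump_in_thread(void)
{
    pthread_t t;
    if (pthread_create(&t, NULL, bump, NULL) != 0)
        return -1;
    return pthread_join(t, NULL) == 0 ? 0 : -1;
}

/* Increment counter in a child process; only the child's copy changes. */
int bump_in_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        counter++;
        _exit(0);
    }
    int status;
    return waitpid(pid, &status, 0) == pid ? 0 : -1;
}
```

After bump_in_thread() the caller's counter has gone up by one, while after bump_in_child() it is unchanged in the parent. The join and the wait are what make this example free of races; with several threads bumping concurrently, the unsynchronized counter++ would itself be a race condition.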