Lecture 6: Pipes and Signals

Pipes

Example pipeline of data:
Problem:
our main function calls things that should come before it.
int main() { while(read_check( input() ) != EOF) work(input); }
This is an example of a subroutine. However, we do not want subroutines; we want coroutines (things that can run together). Coroutines can call each other, but none is the main function/master of the others. So how can we implement this? With multiple processes/programs. But how do we allow them to communicate?
A) Pipes - a bounded buffer between/among processes. A pipe is accessed through file descriptors shared between the processes (there is one descriptor for reading and one for writing, but both refer to the same buffer). ex: A buffer between program A and program B
[ |==============| ]   8kb buffer
  |              |
  B read fd      A write fd
First In, First Out (FIFO) queue, with a maximum amount of data in the buffer. The buffer is circular: the pipe automatically wraps around when it reaches the end of the buffer.
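The wrap-around behavior can be sketched as a small ring buffer. This is only an illustration of the idea, not how any particular kernel implements pipes; the names ring, ring_write, ring_read and the tiny RING_CAPACITY are all made up for this example.

```c
#include <stddef.h>

#define RING_CAPACITY 8   /* tiny on purpose; a real pipe holds kilobytes */

struct ring {
    unsigned char data[RING_CAPACITY];
    size_t head;    /* index of the next byte to read   */
    size_t count;   /* number of bytes currently stored */
};

/* Copy up to n bytes in. Returns how many fit (0 means the buffer is full). */
size_t ring_write(struct ring *r, const unsigned char *src, size_t n)
{
    size_t written = 0;
    while (written < n && r->count < RING_CAPACITY) {
        size_t tail = (r->head + r->count) % RING_CAPACITY;  /* wraps around */
        r->data[tail] = src[written++];
        r->count++;
    }
    return written;
}

/* Copy up to n bytes out, oldest first. Returns how many were available. */
size_t ring_read(struct ring *r, unsigned char *dst, size_t n)
{
    size_t nread = 0;
    while (nread < n && r->count > 0) {
        dst[nread++] = r->data[r->head];
        r->head = (r->head + 1) % RING_CAPACITY;             /* wraps around */
        r->count--;
    }
    return nread;
}
```

A real pipe differs in that a full or empty buffer makes the caller wait instead of returning a short count, which is exactly the behavior discussed next.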
Potential problems:
Note: There are often complementary problems between the read end and the write end.
1) Read, but the pipe is empty.
   This causes a HANG (read() waits for input by default). We should already be familiar with this, e.g. when we wait for user input using getchar().
2) Write, but the pipe/buffer is full.
   write() waits/hangs until it can start writing again. The system automatically adjusts the speed of the faster side to match the other by causing hangs on the faster end. This means that the max speed of the system is that of the slower end, which is perfectly fine.
3) Two readers reading at the same time on a single pipe.
   One wins and gets the first data; the second gets the next data.
3b) Same with two writers: they alternate access to the pipe.
   ex. (cat foo & cat bar) | sort
   The same as 3): the data arrives in unpredictable order.
4) Last reader exits.
   The intuition is that the writer hangs forever once the buffer fills up. However, in Unix-like systems the default is that the writer gets a signal (SIGPIPE) that interrupts it and makes it exit (unless it has a handler). Another possibility is that write() fails (returns -1, errno = EPIPE). The problem with this approach is that people write while(1) printf("hello"); printf is implemented by calling write, and nobody checks the return value of printf.
5) Last writer exits.
   This just means that you've reached the end of input (EOF): read() returns 0, the normal procedure.
A single read() transfers at most what the pipe holds (say 8 KiB), so asking for more does not get more; always check the return value.
Remember, small writes are atomic (POSIX guarantees this for writes of at most PIPE_BUF bytes). i.e. if two processes call write(fd, "aaaaa", 5); and write(fd, "bbbbb", 5); the pipe will contain either "aaaaabbbbb" or "bbbbbaaaaa". HOWEVER, very large writes may interleave, depending on the buffer size.
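Cases 4) and 5) can be observed inside a single process. The sketch below is only a demonstration under simplifying assumptions: one process holds both ends of the pipe, and SIGPIPE is temporarily ignored so that write() reports the error instead of the process being killed. The function names are hypothetical.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Case 5: returns what read() reports once every write end is closed. */
ssize_t read_after_last_writer_closes(void)
{
    int fds[2];
    char c;
    if (pipe(fds) != 0)
        return -1;
    close(fds[1]);                     /* the last (only) writer goes away */
    ssize_t n = read(fds[0], &c, 1);   /* does not hang: returns 0, i.e. EOF */
    close(fds[0]);
    return n;
}

/* Case 4: returns the errno from write() once every read end is closed. */
int errno_after_last_reader_closes(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    /* With the default action, SIGPIPE would terminate this process. */
    void (*old)(int) = signal(SIGPIPE, SIG_IGN);
    close(fds[0]);                     /* the last (only) reader goes away */
    ssize_t n = write(fds[1], "x", 1);
    int err = (n == -1) ? errno : 0;
    close(fds[1]);
    signal(SIGPIPE, old);              /* restore the previous disposition */
    return err;
}
```

The first function should return 0 and the second EPIPE, matching the behavior described in 5) and 4).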
Implementation:
du | sort -n
2 processes: fork() twice from the shell. Create a pipe:
int pipe(int fds[2]); // fds[0] is read, fds[1] is write
WRONG ORDER:
fork(); fork(); pipe(fds); // every process creates its own pipe after forking, so no pipe is shared between du and sort
Have 1st child make the pipe (between forks)
fork(); pipe(fds); fork();
The second fork() copies the pipe's file descriptors into the new process, which allows one process to read what the other writes (since we need to pass du's output to sort). Remember, it's important to think about how the processes will fork. Examples:
    1          2            3          4
    sh         sh           sh         sort
     \        /  \           \        /  \
     du     du   sort        sort    sh   du
     /                       /
   sort                     du
#4 is bad because sort should not be the parent: sh is often called from login, and returning from sort instead of sh is extremely irregular.
#2 isn't good because du and sort aren't directly linked, though it can still work.
#1 isn't good because du should exit before sort ends, making it hard to link sh and sort.
#3 is the better tree since it follows the pipe structure up (sort takes input from du).
Reminder: don't turn the parent into a non-shell (other processes depend on it, like login).
Most important bug: you forget to close a pipe end, causing the reader to hang permanently (a process that will never write to or close it still holds the write end of the pipe).
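Tree #3 can be sketched in C as follows. This is a minimal illustration, not the lecture's actual code: the function name run_pipeline and its argument-vector parameters are hypothetical, and error handling is reduced to exiting with status 127.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs "producer | consumer" and returns the consumer's exit status,
   or -1 on failure. Process tree: caller -> consumer -> producer. */
int run_pipeline(char *const producer[], char *const consumer[])
{
    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {                     /* will become the consumer (sort) */
        int fds[2];
        if (pipe(fds) != 0)               /* pipe is made between the forks */
            _exit(127);

        pid_t grandchild = fork();
        if (grandchild < 0)
            _exit(127);

        if (grandchild == 0) {            /* will become the producer (du) */
            if (dup2(fds[1], STDOUT_FILENO) < 0)
                _exit(127);
            close(fds[0]);
            close(fds[1]);
            execvp(producer[0], producer);
            _exit(127);                   /* reached only if exec failed */
        }

        if (dup2(fds[0], STDIN_FILENO) < 0)
            _exit(127);
        close(fds[0]);
        close(fds[1]);   /* forgetting this keeps a writer alive: reader hangs */
        execvp(consumer[0], consumer);
        _exit(127);
    }

    int status;                           /* the caller stays a shell and waits */
    if (waitpid(child, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For the example above, the call would look like run_pipeline((char *const[]){"du", NULL}, (char *const[]){"sort", "-n", NULL}). Note that both processes close both original pipe descriptors after dup2(), which avoids the hang described as the most important bug.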

Signals

SIGNALS: a method to get a process's attention.
One possibility: one pipe per process (say, file descriptor 3). To send a signal to a process, write('N') to its pipe. (The parent or any "delegate" can send the signal: whatever has access to the file descriptor can send it.) The big problem with this: it requires modifying every program to have a handler and to check the pipe.
if(read(3,...) == 1) call_signal_handler;
This requires far too much cooperation from the programmers. Most of us would never deal with these things. The solution should work even in the case of lazy coders. So a better solution is to instead change the abstract machine: between any pair of instructions, a signal can be delivered. It is handled inside a SIGNAL HANDLER function. Simplest implementation:
void handler(int sig) { }
Unix-like systems use the signal function to assign this function.
typedef void (*handler_t) (int);
handler_t signal(int sig, handler_t handler);
Special handler_t values:
SIG_IGN - ignore/do nothing
SIG_DFL - default action -- dump core, ignore, exit
Example usage that gets user input:
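#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

/* gen_tmpfile(), getch() and executecommand() are the lecture's
   placeholder helpers; they are not defined here. */
char *tmpfile;

void handle_int(int sig)
{
    unlink(tmpfile);              // cleanup
    exit(1);
}

int main(void)
{
    int c;
    tmpfile = gen_tmpfile();
    signal(SIGINT, handle_int);   // handle interrupt
    while ((c = getch()))
        executecommand(c);
    unlink(tmpfile);
}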
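Potential problems:
gen_tmpfile() creates the file, so there is a problem if the interrupt arrives at that point (the file exists but no handler is installed yet).
So put the signal() call BEFORE creating tmpfile; also, check if(tmpfile) unlink(tmpfile) in the handler.
STILL a problem: the interrupt can arrive after the file is created but before tmpfile is assigned.
The "orthogonal" solution (an implementation that does not interfere with the rest of the code):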
int pthread_sigmask(
    int how,                        // SIG_BLOCK, SIG_UNBLOCK, SIG_SETMASK
    sigset_t const *restrict set,   // set of signals, new settings/signals
    sigset_t *restrict oset);       // old settings/signals
Example Usage (from linux manual):
int main(int argc, char *argv[])
{
    pthread_t thread;
    sigset_t set;
    int s;

    /* Block SIGQUIT and SIGUSR1; other threads created by main()
       will inherit a copy of the signal mask. */

    sigemptyset(&set);
    sigaddset(&set, SIGQUIT);
    sigaddset(&set, SIGUSR1);
    s = pthread_sigmask(SIG_BLOCK, &set, NULL);
    if (s != 0)
        handle_error_en(s, "pthread_sigmask");

    s = pthread_create(&thread, NULL, &sig_thread, (void *) &set);
    if (s != 0)
        handle_error_en(s, "pthread_create");

    /* Main thread carries on to create other threads and/or do
       other work */

    pause();            /* Dummy pause so we can test program */
}
Any section of code that causes a major malfunction if a signal arrives during it is a critical section. An example is a file assignment. Say we have the following:
f = fopen("some file", "r");
Even if we check whether f is NULL in our signal handler, the signal might arrive right when the assignment happens (after the file is opened but before f is set), which will cause more problems (opening the file twice, or assuming it's closed when it is not). NOTE: by convention, signals of the same kind are blocked while inside the signal handler function. There is some hierarchy, i.e. SIGPWR is pretty important, usually more than anything else, and can interrupt most other signal handler functions. Example codes/types:
SIGINT    ^C (ctrl C)
SIGPWR    power failure
SIGHUP    logout/hangup
SIGPIPE   pipe close (write to a pipe with no readers)
SIGTERM   kill signal; kill 239
SIGKILL   kill -KILL 239 (cannot be caught or ignored)
SIGSEGV   segmentation violation
SIGFPE    floating point violation (also integer division by 0)
SIGBUS    wrong memory access
SIGALRM   when something uses alarm(10), this is sent after 10 seconds
SIGXCPU   used too much CPU quota, SOFT LIMIT
SIGXFSZ   creates file too big
SIGSTOP   halts process from running (cannot be caught or ignored)
SIGTSTP   ^Z (ctrl Z); halts the process, but it gets informed
pros:
+ manage processes better
+ fix robustness issues
cons:
- processes aren't as isolated (the OS is constantly affecting them)
- signal handlers are notoriously buggy
Example bug in our code: the compiler may optimize the assignment of tmpfile and avoid storing it in memory (keeping it only in a register), since the number of variables is low. The register can then get overwritten/reused, and the signal handler, which reads tmpfile from memory, will not see the right value. Use the C keyword volatile to prevent the optimization that can cause this error.
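A common way to sidestep many handler bugs is to have the handler do nothing but set a volatile flag that the main code polls. The sketch below illustrates that pattern under stated assumptions: the names are hypothetical, and the signal is raised from inside the loop only so that the example is self-contained.

```c
#include <signal.h>

/* volatile: every access really goes to memory, so the main code sees what
   the handler stored. sig_atomic_t: a type that can be read and written
   without being interrupted halfway. */
static volatile sig_atomic_t interrupted = 0;

static void on_interrupt(int sig)
{
    (void)sig;
    interrupted = 1;   /* only touch the flag; leave cleanup to main code */
}

/* Counts loop iterations until the flag is set. raise_at must be >= 1;
   on that iteration the function sends SIGINT to itself. */
int iterations_until_interrupt(int raise_at)
{
    int n = 0;
    interrupted = 0;
    signal(SIGINT, on_interrupt);
    while (!interrupted) {     /* without volatile, this test could be hoisted */
        n++;
        if (n == raise_at)
            raise(SIGINT);
    }
    signal(SIGINT, SIG_DFL);   /* back to the default action */
    return n;
}
```

With this structure, cleanup such as unlink(tmpfile) happens in ordinary code after the loop exits, where it cannot race with a half-finished assignment.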