Problem:
our main function is the master of routines that should really be its peers.
int main(void) {
    int c;
    while ((c = read_check(input())) != EOF)   /* main decides when to pull input */
        work(c);                               /* and when to hand it off to work */
}
Here input() and work() are subroutines of main. However, we do not want subroutines, we want coroutines (routines that can run together).
Coroutines can call each other, but none is the main function/master of the other.
So how can we implement this? Multiple processes/programs, but how do we allow them to communicate?
A)
Pipes - a bounded buffer between/among processes.
This is done with file descriptors shared between the two processes (there is one descriptor for the read end and one for the write end, but they refer to the same buffer of data).
ex: a buffer between program A and program B

    A write fd --> [ |==============| ] --> B read fd      (~8 KiB kernel buffer)

It behaves as a queue: First In, First Out (FIFO), with a maximum amount of data it can hold at once.
The kernel wraps around to the start of the buffer when it reaches the end (a circular buffer).
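A minimal self-contained sketch (not from the lecture) of the fd[0]/fd[1] convention, using a single process that writes into one end and reads back from the other:

#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fds[2];
    char buf[64];

    if (pipe(fds) != 0)
        return 1;
    write(fds[1], "hello", 5);                  /* bytes go into the kernel's buffer */
    ssize_t n = read(fds[0], buf, sizeof buf);  /* and come back out in FIFO order */
    printf("read %zd bytes: %.*s\n", n, (int) n, buf);
    close(fds[0]);
    close(fds[1]);
    return 0;
}

In the real use case the two ends live in different processes after a fork(), but the FIFO behavior is the same.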
Potential problems:
Note: There are often complementary problems between the read end and the write end.
1) pipe is empty
read() will HANG (it waits for input by default).
We should already be familiar with this behavior, e.g. waiting for user input with getchar().
2) write, but the pipe/buffer is full
write() waits/hangs until space frees up and it can continue writing.
The system automatically slows the faster end down to match the slower one by making the faster end hang.
This means the throughput of the whole pipeline is that of the slower end, which is perfectly fine.
3) two readers reading at the same time on a single pipe
one wins and gets the first data; the second gets whatever comes next, so the order is unpredictable
3b) same with 2 writers: they alternate access to the pipe
ex. (cat foo & cat bar) | sort
As in 3), the data arrives in an effectively random order.
4) last reader exits
the intuition is that the writer hangs forever once the buffer fills up
however, on UNIX-like systems the default is that the writer gets a SIGNAL (SIGPIPE) that terminates it (unless it has a handler)
another possibility is that write() fails (returns -1 with errno == EPIPE); see the sketch after this list
the problem with relying on that return value is that people write while (1) printf("hello");
printf is implemented on top of write(), and nobody checks the return value of printf
5) last writer exits
this just means you've reached the end of input (EOF)
read() returns 0, which is the normal way pipes end
a single read/write can transfer at most the size of the pipe; asking for more than that (say, beyond 8 KiB) doesn't move it all at once, and the rest has to go in later calls
remember, small reads/writes are atomic. i.e.:
if one process does write(fd, "aaaaa", 5); and another does write(fd, "bbbbb", 5); the pipe will contain either "aaaaabbbbb" or "bbbbbaaaaa"
HOWEVER, very large writes may interleave, depending on the buffer size
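A small sketch of problem 4 (my own illustration, not from the lecture): if SIGPIPE is ignored, a write() to a pipe whose last reader is gone fails with errno set to EPIPE instead of killing the process:

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fds[2];
    if (pipe(fds) != 0)
        return 1;
    signal(SIGPIPE, SIG_IGN);   /* otherwise the write below would kill us */
    close(fds[0]);              /* no reader is left */
    if (write(fds[1], "x", 1) < 0 && errno == EPIPE)
        printf("write failed with EPIPE, as expected\n");
    return 0;
}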
Implementation:
du | sort -n
2 processes, fork() twice from shell
Create a pipe:
int pipe(int fd[2]); // fd[0] is the read end, fd[1] is the write end
WRONG ORDER:
fork();
fork();
pipe(fds); // every process creates its own pipe after the forks, so no pipe is shared between du and sort
RIGHT ORDER: have the 1st child make the pipe between the two forks:
fork();
pipe(fds);
fork();
Now the pipe exists before the second fork, so both descendants share it, and one can read what the other writes (we need du's output to reach sort).
Remember, it's important to think about how the processes will fork. Examples:
    1) sh -> du -> sort        (sh forks du; du's process forks sort)
    2) sh -> du, sh -> sort    (sh forks both du and sort directly)
    3) sh -> sort -> du        (sh forks sort; sort's process forks du)
    4) sort -> sh, sort -> du  (sort is the parent of both sh and du)
#4 is bad because sort should not be the parent: sh is usually started by login, and having control return from a sort process to login is extremely irregular.
#2 isn't great because du and sort aren't directly linked, though it can still be made to work.
#1 isn't good because du usually exits before sort finishes, which makes it awkward for sh to wait for sort (its grandchild).
#3 is the better tree since it follows the pipe structure up: sort takes its input from du, and sh simply waits for sort.
reminder: don't turn the parent shell itself into a non-shell (other processes depend on it, like login)
most important bug: forgetting to close the pipe's write end, causing the reader to hang forever (some process that will never write still holds the write end open, so the reader never sees EOF). A fuller sketch of the fd handling follows below.
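Here is a sketch of how the shell could set up du | sort -n, assuming tree #3 (sh forks the process that becomes sort, and that process forks du before exec'ing). The close() calls are exactly what the bug above is about: if any copy of the write end stays open, sort never sees EOF.

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();                 /* shell forks the process that becomes sort */
    if (pid < 0) { perror("fork"); return 1; }

    if (pid == 0) {                     /* child: will become "sort -n" */
        int fds[2];
        if (pipe(fds) != 0) { perror("pipe"); _exit(1); }

        pid_t grandchild = fork();      /* grandchild: will become "du" */
        if (grandchild < 0) { perror("fork"); _exit(1); }

        if (grandchild == 0) {          /* du writes into the pipe */
            dup2(fds[1], 1);            /* stdout -> write end */
            close(fds[0]);
            close(fds[1]);
            execlp("du", "du", (char *) NULL);
            _exit(127);
        }

        dup2(fds[0], 0);                /* sort reads from the pipe */
        close(fds[0]);
        close(fds[1]);                  /* crucial: drop our copy of the write end */
        execlp("sort", "sort", "-n", (char *) NULL);
        _exit(127);
    }

    /* the parent stays a shell and just waits for the pipeline (i.e. for sort) */
    int status;
    waitpid(pid, &status, 0);
    return 0;
}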
SIGNALS: a method to get a process's attention.
One possibility: one pipe per process (say, file descriptor 3). To send a signal to a process, write a byte such as 'N' to its pipe
(the parent or any "delegate" can send the signal; whatever has access to that file descriptor can send it).
The big problem with this: it requires modifying every program to have a handler and to keep checking the pipe:
if (read(3, &sig, 1) == 1) call_signal_handler(sig);
This requires far too much cooperation from programmers; most of us would never bother. The solution should work even with lazy coders.
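For illustration only, here is roughly what that cooperation would look like; the fd-3 convention and check_for_signal() are made up for this sketch, not a real API:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical convention: fd 3 is this process's "signal pipe";
   a one-byte write to it means "you have a signal". */
static void check_for_signal(void (*handler)(char)) {
    char sig;
    if (read(3, &sig, 1) == 1)           /* fd 3 is non-blocking, so this is a poll */
        handler(sig);
}

static void my_handler(char sig) {
    printf("got pseudo-signal '%c'\n", sig);
}

int main(void) {
    int fds[2];
    if (pipe(fds) != 0)
        return 1;
    dup2(fds[0], 3);                     /* install the read end as fd 3 */
    fcntl(3, F_SETFL, O_NONBLOCK);
    write(fds[1], "N", 1);               /* a parent/delegate would do this part */

    /* every cooperative program would have to sprinkle calls like this
       throughout its main loop, which is why the scheme does not scale */
    check_for_signal(my_handler);
    return 0;
}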
So a better solution is to instead change the abstract machine:
Between any pair of instructions, a signal can be delivered. It is handled inside a SIGNAL HANDLER function.
simplest implementation:
void handler(int sig) { }
Unix-like systems use the signal() function to install such a handler:
typedef void (*handler_t) (int);
handler_t signal(int sig, handler_t handler);
special handler_t values:
SIG_IGN - ignore/do nothing
SIG_DFL - default action
-- the default depends on the signal: dump core, ignore, or exit
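For instance (my example, not from the lecture), a program can shrug off ^C during some phase and then restore the default:

#include <signal.h>

int main(void) {
    signal(SIGINT, SIG_IGN);   /* ignore ^C during an uninterruptible phase */
    /* ... do the work ... */
    signal(SIGINT, SIG_DFL);   /* restore the default action (here: exit) */
    return 0;
}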
example usage: a program that reads user commands and cleans up a temp file on ^C:
char *tmpfile;

void handle_int(int sig) {
    unlink(tmpfile);               /* cleanup */
    exit(1);
}

int main(void) {
    tmpfile = gen_tmpfile();
    signal(SIGINT, handle_int);    /* handle interrupt (^C) */
    int c;
    while ((c = getch()))
        executecommand(c);
    unlink(tmpfile);
}
potential problems:
gen_tmpfile() creates the file before the handler is installed, so an interrupt right after it leaks the file
so call signal() BEFORE assigning tmpfile
also, have the handler check: if (tmpfile) unlink(tmpfile);
STILL a problem: the interrupt can arrive after the file is created but before tmpfile is assigned (see the sketch below)
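A sketch of that patched ordering (gen_tmpfile() is filled in with mkstemp() just so it compiles); the race in the last bullet is still there, because the signal can arrive after the file is created but before its name lands in tmpfile:

#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

static char *tmpfile;                  /* same global as the example above */

static char *gen_tmpfile(void) {       /* stand-in for the notes' helper */
    static char name[] = "/tmp/demoXXXXXX";
    int fd = mkstemp(name);            /* the file exists as soon as this returns */
    if (fd >= 0)
        close(fd);
    return name;
}

static void handle_int(int sig) {
    if (tmpfile)                       /* only clean up if the assignment happened */
        unlink(tmpfile);
    _exit(1);
}

int main(void) {
    signal(SIGINT, handle_int);        /* install the handler FIRST this time */
    tmpfile = gen_tmpfile();           /* a ^C right here still leaks the file:
                                          it exists, but tmpfile is not yet set */
    pause();                           /* stand-in for the command loop */
    unlink(tmpfile);
    return 0;
}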
The "orthoganol" solution (an uninterfering implementation):
int pthread_sigmask( int how, //SIG_BLOCK, SIG_UNBLOCK, SIG_SETMASK
sigset_t const *restrict set, //set of signals, new settings/signals
sigset *restrict oset ) //old settings/signals
Example usage (adapted from the Linux pthread_sigmask man page; sig_thread and handle_error_en are part of that same example):
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define handle_error_en(en, msg) \
    do { errno = en; perror(msg); exit(EXIT_FAILURE); } while (0)

/* Dedicated signal thread: waits synchronously for the blocked signals. */
static void *sig_thread(void *arg) {
    sigset_t *set = arg;
    int s, sig;
    for (;;) {
        s = sigwait(set, &sig);
        if (s != 0)
            handle_error_en(s, "sigwait");
        printf("Signal handling thread got signal %d\n", sig);
    }
}

int main(int argc, char *argv[]) {
    pthread_t thread;
    sigset_t set;
    int s;
    /* Block SIGQUIT and SIGUSR1; other threads created by main()
       will inherit a copy of the signal mask. */
    sigemptyset(&set);
    sigaddset(&set, SIGQUIT);
    sigaddset(&set, SIGUSR1);
    s = pthread_sigmask(SIG_BLOCK, &set, NULL);
    if (s != 0)
        handle_error_en(s, "pthread_sigmask");
    s = pthread_create(&thread, NULL, &sig_thread, (void *) &set);
    if (s != 0)
        handle_error_en(s, "pthread_create");
    /* Main thread carries on to create other threads and/or do
       other work */
    pause(); /* Dummy pause so we can test program */
}
Any section of code that malfunctions badly if a signal arrives in the middle of it is a critical section.
An example is a file assignment. Say we have the following:
f = fopen("some file", "r");
Even if the signal handler checks whether f is NULL, the signal might arrive exactly while the assignment is happening, which causes more problems (opening the file twice, or assuming it is closed when it is not).
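One way to protect such a critical section with the orthogonal tool above (a sketch for a single-threaded program, where sigprocmask() is the usual spelling; pthread_sigmask() is the per-thread version):

#include <signal.h>
#include <stdio.h>

int main(void) {
    sigset_t set, oldset;
    sigemptyset(&set);
    sigaddset(&set, SIGINT);

    sigprocmask(SIG_BLOCK, &set, &oldset);   /* ^C is held pending, not lost, in here */
    FILE *f = fopen("some file", "r");       /* the critical section: open + assign */
    sigprocmask(SIG_SETMASK, &oldset, NULL); /* any pending SIGINT is delivered now */

    if (f)
        fclose(f);
    return 0;
}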
NOTE: by convention, similar signals are blocked while you are inside a signal handler function. There is also some hierarchy: e.g. SIGPWR (power failure) is about as important as signals get and can usually interrupt other signal handlers.
Example codes/types:
SIGINT   ^C (Ctrl-C) from the terminal
SIGPWR   power failure
SIGHUP   logout/hangup
SIGPIPE  wrote to a pipe with no readers left
SIGTERM  polite kill; e.g. kill 239
SIGKILL  forced kill; e.g. kill -KILL 239 (cannot be caught or ignored)
SIGSEGV  segmentation violation (bad memory reference)
SIGFPE   arithmetic exception (e.g. integer division by 0)
SIGBUS   bus error (wrong/misaligned memory access)
SIGALRM  alarm expired; e.g. alarm(10) makes this arrive 10 seconds later
SIGXCPU  used too much CPU time (the SOFT LIMIT)
SIGXFSZ  created a file that is too big (file size limit exceeded)
SIGSTOP  halts the process (cannot be caught or ignored; the parent can see the stop via wait())
SIGTSTP  ^Z (Ctrl-Z), stop request from the terminal
pros:
+manage processes better
+fix robustness issues
cons:
-processes aren't as isolated (the OS is constantly poking at them)
-signal handlers are notoriously buggy
Example bug in our code: the compiler may optimize the assignment of tmpfile so that the value lives only in a register instead of memory (since the number of variables is low). The register can then be reused/overwritten, and the signal handler will not see the right value. Use the C keyword
volatile on the variable to prevent the optimizations that cause this error.
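A sketch of the fix: mark the shared variable volatile (and for simple flags, use sig_atomic_t, the type the C standard provides for exactly this):

#include <signal.h>
#include <unistd.h>

volatile sig_atomic_t got_signal;     /* the standard pattern for a flag shared
                                         with a handler */
char * volatile tmpfile;              /* for the notes' pointer: the pointer variable
                                         itself is volatile, so it lives in memory
                                         rather than only in a register */

static void handle_int(int sig) {
    got_signal = 1;                   /* just record the signal; main() cleans up */
}

int main(void) {
    signal(SIGINT, handle_int);
    while (!got_signal)
        pause();                      /* wait for ^C */
    if (tmpfile)
        unlink(tmpfile);
    return 0;
}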