Property | What the user might think happens | What actually happens |
---|---|---|
ALU | It does basic arithmetic at a very low level like adds, subtracts, bit-shifting, etc. | Same as what user thinks |
Registers | Registers keep track of a small amount of data for very fast access | Mostly what the user thinks, but the system can pretend to have more registers than physically exist: register values not currently needed are saved to memory and restored later (for example, when switching between processes)
Memory | All data goes to the RAM and comes back as efficiently as possible | Commonly accessed data is kept in RAM, and less commonly accessed data may instead be kept on secondary storage (disk). This is a tradeoff that might decrease efficiency, but it lets the system store more data than physically fits in RAM. This is, at a high level, how virtual memory works.
Input Output | Input and output instructions are called through syscalls | Kernel handles everything and controls how all input and output occurs |
Linux/Unix uses an ‘int’ as the handle for an open file: you need a file descriptor in order to access the file. Other OSes use a struct filedes* pointer instead. The advantage of an ‘int’ file descriptor is flexibility: it's a small, simple value, so there's a greater ability to move things around. The advantage of using a struct pointer is that it provides more information, better performance, better control, and type checking. It is, however, less portable and violates orthogonality.
A file descriptor refers to an open file description: a kernel data structure that records which file is open, the current file offset, and the open mode and flags. File descriptors can be passed between processes (for example, children inherit them across fork).
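As a concrete illustration of the ‘int’ handle, here is a minimal sketch (not from lecture; /etc/passwd is just an arbitrary readable file):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
      // open returns a small int: an index into this process's file
      // descriptor table, which refers to an open file description.
      int fd = open("/etc/passwd", O_RDONLY);
      if (fd < 0)
        return 1;

      char buf[512];
      ssize_t n = read(fd, buf, sizeof buf);   // use the handle
      if (n > 0)
        fwrite(buf, 1, n, stdout);

      close(fd);                               // release the handle
      return 0;
    }

Because the handle is just an int, it is trivially stored in arrays, passed to dup2(), or inherited across fork().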
With the Linux solution, the process will suspend forever: we cannot avoid hanging the writer, since the pipe will never be emptied.
In this example “:” does nothing but exit with 0.
Solution: check the return value of output calls, for example: if (printf("Hello") < 0)
What if there's no writer at all, will a read hang forever? No: once all the write ends are closed, the convention is for read to return 0, which stands for EOF. (A read blocks indefinitely only when a write end is still open but nothing is ever written.)
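A minimal sketch of the usual reader loop that relies on this convention (the function name drain is made up for illustration):

    #include <unistd.h>

    // Read everything from fd until EOF; returns 0 on success, -1 on read error.
    static int drain(int fd) {
      char buf[4096];
      ssize_t n;
      while ((n = read(fd, buf, sizeof buf)) > 0) {
        // ... process n bytes of buf ...
      }
      return n < 0 ? -1 : 0;
    }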
So far we have been talking about ordinary (anonymous) pipes. There are also named pipes, which live in the file system: any process with permission can open the pipe and read from or write to it.
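Here is a minimal sketch of creating and using a named pipe from C (the path /tmp/mypipe and the mode are arbitrary choices for illustration):

    #include <fcntl.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void) {
      // Create a named pipe (FIFO) that lives in the file system.
      if (mkfifo("/tmp/mypipe", 0600) != 0)
        return 1;

      // Any process that can see /tmp/mypipe may open it.
      // Opening for writing blocks until some process opens it for reading.
      int fd = open("/tmp/mypipe", O_WRONLY);
      if (fd < 0)
        return 1;
      write(fd, "hello\n", 6);
      close(fd);

      unlink("/tmp/mypipe");   // remove the name when we're done
      return 0;
    }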
Resource leaks can occur if a pipe's file descriptors are never used. Suppose we have a program that creates a pipe and doesn't use it:
    int fd[2];
    int ok = pipe(fd) == 0;
    for (;;) {
      // something else happens, but the pipe is never read or written
    }
What would happen? Nobody else can access the pipe. Only your file descriptor table mentions the pipe. So, essentially you've created a buffer in the kernel, but the kernel doesn't know you're not ever going to write to it. You've wasted system resources (this is akin to a memory leak). This is called a file descriptor leak.
You can create a pipe and mistakenly set it up so that there is a writer that will never write anything to the pipe; a reader of that pipe will then block forever.
After a fork, a parent process needs to close its own copy of the pipe's write end once it is done with it; otherwise the reader will never see EOF and can hang forever. The same applies to unused read ends.
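A minimal sketch of that discipline (parent writes, child reads; not from lecture):

    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
      int fd[2];
      if (pipe(fd) != 0)
        return 1;

      pid_t pid = fork();
      if (pid < 0)
        return 1;

      if (pid == 0) {            // child: the reader
        close(fd[1]);            // close the write end it won't use
        char buf[64];
        ssize_t n;
        while ((n = read(fd[0], buf, sizeof buf)) > 0)
          write(STDOUT_FILENO, buf, n);
        // read returned 0 only because every write end is now closed
        close(fd[0]);
        _exit(0);
      }

      close(fd[0]);              // parent: close the read end it won't use
      write(fd[1], "hello\n", 6);
      close(fd[1]);              // this is what lets the child see EOF
      waitpid(pid, NULL, 0);
      return 0;
    }

If the parent had kept fd[1] open, the child's read would never return 0 and it would block forever.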
The parent process talks to and from a sort process through two pipes, set up with fork(), two calls to pipe(), several dup2() calls, and execvp() to run sort. The parent sends a copy of the data to sort; sort then ships the sorted copy back to the parent. This works because sort doesn't write any output until it has read all of its input.
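A minimal sketch of that setup (error checks mostly omitted; this is one plausible arrangement, not necessarily the exact code shown in lecture):

    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
      int to_sort[2], from_sort[2];
      if (pipe(to_sort) != 0 || pipe(from_sort) != 0)
        return 1;

      pid_t pid = fork();
      if (pid == 0) {
        // child: wire the pipes to stdin/stdout, then become sort
        dup2(to_sort[0], 0);     // sort reads what the parent sends
        dup2(from_sort[1], 1);   // sort writes back to the parent
        close(to_sort[0]); close(to_sort[1]);
        close(from_sort[0]); close(from_sort[1]);
        execvp("sort", (char *[]) { "sort", NULL });
        _exit(127);              // exec failed
      }

      // parent: close the pipe ends it doesn't use
      close(to_sort[0]);
      close(from_sort[1]);

      // send the data, then close so sort sees EOF and starts writing
      const char *data = "banana\napple\ncherry\n";
      write(to_sort[1], data, strlen(data));
      close(to_sort[1]);

      // read the sorted result back
      char buf[256];
      ssize_t n;
      while ((n = read(from_sort[0], buf, sizeof buf)) > 0)
        write(STDOUT_FILENO, buf, n);
      close(from_sort[0]);

      waitpid(pid, NULL, 0);
      return 0;
    }

Note that the parent closes to_sort[1] before reading: that EOF is what tells sort it has seen all its input.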
For a program like sed, reads and writes are interleaved, and both processes can hang. You can end up with two full pipes, each process waiting for the other to make space so it can write: a deadlock. Whether it happens depends on timing and on how much data flows, so it shows up like a race condition. If sed is writing more than it is reading, the pipe going back to the parent fills up, then the pipe going to the child fills up, and both processes freeze.
File descriptors access files at a lower level than file names: you can still grep a file after its name has been removed. (Pipes, for example, are nameless files.) A file won't actually be removed until all file descriptors pointing at it go away and its name is removed from the file system.
The idea in question is: you've got a big file. You remove it, but then grep something in it. The grep command will still find matches, because it already holds an open file descriptor for the file, even though bigfile's name was removed.
Why? File descriptors access files at a lower level than file names, so you can access a file that has no name. (Pipes do this! pipe() hands you nameless files.) By the end of the ‘;’ command list, the data is then actually removed.
A file won't actually be removed until all the file descriptors pointing at it have been closed (all the readers must leave) and its name is gone from the file system. In this respect, treat files a bit like pipes.
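A minimal sketch of the effect (the file name bigfile is illustrative):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
      int fd = open("bigfile", O_RDONLY);
      if (fd < 0)
        return 1;

      // Remove the name. The file still exists, because fd refers to it.
      unlink("bigfile");

      // Reads through fd keep working; the storage is reclaimed only
      // once this last descriptor is closed.
      char buf[512];
      ssize_t n;
      while ((n = read(fd, buf, sizeof buf)) > 0)
        fwrite(buf, 1, n, stdout);

      close(fd);   // now the data is really gone
      return 0;
    }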
It is worth noting that there are operating systems in which signals have been discarded entirely. Why? Because signals can be troublesome.
Asynchronous Events in I/O
Error in Your Code
Impatient User or Infinite Loop
Impending Power Outages
Children Management
p = waitpid(-1, &status, WNOHANG); (a sketch of this idiom appears below)
User Went Away
Shell Bombs
Alarms
To suspend a process (where 29 is the PID), send it SIGSTOP: kill -STOP 29
To resume a process (where 29 is the PID), send it SIGCONT: kill -CONT 29
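As a sketch of the waitpid(-1, &status, WNOHANG) idiom mentioned under Children Management (the function name reap_children is made up for illustration):

    #include <errno.h>
    #include <sys/wait.h>

    // Reap any children that have already exited, without blocking.
    // Typically called from a SIGCHLD handler or from a main loop.
    static void reap_children(void) {
      int status;
      pid_t p;
      while ((p = waitpid(-1, &status, WNOHANG)) > 0) {
        // child p has finished; its exit information is in status
      }
      // p == 0: children remain, but none have exited yet
      // p < 0 && errno == ECHILD: there are no children left
    }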
Note that Linux has no SIGEOF signal: an EOF isn't an unusual enough event, so we don't see one here. We expect that you should ordinarily be able to deal with a finished file yourself. However, determining what counts as an "unusual" event is really just a judgement call.
We have a function for this!
sighandler_t signal(int signum, sighandler_t handler);
This function takes an integer (the signal number) and a pointer to a function that will handle that signal, and it returns a pointer to the old handler.
sighandler_t is defined as: typedef void (*sighandler_t)(int);
AKA: This is a function that takes an int (the signal number) and returns void. This is a sort of callback mechanism.
Signal handlers can be called at any time in the process once installed, whenever the system feels like it.
For example, somewhere in our code we want to handle signal 29 with function f:
    signal(29, f);
    x = y + 1;
    x = x + 1;
It's possible that right before the second piece of arithmetic, the machine will call f(29).
Basically, in Eggert's words, we see something happening at the machine level. If you have a bunch of instructions, between any pair of instructions there is some extra secret (up until now) code that essentially says, "If I feel like it, let's call f(29)."
When we consider this, then a signal handler can be invoked between any pair of instructions.
Back to our example above: if our handler f modifies the variable x, then x will be modified in the middle of an expression!
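A minimal sketch of the hazard, continuing the example above (signal number 29 and the handler f come from the example; the variables and the volatile sig_atomic_t type are illustrative additions):

    #include <signal.h>

    // Data shared with a signal handler should be volatile sig_atomic_t;
    // a plain int could be caught mid-update.
    volatile sig_atomic_t x;
    int y = 10;

    static void f(int sig) {
      x = 0;          // may run between any two instructions of main
    }

    int main(void) {
      signal(29, f);
      x = y + 1;      // f(29) could fire right here...
      x = x + 1;      // ...so x may end up 1 instead of 12
      return 0;
    }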
Signal handling is a great way to introduce race conditions into your code!
We want gzip to act as an atomic operation (all or nothing).
Here's the command: gzip foo
Here's some standard code for gzip:
gzip.c fd = open("foo", O_RDONLY); fo = open("foo.gz", O_WRONLY); while(compress(fd, fo)) continue; close(fo); unlink("foo");
This is pretty standard. But this isn't atomic.
If you were to ^C in the middle of this process, you'd end up with a partially created foo.gz file that wasn't properly compressed.
How do we fix this? We need to account for interrupt signals and do some cleanup.
Our first guess might look something like this:
gzip.c fd = open("foo", O_RDONLY); signal(SIGINT, cleanup); fo = open("foo.gz", O_WRONLY); while(compress(fd, fo)) continue; close(fo); unlink("foo"); static void cleanup(int sig) { unlink("foo.gz"); _exit(1); }
This would indicate a failure (exit status 1), and the cleanup gets done. But this is incomplete. Why? We have a race condition.
If you ^C right after signal() is installed but before foo.gz has been created, the handler will try to remove a foo.gz that doesn't exist yet!
What about when the signal comes after you've successfully made and compressed foo.gz? Then we don't want to delete the only copy of the data!
We have a proposed solution (right as Eggert was wrapping up):
gzip.c:

    static void cleanup(int sig) {
      unlink("foo.gz");
      _exit(1);
    }

    main() {
      fd = open("foo", O_RDONLY);
      signal(SIGINT, cleanup);   // CTRL-C RACE HERE
      fo = open("foo.gz", O_WRONLY);
      while (compress(fd, fo))
        continue;
      close(fo);
      signal(SIGINT, SIG_DFL);   // restore default handling
      unlink("foo");             // RACE HERE fixed?
    }
Note: This scenario will be continued in Wednesday's lecture.