When designing an OS, we try to achieve the following goals:
In order to do that we apply the classics techniques of modularity, so each piece of the operating system is simple and more flexible. When we design for modularity, we don't just pick random cuts. We pick the natural fault lines, because we don't just want modularity. We also desire abstraction.
We organize our system so that the design becomes layered. For instance in Figure 1, we can imagine the OS as having the kernel layer between applications and the hardware. Accessing the hardware requires privleged instructions, and accessing the kernel from the application's perspective requires system calls.
Figure 1 | Figure 2 |
---|
In Figure 2, we can imagine the OS having even more layers. This would allow even greater abstraction. In fact, x86 hardware allows supports up to four levels. The kernel mode bit is actually made up of two bits (allowing 4 possible choices). So you could theoretically design something with the structure:
However this is not the case for linux, which only uses two levels. The reason for this is simplicity and speed. It would slow down the operating system if you continually had to ask for permission for relatively straight forward things (like accessing I/O devices). Layers will always slow you down somewhat, so there is a dispute over how many layers there should be.
Now we want to talk about abstraction that looks like the diagram below. Applications sit above the kernel, and talk to each other through pipelines. However they are really just talking to the kernel (and the kernel is transferring data between them). The pipelines are abstracting what is actually happening. To understand how all this works, you need to be able to juggle several layers of abstraction in your head at the same time.
Fork clones a process, identical to the original with some exceptions. Here are some of them...
pid_t get pid(void);
pid_t get ppid(void);
$time make
in the child, you'll see the CPU time reset to zeroFigure 3 |
---|
You can think of execvp as sort of the inverse of fork:
So say we want to print the date to the screen, but if we just run /bin/date
then our program would be destroyed.
Here is our prototype code with strikethrough on parts where we discovered things could be changed for the better.
boolvoidprintdate(void) { pid_t p = fork(); switch (p) { case -1: // Error case error(); // Assume error() exits the program and never returns case 0: { // Child case static char const date[] = "/bin/date";if(execvp(date, (char const*) {date, NULL} < 0) error(); // If program ever gets here, must be an error } default: { // Parent case int status; if (waitpid(p, &status, 0) != p) error(); } return (WIFEXITED(status) && WEXITSTATUS(status) == 0); // WIFEXITED non-zero if waitpid reports process exited normally } }
What if we want to send the date to a file instead?
bool printdate(char const *outfile) { // After forking... ... int fd = open(outfile, O_WRONLY); if (fd < 0) error(); if (dup2(fd, 1) < 0) error();close(fd);// What if fd is 1? Would be bad. if (fd != 1) close(fd); ... execvp(...) }
All that extra bold stuff we put in is like the house keeping/glue for getting our child process up and running. So what if we wanted to avoid doing all that each time? Let's look at a system call designed for the task.
int posix_spawnvp(pid_t *restrict pid, char const * restrict file, posix_spawn_filek_actions_t const * restrict file_acts, posix_spawn_attr_t const * restrict attrp, char * const * restrict argv, char * const * restrict envp)
This is not as favored as fork due to the high complexity of it, but it has the advantage of being cheap. For example fork might have to copy 8 GB of memory to fork a giant process, when the programmer only wants the child to run one line of code and then exit. Seems like a waste, right? This is one of the reasons why Windows uses POSIX instead of fork.
In mathematics, orthogonality means you can pick a point on an axis of a coordinate system (x0), and your other choices of y0 and z0 will not be constrained by x0. Each choice is independent of the other. We want to apply this to operating system design by making system calls orthogonal to each other. Some reasons for this include...
Let's talk about orthogonality with files. You can think of files (if you are the CPU) as these slow and unreliable things (bad disk, network down, etc...). The issues we worry about for files are robustness and ability to perform well (as well as orthogonality).
Some types of files:
Stream Types | Request/Response Types | |
---|---|---|
Network | Flash | |
Mouse | vs. | Disk |
Keyboard |
Some general properties about Stream type devices:
And some general properties about Request/Response devices:
With this in mind we get to the UNIX BIG IDEA:
Everything is a file.
For example, the terminal can be directly read from:
open("/dev/tty", O_RDONLY);
We can open the boot disk and change how the computer boots:
open("/dev/rdsk/b", O_WRONLY);
In general, we want a small set of operations that can apply to a large set of files:
Now what happens when we try to apply lseek to a stream device?
lseek(fd, 27000, SEEK_SET) // fd = tty file from above
This will return -1 and set errno to something like: "file doesn't support lseek".
With this example in mind, some primitives don't work on all files, but we still want a small set of primitives.
Processes communicate with files or IPC (interprocess communication). This includes the use of pipes (or bounded buffers). Since pipes are bounded, they never run out of memory. The kernel will instead tell the process to wait if buffer is full. Some data needs to be read first (can think of this as too much data in transit). Let's look at a pipe with 4 KiB of buffer. Imagine Process A writes("xyz") and then writes a bunch more stuff. Process B wants to read the buffer, but has to wait for Process A to finish.
If the pointer w ever equals the pointer r, then Process A should not write any more data to the buffer. Otherwise data in transit will be overwritten before it can be read by Process B. Now what if we expanded the example to include multiple readers and multiple writers?
Some things that could happen...
Why this message fragmentation and interleaving occurs leads us to the next topic of...
Define a race condition as a situation where behavior changes depending on when the process gets scheduled. So let's look at an example where we try to write sort's output to a file that won't get wiped out by another process also running sort at the same time.
sortc: int create_temp_file(void) { // Need O_EXCL so this fails if file is already created open("/tmp/sorttmp"name, (O_RDWR|O_CREAT|O_TRUNC|O_EXCL
), 0600) // Using "/tmp/sorttmp" for file name is not unique enough // Another process could easily overwrite it // So let's try using the process ID in the name char name[1000]; sprintf(name, "/tmp/sort.%d",getpid()random()); // random() would be better }
This is the general idea we have to pursue to avoid a race condition:
while (open file not succeed) // Keep trying to open file (code above) // Maybe we don't need the O_EXCL option? This could do the same thing. for (;;) { sprintf(name, some_random_string()); // Check if file already exists by trying to open for reading if (open(name, O_RDONLY) < 0); return open(name, O_WRONLY, 0600); }
This looks like it will work, but there's still a race condition bug. Another process could sneak in between the system calls to open.
if (open(name, O_RDONLY) < 0);
// Another process sneaks in here and opens the file we just checked for
return open(name, O_WRONLY, 0600);
This gap between system calls means that another process can do something which causes our file-exist-check to no longer be valid. This is a classic race condition problem which happens all the time. Because this approach suffers from a race condition bug, do we still want orthogonality and simple system calls? Would it be safer to always use big system calls with a lot of options like posix_spawnvp?
What's different about open that makes it the wrong thing to do for this scenario? Back when we were comparing posix_spawnvp and fork, there was nothing to race against. Both tools had the same correctness, but one was more orthogonal. Now when we compare open/sort and some other larger system call, we do have an issue of correctness. The race condition makes us pick between orthogonality and correctness.
So as you can see, orthogonality is not a complete win all the time. It's something you have to trade off, considering all the possible bugs and other issues.