CS 111Scribe Notes for 1/23/13by Alexander Chiou, Derek NgOrthogonalityIn Emacs version 24.2, there is code that reads in a file:
At the moment, it does so slowly, because:
The cost of this is O(DN), where D is the length of the filename and N is the number of files in the directory. We can speed it up by taking the strcpy call out of the loop and reusing the char buffer.
And so the code continues, but we still have to traverse the length of the string that holds the file's full name.
We want O(N) cost, so we use another function, fstatat, instead of lstat:
fstatat allows us to get file status relative to a directory's file descriptor. Using it, we no longer have to traverse over the full length of a directory's name. As the above graph shows, we want to be able to choose z freely without affecting x or y. Were the axes not orthogonal, moving along one axis would affect its coordinates along the other axes. In other words, we should choose axes that are orthogonal. In the previous example, the x-axis would be the file name while the y-axis would be the file contents. Since the length of the file name affects how easily the system is able to get information about the file's contents, this is not an orthogonal setup. In the end, everything ends up colliding. It's our job to figure out how to minimize these collisions. The Classic Unix ModelThe classic Unix model for orthogonality attempts to make the following factors orthogonal: processes, file names, device access, and file systems. ProcessesA process is a program that's executing in an isolated domain. To specify how processes work, we build an API for dealing with processes using our associated intuition visualizing them as data structures. ForkingForking is, put simply, cloning a process. The current process is called the parent process, and the new process is called the child process.
The function
In the example above, the return value of
This code will attempt to make lots of children that immediately all exit. The parent will never have 0 as its return value, while the children fail out of the while loop and proceed to exit.
This code, on the other hand, makes an exponential amount of children. The parent will fork, and then the parent and its child will fork, and the parent and its child and the two new grandchildren will fork, and so on and so forth. Running this code is ill-advised. Suppose we want to print the date in an incredibly roundabout manner, forking the process and then having the child print the date. This is what the function would look like:
The waitpid
waitpid is a function that waits on a process until it exits. It returns -1 if the first argument doesn't exist. In the printdate example above, the following code can be used instead:
However, waitpid can cause deadlock. For example, if process a waits on process b and process b also waits on process a, this causes perpetual waiting. Unix convention circumvents this by allowing processes to only wait on their children (but not other descendents, like grandchildren). Suppose a child is exited before waitpid is called on the child. This would be problematic. To solve this, we have zombie processes, processes that have exited but is still kept alive in the system. This means that any wait calls on finished processes won't crash. If we want to produce lots of zombies, we can do so with the following code:
We fork the current process, then the parent and child enter the loop. The parent fails out of the if statement and produces more children, and the child passes the if statement and immediately exits. These exited child processes are zombies. Here is some example code that "reaps the zombies", forcing any processes in a zombified state to exit.
A Problem With Our Date Implementation
To fix this, we implement a 5 second time limit for processes, using a system call named alarm. Here is some example code which uses alarm:
From line 12 of the above code, we can see that we have a new API member, kill. Here is its function declaration
The Process TableThe process table is an object that the kernel looks at to make decisions concerning processes. It contains information about each of the processes that are currently running. Below is a rough example process table:
However, not all of the information in a process table may be accurate, especially for running processes. For example, let's say that Process 3 is running out of 10 possible process on a single core machine. Process 3 is "special", because it can have a lot of incorrect values. One of the most common values to be incorrect is the instruction pointer. In order for it to be right, the instruction pointer would have to update each time it ran an instruction and that is simply too much of a hassle to maintain. Because of dilemmas similar to the one above, many values in the process table are actually static when they have to be dynamic. FilesNow let's move on from processes to file. Similar to processes, let's construct an API that deals with files. Here is the function declaration for open, a function that opens a file:
The above function returns a file descriptor object, but from the point of view of the user's application, the return value function is simply another non-negative integer. Now let's implement the opposite of open, which is close. Here's the function declaration for this function:
This function takes in file descriptor fd, which points to the file that we want to close. As a side note, passing in a raw integer to this function crashes it. Here is an analogy between processes and files to illustrate the roles of open and close.
Lastly, we have dup, which is short for duplicate. This function creates a new file descriptor.
dup "clones" files similar to fork() for processes, but it isn't entirely similar to fork(), because both the old and new file descriptors refer to the same file, which is aliasing. Process Table AlterationTo accomodate the files of processes, we add a files section to each record in the process table, containing file descriptors to each of the files each process is working with. Each file descriptor points to another table, which contains information about the file itself. All of this is illustrated in the following diagram: Final Question of the DayQ: Does fork clone file descriptors? A: To be answered in the next lecture |