What is orthogonality? According to Wikipedia, orthogonality is “a system design property which guarantees that modifying the technical effect produced by a component of a system neither creates nor propagates side effects to other components of the system.”
In other words, we should decompose our problem into independent axes which don’t interfere with each other.
One of our goals in creating an operating system is to maintain orthogonality as best as we can.
In the newest version of the GNU Emacs file system, there is an inefficiency with how files are looked up (e.g. reading from "/usr/src/kernel/fs") using this function.
When we want to access the file, we call
DIR *d = opendir(“/usr/src/kernel/fs”);
char buf[1024];
strcpy(buf,”/usr/src/kernel/fs/”);
while(dp = readDir(d, … ))
{
// After readDir call, dp-> d_name = “README”;
strcat(buf, dp->d_name);
// buf now has our complete file name
struct stat st;
lstat(buf, &st);
// convert st to Lisp form
}
Why is this slow? The bottleneck lies in the lstat call. The time complexity is O(D*N), where D is the length of the directory name and N is the number of the files in directory.
How can we optimize this and get O(N) complexity? The solution:
Replace
lstat(buf, &st);
with:
int fd = dirfd(d); // Get the file descriptor of current directory fstatat(fd, dp->d_name, &st, O_SYMLINK); // Much faster!
This way we already start at the directory given by fd and don’t have to recursively determine the file location through lstat.
In this example, orthogonality between file names and file contents breaks down. Though they are supposed to be kept separate, in this case, file names become something you read out of a file. This causes many possible problems, including our inefficiency.
The Classic Unix Model has the following setup for orthogonality:
In the remainder of these notes, we will deal primarily with processes and how they are implemented and also touch on files.
In computing, a process is a program in execution in an isolated domain, meaning that it shouldn’t be able to mess with other processes. We can think of a process as a virtual computer with a subset of the total system. How do we specify how processes work and preserve orthogonality? We need
an API for dealing with processes; an interface that lets application programs deal with processes
associated intuition as “data structures”
pid_t fork();
int exec*(…);
void exit(int);
pid_t waitpid(pid_t, int*, int);
int kill(pid_t, int);
pid_t fork(); – spawns a process, pid_t is a handle for a process in sys/types.h: typedef long pid_t;
pid_t must be a signed integer type.
Fork clones the current process and
1. Basic use of fork()
pid_t c = fork(); if(c == 0) // Do what the child should do else // In parent
2. Parent keeps spawning child processes which immediately exit.
while(fork()) continue; exit();
3. Parent always spawns a child process and immediately exits. Creates a chain of child processes
while(!fork()) continue; exit();
4. Creates as many processes as you can if there is nothing to stop it
while(1) fork();
5. Subroutine to print the time of day using fork
bool printday(void) { pid_t p = fork(); if(p < 0) return false; if(p == 0) { execvp(“/usr/bin/date”, (char**) {“date”, “-u”, NULL}); exit(126); // large numbers to indicate problems with exec by convention } else { // back in parent int status; while(waitpid(p, &status, …) < 0) continue; // We’ve waited for the child process to finish return WIFEXITED(status) && (WEXITSTATUS(status) == 0); } }
How can we mess things up?
The following code creates a bunch of zombie processes.
// While we are able to fork while((p = fork()) >= 0) { if(!p) // If you are a child exit(); } // Over quota exit();
Since the parent exits, all zombie children are adopted by process 1 (init), the main process created by the system.
Somewhere in init.c:
// Reap the zombies while(waitpid(-1, …)) continue;
When given -1 as the pid, waitpid waits for ANY of its children to exit
Suppose date loops?
Problem:
// This would loop infinitely while(waitpid(p, &status, 0) < 0) continue;
One possible solution:
void donothing {} ... // Give it 5 seconds to resolve, // otherwise continue into the code signal(SIGALRM, donothing); alarm(5); // sets an alarm for the parent process, wakes // up parent process and delivers a signal if(waitpid(p, &status, 0) < 0 && errno == EINTR) { // alarm went off kill(p, SIGINT); // sends a hint to child process to finish up waitpid(p, &status, 0); }
If you’re in the middle of a system call and gets interrupted, it will resolve that process and then send a special error number EINTR, which we can then handle and choose what to do with it
If date.c has some way to counter kill with its own signal handler
void donothing {} ... // In date.c signal(SIGINT, donothing);
You can change kill(p, SIGINT) to
kill(p, SIGKILL);
If you send SIGKILL, there is no way to ignore or escape the kill signal, process will die
How does this table work? When the kernel wants to resume some process, it uses a special machine instruction: reti (return from interrupt)
If we have ten processes on a single core machine, we have 10 filled rows in the process table. What if a process is currently running? Many of the entries in a currently running process are unknown because they are being updated as the process is running. The process table is an important part of how virtualization works.
int open(char const* filename, int flags, … )
int close(int fd);
int dup(int fd);
int dup2(int oldfd, int newfd);
For security and virtualization reasons, every process has their own set of file descriptors, and nobody else can see your file descriptor list.
Does fork clone file descriptors? Yes! This is an example of how our ideal situation of orthogonality does not work entirely, since file descriptors and processes interact in the process table.