CS 111 Lecture 5 Scribe Notes

By Eric Shih and Mark Ketenjian

Definition of Orthogonality

What is orthogonality? According to Wikipedia, orthogonality is “a system design property which guarantees that modifying the technical effect produced by a component of a system neither creates nor propagates side effects to other components of the system.”

In other words, we should decompose our problem into independent axes which don’t interfere with each other.

One of our goals in creating an operating system is to maintain orthogonality as best as we can.

Problems with Orthogonality

GNU Emacs directory–files-and-attributes function:

In the newest version of the GNU Emacs file system, there is an inefficiency with how files are looked up (e.g. reading from "/usr/src/kernel/fs") using this function.

When we want to access the file, we call

DIR *d = opendir(“/usr/src/kernel/fs”);
char buf[1024];
strcpy(buf,”/usr/src/kernel/fs/”);
while(dp = readDir(d, … )) 
{
        // After readDir call, dp-> d_name = “README”;
        strcat(buf, dp->d_name);
        // buf now has our complete file name
        struct stat st;
        lstat(buf, &st);
        // convert st to Lisp form
}

Why is this slow? The bottleneck lies in the lstat call. The time complexity is O(D*N), where D is the length of the directory name and N is the number of the files in directory.

How can we optimize this and get O(N) complexity? The solution:

Replace

lstat(buf, &st);

with:

int fd = dirfd(d); // Get the file descriptor of current directory
fstatat(fd, dp->d_name, &st, O_SYMLINK); // Much faster!

This way we already start at the directory given by fd and don’t have to recursively determine the file location through lstat.

In this example, orthogonality between file names and file contents breaks down. Though they are supposed to be kept separate, in this case, file names become something you read out of a file. This causes many possible problems, including our inefficiency.

Tools for Orthogonality

The Classic Unix Model has the following setup for orthogonality:

X = processes
Y = filename how to parse file names
Z = devices – able to read/write – send bytes through a network
W = file systems – Way of organizing and accessing our files

In the remainder of these notes, we will deal primarily with processes and how they are implemented and also touch on files.

Orthogonality: Processes

In computing, a process is a program in execution in an isolated domain, meaning that it shouldn’t be able to mess with other processes. We can think of a process as a virtual computer with a subset of the total system. How do we specify how processes work and preserve orthogonality? We need

an API for dealing with processes; an interface that lets application programs deal with processes
associated intuition as “data structures”

The API for Processes:

pid_t fork();
- Clones a process, copies everything including all of the stack
int exec*(…);
- Reuses a process, always returns -1 for failure, won’t return if success in “brain transplant”
void exit(int);
- Destroys the process
pid_t waitpid(pid_t, int*, int);
- Waits for process pid to finish
int kill(pid_t, int);
- Sends in PID and signal number, exits the process with the designated PID, you can kill anything that is yours (so you can’t kill process 1)

Fork

pid_t fork(); – spawns a process, pid_t is a handle for a process in sys/types.h: typedef long pid_t;

pid_t must be a signed integer type.

Fork clones the current process and

Yields the pid of the child if you’re in the parent process
Yields 0 if you’re in the child process
Yields -1 if there isn’t enough resources

Examples using fork:

1. Basic use of fork()

pid_t c  = fork();
if(c == 0)
    // Do what the child should do
else
    // In parent

2. Parent keeps spawning child processes which immediately exit.

while(fork())
  continue;
exit();

3. Parent always spawns a child process and immediately exits. Creates a chain of child processes

while(!fork())
  continue;
exit();

4. Creates as many processes as you can if there is nothing to stop it

while(1)
    fork();

5. Subroutine to print the time of day using fork

bool printday(void)
{
  pid_t p = fork();
  if(p < 0)
      return false;
  if(p == 0)
  {
      execvp(“/usr/bin/date”, (char**) {“date”, “-u”, NULL});
      exit(126); // large numbers to indicate problems with exec by convention
  }
  else
  {
      // back in parent
      int status;
      while(waitpid(p, &status, …)  < 0)
        continue;    
      // We’ve waited for the child process to finish
      return WIFEXITED(status) && (WEXITSTATUS(status) == 0);
  }
}

How can we mess things up?

We could create random pids: waitpid(97, &st, 0);
- What if the process doesn’t exist? Returns -1 and errno to “no such process”
Deadlock: If A waits for B and B waits for A, neither would ever finish.
- Solution in unix: you are only allowed to wait for your children
Security: Problem with snooping on arbitrary processes since we can specify whatever pid we want
Child exits before parent waits for it
- This is fixed by zombie processes:
  - Processes that have exited but are still around in the system
- Records exit status so the parent can still access the result
- Frees up most of the resources
- Zombies will stay in the system until the kernel determines that nobody will ever access it again
  Example of zombie processes:
  
  The following code creates a bunch of zombie processes.
```
// While we are able to fork
while((p = fork()) >= 0)
{
    if(!p) // If you are a child
        exit();
}
// Over quota
exit(); 
```
  Since the parent exits, all zombie children are adopted by process 1 (init), the main process created by the system.
  
  Somewhere in init.c:
```
// Reap the zombies 
while(waitpid(-1, …)) 
    continue;
```
  When given -1 as the pid, waitpid waits for ANY of its children to exit

Suppose date loops?

Problem:

// This would loop infinitely
while(waitpid(p, &status, 0) < 0) 
    continue;

One possible solution:

void donothing {}
...
// Give it 5 seconds to resolve, 
// otherwise continue into the code
signal(SIGALRM, donothing);
alarm(5); // sets an alarm for the parent process, wakes
          // up parent process and delivers a signal
if(waitpid(p, &status, 0) < 0 && errno == EINTR) 
{
    // alarm went off
    kill(p, SIGINT); 
    // sends a hint to child process to finish up
    waitpid(p, &status, 0);
}

If you’re in the middle of a system call and gets interrupted, it will resolve that process and then send a special error number EINTR, which we can then handle and choose what to do with it

If date.c has some way to counter kill with its own signal handler
```
void donothing {}
...
// In date.c
signal(SIGINT, donothing);
```
You can change kill(p, SIGINT) to
```
kill(p, SIGKILL);
```
If you send SIGKILL, there is no way to ignore or escape the kill signal, process will die

Data Structure for Processes: Process Table

How does this table work? When the kernel wants to resume some process, it uses a special machine instruction: reti (return from interrupt)

If we have ten processes on a single core machine, we have 10 filled rows in the process table. What if a process is currently running? Many of the entries in a currently running process are unknown because they are being updated as the process is running. The process table is an important part of how virtualization works.

Orthogonality: Files

The API for Files

int open(char const* filename, int flags, … )
- Returns file descriptor: a nonnegative integer
- The open function is a handle that creates a file descriptor
int close(int fd);
- You can close whatever file you want
- Returns -1 if the file doesn’t exist
- The process and file analogy - fork:exit :: open:close
int dup(int fd);
- Create another file descriptor that points at the same place
int dup2(int oldfd, int newfd);
- Copies oldfd into newfd

For security and virtualization reasons, every process has their own set of file descriptors, and nobody else can see your file descriptor list.

Does fork clone file descriptors? Yes! This is an example of how our ideal situation of orthogonality does not work entirely, since file descriptors and processes interact in the process table.