Lecture 5

Orthogonality is a way to manage complexity in big projects. In a purely orthogonal design, operations do not have side effects; each action (whether it's an API call, a macro invocation, or a language operation) changes just one thing without affecting others. There is one and only one way to change each property of whatever system you are controlling. In a good operating system, if we place different components on different axes, each axis is independent of the others. We try to ensure that design decisions do not interfere with each other, since such interference leads to higher complexity.
For example, when designing an operating system, we should try to keep process decisions independent of file decisions. As per the figure below, we should be able to choose a configuration consisting of any combination of values along both axes.

Orthogonality in Files

Operations on Files:

Operations on files can be grouped into two sets based on whether their APIs use filenames or handles. Operations that deal with filenames take a string as an argument, while the other group uses handles, which are small descriptors of open files. There are multiple ways to implement handles:

a) Use Pointers/References:
+ They are simple and fast
- Very unsafe, as they allow an unauthorized user to read or modify the file or change its permissions.

b) Use Opaque pointers:
struct opaque * f;
The definition of struct opaque is not included in the header file, so callers see only an incomplete type, which cannot be dereferenced. Hence, opaque pointers are safe to use, but only for well-behaved programs. A misbehaving program can still cast the opaque pointer and examine the individual bytes by doing:
char * p = (char * ) f;

c) Use Integers/File descriptors:
typedef int opaque;
The kernel treats the actual objects as very important, so they are not made visible to applications. An application holds only an integer, and it must ask the kernel, by making a system call, for any information regarding the file.
+ Safer
- Complicated to implement
- Not type safe:

typedef int pid_t; 
typedef int signal_t; 
pid_t p  = 12; 
signal_t s = p;

Here, a program might use variables of different types interchangeably, and the compiler will not complain.
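One common way to get some type safety back, sketched here under the assumption that we are free to define our own handle types, is to wrap each integer in its own struct. The names proc_id, signal_no, make_proc_id, make_signal_no and same_numbers are hypothetical and exist only for this illustration; they are not part of any real API.

```c
#include <stdbool.h>

/* Hypothetical wrapper types: both hold an int, but they are distinct types. */
typedef struct { int value; } proc_id;
typedef struct { int value; } signal_no;

proc_id make_proc_id(int v) { proc_id p = { v }; return p; }
signal_no make_signal_no(int v) { signal_no s = { v }; return s; }

/* Accepts only a proc_id followed by a signal_no; swapping the
   arguments is a compile-time error instead of a silent bug. */
bool same_numbers(proc_id p, signal_no s) { return p.value == s.value; }

/* signal_no s = make_proc_id(12);   <- rejected by the compiler:
   proc_id and signal_no are incompatible struct types. */
```

The cost is some extra syntax at every use (p.value instead of p), which is part of why the plain typedef int approach is still so common.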

Orthogonality in Processes

Processes follow a concept similar to files. Just as an open file is identified by a file descriptor, a process is identified by a process ID. The following are the system calls that the UNIX kernel provides for processes:

System Call	Description
pid_t fork()	Creates a new process, increasing the size of the process table. The child is a copy of the parent, including the whole stack. Returns the child's process ID in the parent, 0 in the child, and -1 on failure
int execvp(char const * file, char * const * argv)	Runs the command given in the arguments. It replaces the program running in the calling process, which keeps its process ID and its entry in the process table. Returns only on failure
void _exit(int)	Moves the process to the Zombie state with the given exit status. The process table entry is not destroyed
pid_t waitpid(pid_t, int*, int)	Waits for the process with the given pid to finish. After that process finishes, removes its entry from the process table by moving it from the Zombie state to the Empty state. Can be called on immediate child processes only
int kill(pid_t, int)	Takes a PID and a signal number and sends that signal to the designated process; by default most signals terminate it. Can be used on any owned process

The following demo code prints the date using the above concepts:

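#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

bool print_date(void) {
    int status;
    pid_t p = fork();
    switch (p) {
        case 0:   /* child: run date; _exit is reached only if execvp fails */
            execvp("/usr/bin/date", (char *[]){"date", "-u", NULL});
            _exit(127);
        case -1:  /* fork failed */
            return false;
        default:  /* parent: wait for the child and check its exit status */
            if (waitpid(p, &status, 0) < 0)
                return false;
            else
                return WIFEXITED(status) && WEXITSTATUS(status) == 0;
    }
}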

Zombie processes

In the above demo code, there is a potential race at the waitpid call, where the parent process waits for the child process to complete. The child may exit, by making the system call _exit(int status), before the parent has even called waitpid(..) on it. If exiting removed the child's entry from the process table, the parent would then be waiting on a process that no longer exists, and the child's exit status would be lost. Therefore, an exit system call does not shrink the process table. Instead, the waitpid() system call is responsible for collecting the child process and then removing it from the process table. A process that has exited but has not yet been collected by its parent is called a Zombie process.
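The behaviour described above can be sketched as follows. This is a minimal illustration, not code from the lecture: the helper name reap_quick_child is hypothetical, and the sketch assumes a POSIX system. The child exits immediately, so it may well be a zombie by the time the parent reaches waitpid, yet the parent still receives its exit status.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits at once with `code`,
   then collect it. Returns the child's exit status, or -1 on error. */
int reap_quick_child(int code) {
    pid_t p = fork();
    if (p < 0)
        return -1;                      /* fork failed */
    if (p == 0)
        _exit(code);                    /* child: stays a zombie until waited for */

    int status;
    if (waitpid(p, &status, 0) < 0)     /* parent: works even if the child has already exited */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the exit status is kept in the child's process table entry, the order in which _exit and waitpid happen does not matter to the parent.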

In Linux, the process tree is rooted at the init process with pid=1, and all other processes are its descendants. A process can wait only for its immediate child processes. A sample tree in Linux can be as follows:

What can go wrong?

1. Neglectful Parent: A parent process might never call waitpid on its child processes, so the Zombie processes are never collected and stay in the process table. Eventually, the process table reaches its limit and there is no space left to create a new process, so the fork system call fails with an error.

2. Orphaned processes: A parent process might exit before its child processes. This problem is solved by having init adopt such orphaned processes. init runs the following loop:
while(waitpid(-1,&status,0)>=0) // -1 indicates wait for any child process
    continue;

3. Misbehaved child: When a child process misbehaves or never calls _exit, it might harm other processes or chew up all the resources. In such a case, a process can make the system call kill(pid, sig). The caller does not have to be the parent; any process that owns the target can call kill on it. If the signal terminates the target, the target's state changes to Zombie.
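The third case can be sketched as below, assuming a POSIX system. The helper name kill_stuck_child is hypothetical. The child loops forever without ever calling _exit, the parent sends it SIGKILL, and then collects the resulting zombie with waitpid.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start a child that never exits on its own, kill it,
   and return the number of the signal that terminated it (-1 on error). */
int kill_stuck_child(void) {
    pid_t p = fork();
    if (p < 0)
        return -1;
    if (p == 0) {
        for (;;)
            pause();                    /* misbehaving child: never calls _exit */
    }

    if (kill(p, SIGKILL) < 0)           /* SIGKILL cannot be caught or ignored */
        return -1;

    int status;
    if (waitpid(p, &status, 0) < 0)     /* the killed child is a zombie until collected */
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Note that kill by itself does not free the process table entry; the waitpid call is still needed, exactly as for a child that exited normally.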

Process and File Descriptor

Every process has its own file descriptor table, and the file descriptors are also copied to the child process during the fork system call.
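A small sketch of this inheritance, assuming a POSIX system: the parent creates a pipe before forking, and the child writes through its copy of the descriptor. The helper name byte_from_child is hypothetical and used only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the byte the child sent through a pipe
   whose descriptors it inherited across fork, or -1 on error. */
int byte_from_child(char c) {
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t p = fork();
    if (p < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (p == 0) {                       /* child: fds[] were copied by fork */
        close(fds[0]);
        ssize_t w = write(fds[1], &c, 1);
        _exit(w == 1 ? 0 : 1);
    }

    close(fds[1]);                      /* parent keeps only the read end */
    char got;
    ssize_t n = read(fds[0], &got, 1);
    close(fds[0]);
    waitpid(p, NULL, 0);                /* collect the child so it does not stay a zombie */
    return n == 1 ? (unsigned char)got : -1;
}
```

The child never calls pipe or open itself; it can write only because its descriptor table started out as a copy of the parent's.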

The Big Unix Idea

- Use file descriptors as orthogonally as possible.
- Treat all files and devices the same and use the same API for both. Primitives that apply to files apply to devices as well, but we have to carry them over carefully, as some operations do not apply to some devices. For example, writing to a keyboard is not a valid operation. This indicates that we have to give up orthogonality to some extent, but this is required.
The following table lists the APIs for files with their details, and also lists which of them can be used for all devices.

System Call	Description
int open(char const* filename, int flags, . . . )	Returns a file descriptor: a nonnegative integer
int close(int fd);	Closes the file descriptor. Returns -1 if fd is not an open file descriptor
int dup(int fd)	Creates another file descriptor that points at the same place
int dup2(int oldfd, int newfd)	Copies oldfd into newfd
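To illustrate what "points at the same place" means for dup, here is a minimal sketch assuming a POSIX system. The helper name offset_via_dup and the temporary file name pattern are hypothetical. Bytes are written through the original descriptor, and the file offset is then queried through the duplicate.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write n bytes through one descriptor and report the
   file offset seen through its duplicate. Returns -1 on error. */
long offset_via_dup(size_t n) {
    char path[] = "/tmp/dup_demo_XXXXXX";   /* illustrative temp file name */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    long result = -1;
    int copy = dup(fd);                     /* second descriptor, same open file */
    if (copy >= 0) {
        size_t written = 0;
        while (written < n && write(fd, "x", 1) == 1)
            written++;
        if (written == n)
            result = (long)lseek(copy, 0, SEEK_CUR);  /* offset is shared */
        close(copy);
    }
    close(fd);
    unlink(path);                           /* clean up the temp file */
    return result;
}
```

If dup had created an independent open file, the offset seen through the copy would still be 0 after the writes.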

Orthogonality between File Names and File Descriptors

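int fd = open("file", O_RDWR);
if (fd >= 0) {
    unlink("file");                       /* removes the name only */
    if (read(fd, buf, buf_size) < 0)      /* reading still works through fd */
        perror("read");
    close(fd);                            /* always close, even if read failed */
}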

File names and file descriptors are orthogonal in Unix. Even if you delete the file by unlinking it, you can still read the file contents through the file descriptor. It is only when you close the last file descriptor referring to it that the file is actually deleted.
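A self-contained version of this experiment might look as follows, assuming a POSIX system. The helper name read_after_unlink and the temporary file name pattern are hypothetical. The sketch writes a message, unlinks the name, checks that the name is really gone, and then reads the contents back through the descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns the number of bytes read back from a file
   after its name has been unlinked, or -1 on error. */
long read_after_unlink(char const *msg, char *buf, size_t buf_size) {
    char path[] = "/tmp/unlink_demo_XXXXXX";    /* illustrative temp file name */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    size_t len = strlen(msg);
    long result = -1;
    int name_removed = 0;
    if (write(fd, msg, len) == (ssize_t)len) {
        name_removed = (unlink(path) == 0);     /* delete the file name */
        if (name_removed
            && access(path, F_OK) != 0          /* the name no longer exists */
            && lseek(fd, 0, SEEK_SET) == 0)     /* rewind to the start */
            result = (long)read(fd, buf, buf_size);
    }
    if (!name_removed)
        unlink(path);                           /* clean up on the error path */
    close(fd);                                  /* last descriptor: storage is reclaimed */
    return result;
}
```

A caller could pass a small buffer and compare its contents with the original message to confirm that the data survived the unlink.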

What can go wrong with file descriptor?

1. Neglectful Process: A process opens a lot of files and never closes them. This can be fixed by placing a limit on the maximum number of files that a process can have open.

2. Read from a removed device: If a process tries to read, through a file descriptor, from a device that has been removed, read will fail with the error EIO.

3. Read past EOF: If a process tries to read past the end of the file, read returns 0 as the number of bytes read.

4. Read from a stream device: When we try to read from a device like a keyboard and there is no data, we can a) wait for the device, or b) put the file descriptor in no_delay mode (the O_NONBLOCK flag), where read will fail immediately instead of waiting.
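Cases 3 and 4 can be observed with a pipe standing in for a stream device; this substitution is an assumption made for the sake of a runnable sketch, since a real keyboard is hard to test against. The helper name probe_empty_pipe and the READ_* codes are hypothetical. With the write end still open, a non-blocking read of an empty pipe fails with EAGAIN; once the write end is closed, the same read returns 0, meaning end of file.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Outcome codes for the hypothetical helper below. */
enum { READ_WOULD_BLOCK = 1, READ_EOF = 2, READ_OTHER = 3 };

/* Reads from an empty pipe in non-blocking mode. If close_writer is nonzero,
   the write end is closed first, so the read sees end of file instead. */
int probe_empty_pipe(int close_writer) {
    int fds[2];
    char c;
    if (pipe(fds) < 0)
        return READ_OTHER;

    int flags = fcntl(fds[0], F_GETFL);
    if (flags < 0 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fds[0]);
        close(fds[1]);
        return READ_OTHER;
    }
    if (close_writer)
        close(fds[1]);                  /* no writer left: reader will see EOF */

    int result = READ_OTHER;
    ssize_t n = read(fds[0], &c, 1);
    if (n == 0)
        result = READ_EOF;              /* read returns 0 bytes at end of file */
    else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        result = READ_WOULD_BLOCK;      /* no data yet, and we refused to wait */

    close(fds[0]);
    if (!close_writer)
        close(fds[1]);
    return result;
}
```

Without O_NONBLOCK, the first case would simply block until a writer supplied data, which is option a) in the list above.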

fcntl vs. ioctl: fcntl is used for higher-level things such as files, while ioctl is used for lower-level things such as devices.