CS 111
          Scribe Notes for 4/13/10, Lecture 5
 
          by Pei-Ying Hsieh, Allen Wu 
            
          What Is Orthogonality
          Orthogonality is a property in computer systems design
          that helps keep a system compact. The idea is that
          modifying one component of a system neither creates nor
          propagates side effects to other components of the same
          system. This is comparable to orthogonality in
          mathematics. In a three-dimensional coordinate system,
          the x, y, and z axes are perpendicular to each other.
          Given a point (x, y, z), changing one coordinate does not
          change the others.
          Goals of an Orthogonal System
          We want interfaces that are: 
          
            - Simple => Making changes to one component should
            not force the user to write another component
 
            - Complete => includes necessary components
              
                - eg: process, memory, files (you can picture
                these as the 3 axes of a 3D vector space)
 
               
             
            - Combinable => different combinations of x
            processes, y memory, and z files should be combinable;
            each axis should be independent of the others
 
           
          How can user programs access OS resources?
          Mechanisms: (each with their advantages [+] and
          disadvantages [-]) 
          
            - Treat OS resources as objects
              
                - Application deals with references to these
                objects
 
                - eg: process table entry structure
 
                struct pte {...} struct pte *p; 
                - + Simple => easy to comprehend
 
                - + Fast => quick to access
 
                - - No protection against bad user programs =>
                programs can (un)intentionally modify kernel data
 
                - - Race conditions => what happens if two
                processes try to modify the same field at the same
                time?
 
                - - Maintainability => if we modify a
                structure (eg: add a field), we'd have to recompile
                the kernel AND all the applications, since the size
                of the struct has changed
 
               
             
            - Access OS resources as integers (Unix
            implementation)
              
                - Use system calls to access OS resources
 
                - Opaque identifiers
 
                - eg: pid_t, an integer type used for process IDs
 
                - eg: int for file descriptors
 
                - + Safe => you need to request permission
                from OS
 
                - + Maintainability => if we modify the
                kernel, we don't need to rewrite or recompile apps
                that use it
 
                - - Slower => latency of system calls
 
                - - Complex => more complicated than directly
                accessing structures
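          A minimal sketch of the integer-handle approach,
          simulated inside a single C program. The array `table`
          stands in for kernel memory, and sim_open, sim_advance,
          sim_tell, and sim_close (all hypothetical names) stand in
          for system calls. A real kernel enforces the separation
          with hardware protection; here it is only a convention.
          The point is that the application holds nothing but an
          int, and every access through it is validated.

```c
/* Hypothetical per-file record. Only the "kernel" side below knows
   this layout, so fields can be added without changing callers. */
struct file_rec {
    int in_use;
    long offset;
};

#define MAX_HANDLES 8

/* Stands in for kernel memory that the application cannot touch. */
static struct file_rec table[MAX_HANDLES];

/* Every entry point checks the handle, so a bogus integer from the
   application is rejected instead of corrupting the table. */
static int valid(int h) {
    return h >= 0 && h < MAX_HANDLES && table[h].in_use;
}

/* Allocates the lowest free slot and returns its index as an opaque
   integer handle, or -1 if the table is full. */
int sim_open(void) {
    for (int i = 0; i < MAX_HANDLES; i++) {
        if (!table[i].in_use) {
            table[i].in_use = 1;
            table[i].offset = 0;
            return i;
        }
    }
    return -1;
}

/* Moves the handle's offset forward by n; returns 0, or -1 if bad. */
int sim_advance(int h, long n) {
    if (!valid(h))
        return -1;
    table[h].offset += n;
    return 0;
}

/* Returns the handle's current offset, or -1 for a bad handle. */
long sim_tell(int h) {
    return valid(h) ? table[h].offset : -1;
}

/* Frees the slot; returns 0, or -1 for a bad handle. */
int sim_close(int h) {
    if (!valid(h))
        return -1;
    table[h].in_use = 0;
    return 0;
}
```

          Because callers only ever see an int, adding a field to
          struct file_rec would not require changing or recompiling
          any caller, which is the maintainability advantage listed
          above.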
 
               
             
           
          File Descriptors
          A file descriptor is an index into a kernel data
          structure that contains the details of all of a
          process's open files. In POSIX, this data structure is
          called the file descriptor table. We can imagine each
          process having its own file descriptor table, so
          different processes can have different sets of files
          open, and the same number can refer to different files in
          different processes. In order for an application to
          access a file, it must do so through a system call, which
          lets the kernel access the requested file on behalf of
          the application. This provides a layer of protection, as
          an application cannot read from or write to the file
          descriptor table directly.
          Example of a system call:
          int fd = open("path", O_RDONLY);
          // open returns the file descriptor of the opened file on
          // success, or -1 on failure
          // In this case, 'fd' is the opaque identifier
          // Say fd = 17 after the system call
          Let's see how a process can access a file
          descriptor. 
          *Note: the kernel memory holding the file descriptor
          table is invisible to the application; the application
          only ever sees the integer fd*
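          A small sketch, assuming a POSIX system with /dev/null,
          showing that file descriptors really are small integer
          indices. POSIX specifies that open() returns the
          lowest-numbered descriptor not currently open in the
          process, so a slot freed by close() is reused by the next
          open(). The function name fd_reuse_demo is hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* Opens /dev/null twice, closes the first descriptor, and opens it
   again. Returns 1 if the third open reused the first descriptor's
   number, 0 if it did not, or -1 on error. */
int fd_reuse_demo(void) {
    int a = open("/dev/null", O_RDONLY);
    if (a < 0)
        return -1;
    int b = open("/dev/null", O_RDONLY);   /* next free slot, so b > a */
    if (b < 0) {
        close(a);
        return -1;
    }
    close(a);                              /* slot a is free again */
    int c = open("/dev/null", O_RDONLY);   /* lowest free slot: a */
    if (c < 0) {
        close(b);
        return -1;
    }
    int reused = (c == a);
    close(b);
    close(c);
    return reused;
}
```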
            
          File Descriptor System Calls
          
            
          | System call | Purpose                                     |
          |-------------|---------------------------------------------|
          | open()      | Opens a file and returns a file descriptor  |
          | read()      | Reads from an open file                     |
          | write()     | Writes to an open file                      |
          | close()     | Closes an open file                         |
          | lseek()     | Changes the current file offset to a new    |
          |             | position                                    |
             
           
          System calls are a good example of hard modularity. It
          may seem that we have objects, but these "objects" are
          hidden from the view of user programs. A system call
          transfers control to the kernel, which runs in its own
          protected memory, and the kernel accesses the file on the
          program's behalf.
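          A sketch that uses open, write, lseek, read, and close
          together on one descriptor. The helper name
          write_then_read and the scratch path used with it are
          assumptions for illustration; the flags and calls are
          standard POSIX.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Writes msg to path, seeks back to the start, reads the data back
   into buf (bufsize must be at least 1; result is NUL-terminated),
   and closes the descriptor. Returns bytes read, or -1 on error. */
ssize_t write_then_read(const char *path, const char *msg,
                        char *buf, size_t bufsize) {
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    size_t len = strlen(msg);
    if (write(fd, msg, len) != (ssize_t) len ||  /* offset is now len */
        lseek(fd, 0, SEEK_SET) != 0) {           /* back to byte 0 */
        close(fd);
        return -1;
    }
    ssize_t n = read(fd, buf, bufsize - 1);
    if (close(fd) != 0 || n < 0)
        return -1;
    buf[n] = '\0';
    return n;
}
```

          After write, the file offset sits at the end of the
          data, which is why the lseek back to offset 0 is needed
          before read can see anything.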
          eg: using the open() system call: 
          int open(const char* pathname, int flags, mode_t
          mode); 
          
            - Flags
              
                - O_RDONLY => read only
 
                - O_WRONLY => write only
 
                - O_RDWR => read and write
 
               
              
                - O_CREAT => create file if it doesn't
                exist
 
                - O_TRUNC => if the file already exists and is
                opened for writing, truncate it to length 0 (make it
                empty)
 
                - O_APPEND => all writes go to end of
                file
 
                - You can combine flags with the bitwise OR
                operator (|). The exception is that you must specify
                exactly one of the first three.
                  
                    - int fd = open("path", O_WRONLY | O_CREAT,
                    0640);
 
                   
                 
               
             
            - Eggert flags (flags that Eggert wants but don't
            exist)
              
                - O_EXEC => get executable access to a
                file
 
                - O_NONE => open a file without any access
                  
                    - This is so you can get a file descriptor
                    just to call fstat() on it
 
                   
                 
               
             
            - Modes
              
                - Used for permissions of a file you create. Only
                used when you have the O_CREAT flag, so this
                parameter is optional
  
               
             
            - Variable number of arguments
              
                - int open(char const* name, int flags, ...)
 
                - You can pass extra arguments
 
                - The implementation decodes the extra arguments
                with the macros from <stdarg.h> (va_start, va_arg,
                va_end)
 
               
             
            - umask
              
                - umask() sets the calling process's file mode
                creation mask
 
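                - Every process descriptor contains a umask (of
                type mode_t). When a file is created, permission
                bits that are set in the umask are turned off in the
                mode passed to open (eg: mode 0666 with umask 022
                gives 0644)
 
                - Mode 0666 (octal) means rw-rw-rw-
 
                - Mode 0777 (octal) means rwxrwxrwx
                  
                    - r => read
 
                    - w => write
 
                    - x => execute
 
                   
                  
                 
                - To get the current umask, use the umask system
                call
                  
                    - old_umask = umask(new_umask) sets the new
                    mask and returns the old one
 
                    - POSIX has no call that only reads the umask,
                    so to get the current value you set a new mask
                    and then immediately restore the old one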
 
                   
                 
               
             
            - If a system call fails, it usually returns -1 and
            sets errno (declared in <errno.h>) to a code saying why
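          A sketch of how the mode argument, the umask, and errno
          interact, assuming a POSIX system and a writable scratch
          path. The helper names created_mode and
          missing_file_errno are hypothetical. O_CREAT applies the
          mode only when it actually creates the file, so the
          sketch removes any old copy first.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Creates path with requested mode 0666 under a temporary umask of
   022 and returns the permission bits the file actually got, or -1
   on error. The caller's umask is restored before returning. */
int created_mode(const char *path) {
    unlink(path);                     /* O_CREAT sets the mode only on creation */
    mode_t old = umask(022);          /* set new mask, get old one back */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    umask(old);                       /* restore the caller's mask */
    if (fd < 0)
        return -1;
    struct stat st;
    int rc = fstat(fd, &st);
    close(fd);
    if (rc != 0)
        return -1;
    return st.st_mode & 0777;         /* keep only the rwx bits */
}

/* Opens a file that does not exist, without O_CREAT, and returns
   the errno value describing the failure (0 if open succeeded). */
int missing_file_errno(const char *path) {
    unlink(path);                     /* make sure it does not exist */
    int fd = open(path, O_RDONLY);
    if (fd >= 0) {
        close(fd);
        return 0;
    }
    return errno;
}
```

          With the 022 bits cleared from 0666, created_mode
          should report 0644 (rw-r--r--), and missing_file_errno
          should report ENOENT.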
 
           
          Process Functions
          
            - fork() => create a process
              
                - fork() clones the current process, except for
                the process ID, the parent's process ID, accumulated
                execution times, file locks, and pending signals.
                File descriptors are copied, but the copies share
                the parent's open file descriptions (eg: the file
                offset)
                  
                
 
                - Returns -1 => fork failed (reason stored in
                errno)
                  
                    - ENOMEM => not enough memory
 
                    - EAGAIN => a resource limit (eg: on the number
                    of processes) was reached; try again later
 
                   
                 
                - Returns 0 => fork succeeded, and you are
                running in the child process
 
                - Returns >0 => fork succeeded, and you are
                running in the parent process. The return value is
                the child's process ID (type pid_t)
 
               
             
            - _exit(n) => destroys a process
              
                - Input parameter is the exit status, which is
                put into the process descriptor in case any other
                process wants to know the exit status
 
               
             
            - exit(n) => destroys a process
              
                - This differs from _exit(n) in that it cleans
                up first (eg: flushes stdio output buffers). This
                may cause a hang, since flushing output can block
                (eg: writing to a full pipe).
 
               
             
            - getpid(void) => returns process ID of current
            process (type pid_t)
 
            - getppid(void) => returns parent's process ID
            (type pid_t)
 
            - waitpid(pid_t p, int* status, int options)
              
                - Waits for one of your children to finish (it
                only works on YOUR children, so two processes can
                never end up waiting on each other and deadlocking)
 
                - Parameter p is the process ID of the child (-1
                means any child)
 
                - Status is a pointer to memory where the child's
                exit status is stored
 
                - Options indicate how long you're willing to
                wait
                  
                    - 0 => wait until the child finishes
 
                    - WNOHANG => don't wait at all; return 0
                    immediately if the child hasn't finished
 
                   
                 
                - Returns the process ID of the child that
                finished (type pid_t)
                  
                    - Returns -1 if failed (eg: tried to wait on
                    a process that isn't your child)
 
                   
                 
               
             
            - execvp(char const* file, char* const* argv)
              
                - This system call allows a process to run any
                program file, such as a binary executable or a
                shell script
 
                - Calling this destroys the current program (its
                code, data, and stack) and everything associated
                with it except the process descriptor, so the
                process keeps its ID and, by default, its open file
                descriptors. The same process then starts running
                the specified program from the beginning
 
                - Parameter file is a pointer to a character
                string that contains the name of the file to be
                executed; if the name contains no slash, execvp
                searches the directories in PATH for it
 
                - Parameter argv is a pointer to a NULL-terminated
                array of character strings. You can think of its
                type as (char**), the same as the argv array passed
                to main
                  
                    - int main(int argc, char** argv)
 
                   
                 
                - Returns only if it fails, and then always
                returns -1 (with errno set); if it succeeds, it
                never returns
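          A sketch exercising fork, getpid, getppid, _exit,
          waitpid, and a failing execvp. The helper names
          child_exit_status and exec_failure_status, and the
          program name no-such-program-cs111, are hypothetical.
          The children call _exit rather than exit so they do not
          flush stdio buffers copied from the parent.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that exits with `code` and returns the exit status
   the parent collects with waitpid, or -1 on failure. The child
   exits with 99 instead if getppid() does not match the parent. */
int child_exit_status(int code) {
    pid_t parent = getpid();
    pid_t p = fork();
    if (p < 0)
        return -1;                   /* fork failed; errno says why */
    if (p == 0)
        _exit(getppid() == parent ? code : 99);   /* child */
    int status;
    if (waitpid(p, &status, 0) != p) /* options 0: block until done */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Forks a child that tries to exec a program that does not exist.
   execvp returns only on failure, so the child reaches _exit(127). */
int exec_failure_status(void) {
    pid_t p = fork();
    if (p < 0)
        return -1;
    if (p == 0) {
        char *argv[] = {"no-such-program-cs111", NULL};
        execvp(argv[0], argv);       /* returns -1 because it failed */
        _exit(127);
    }
    int status;
    if (waitpid(p, &status, 0) != p)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```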
 
               
             
           
          System Call Example
          Imagine you want to write a function that reads
          everything from stdin and writes it to stdout, but in
          sorted order. We will write a prototype of this "sortIO"
          function.
          
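#include <unistd.h>

int sortIO(void) {
    // On success, execvp never returns: sort replaces this program.
    execvp("/bin/sort", (char*[]) {"sort", NULL});
    return -1;  // reached only if execvp failed
}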
 
          First we try to just call execvp on the sort program.
          The problem with this code is that it blows away the
          whole program: recall that a successful execvp destroys
          the current program and never returns, so sortIO's caller
          never gets control back. So how do we get around this?
          The solution is to have a child process make the execvp
          call. Only the child is replaced by sort, so the parent
          can continue normally.
          
1  #include <sys/wait.h>
2  #include <unistd.h>
3  
4  int sortIO(void) {
5      pid_t p = fork();
6      switch (p) {
7      case 0:  // child: become sort (it inherits our stdin/stdout)
8          execvp("/bin/sort", (char*[]) {"sort", NULL});
9          _exit(1);  // reached only if execvp failed
10     case -1:  // fork failed
11         return -1;
12     default: {  // parent: wait for the child to finish
13         int status;
14         if (waitpid(p, &status, 0) < 0)
15             return -1;
16         if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
17             return -1;
18         return 0;
19     }
20     }
21 }
 
          First, notice that we included sys/wait.h. This is so
          we can access WIFEXITED (did it exit?) and WEXITSTATUS
          (what is the exit status?). We start off by calling fork,
          which creates a child process. Recall that fork returns 0
          in the child, so the child goes to case 0. It then calls
          execvp (it is fine to replace the child process). If
          execvp fails and returns, the child calls _exit(1) so that
          it does not go on to run the parent's code. If p is -1,
          the fork was unsuccessful. Otherwise, we are in the
          default case, which runs in the parent process. The parent
          waits for its child to complete execution (line 14) and
          checks whether the wait was successful. Afterwards, we can
          check WIFEXITED and WEXITSTATUS. WIFEXITED returns nonzero
          if the process exited normally. WEXITSTATUS (which should
          only be used after checking WIFEXITED) gets the exit
          status of the process that exited. We make sure that the
          child exited normally with status 0, and only then report
          success.
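          A sketch of how the parent can tell a normal exit from
          death by a signal. WIFSIGNALED and WTERMSIG are standard
          <sys/wait.h> macros not covered in the lecture; the
          helper how_child_ended and its result encoding (negative
          signal numbers, -1000 for failure) are hypothetical.

```c
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that either exits with `code` (sig == 0) or kills
   itself with signal `sig`. Returns the exit status for a normal
   exit, -(signal number) if a signal killed the child, or -1000 if
   fork or waitpid failed. */
int how_child_ended(int code, int sig) {
    pid_t p = fork();
    if (p < 0)
        return -1000;
    if (p == 0) {
        if (sig != 0)
            raise(sig);      /* eg: SIGKILL cannot be caught or ignored */
        _exit(code);
    }
    int status;
    if (waitpid(p, &status, 0) != p)
        return -1000;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);   /* valid only after WIFEXITED */
    if (WIFSIGNALED(status))
        return -WTERMSIG(status);
    return -1000;
}
```

          In sortIO, either kind of abnormal ending makes the
          check on line 16 fail, so both are reported as -1.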
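          *Note: POSIX also has posix_spawn() and posix_spawnp(),
          which do fork AND exec in one call. fork + exec is an
          example of orthogonality: fork and exec are independent
          of each other, and the child can do any setup it likes
          between them. This is similar to having read and lseek as
          separate calls rather than one combined read-at-offset
          call. POSIX supports both spawning and fork + exec.
          Windows, on the other hand, only has the spawn style (no
          fork). As a result, when UNIX programs that use fork are
          emulated on Windows, performance stinks.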
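          For comparison, a sketch of the spawn style using
          posix_spawnp() from <spawn.h>, which searches PATH like
          execvp. The helper name spawn_and_wait is hypothetical.
          Unlike most system calls, posix_spawnp returns an error
          number directly instead of returning -1 and setting
          errno.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Runs `file` with argument vector argv (NULL-terminated) in a new
   process and waits for it. Returns 0 if the program exited
   normally with status 0, -1 otherwise. */
int spawn_and_wait(const char *file, char *const argv[]) {
    pid_t p;
    /* NULL file actions and attributes: the child simply inherits
       our file descriptors and signal settings. */
    if (posix_spawnp(&p, file, NULL, NULL, argv, environ) != 0)
        return -1;
    int status;
    if (waitpid(p, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

          With this helper, spawn_and_wait("sort", (char*[])
          {"sort", NULL}) would play roughly the role of sortIO.
          What it gives up is orthogonality: any setup the child
          needs before the new program starts must be expressed
          through posix_spawn's file-action and attribute objects,
          rather than as ordinary code between fork and exec.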