CS 111
Scribe Notes for 4/13/10, Lecture 5
by Pei-Ying Hsieh, Allen Wu
What Is Orthogonality
Orthogonality is a property in computer systems design
that increases the compactness of a system. In an
orthogonal system, modifying one component neither
creates nor propagates side effects to other components
of the same system. This is comparable to orthogonality
in mathematics. In a three-dimensional coordinate system,
the x, y, and z axes are perpendicular to each other.
Given a point (x, y, z), changing one coordinate does
not change the others.
Goals of an Orthogonal System
We want interfaces that are:
- Simple => Making changes to one component should
not force the user to write another component
- Complete => includes necessary components
- eg: process, memory, files (you can picture
these as the 3 axes of a 3d vector space)
- Combinable => different combinations of x
processes, y memory, and z files should be combinable;
each axis should be independent of the others
How can user programs access OS resources?
Mechanisms: (each with their advantages [+] and
disadvantages [-])
- Treat OS resources as objects
- Application deals with references to these
objects
- eg: process table entry structure
struct pte {...}; struct pte *p;
- + Simple => easy to comprehend
- + Fast => quick to access
- - No protection against bad user programs =>
programs can (un)intentionally modify kernel
- - Race conditions => what happens if two
processes try to modify the same field at the same
time?
- - Maintainability => if we modify a
structure (eg: add a field), we'd have to recompile
the kernel AND all the applications since the size
of the struct has changed
- Access OS resources as integers (Unix
implementation)
- Use system calls to access OS resources
- Opaque identifiers
- eg: pid_t definition of integer type
- eg: int for file descriptors
- + Safe => you need to request permission
from OS
- + Maintainability => if we modify the
kernel, we don't need to rewrite apps that use
it
- - Slower => latency of system calls
- - Complex => more complicated than directly
accessing structures
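The difference between the two mechanisms can be sketched in plain C. This is only a user-space illustration, not kernel code: the table `resources`, the struct `resource`, and the functions `res_open`, `res_get`, and `res_close` are hypothetical names made up for this sketch. The caller only ever holds an integer, so every access can be checked, and the struct layout can change without recompiling the caller.

```c
#define MAX_RESOURCES 8

/* Hypothetical "kernel-side" object. Callers never see this layout. */
struct resource {
    int in_use;
    int value;
};

static struct resource resources[MAX_RESOURCES];

/* Allocate a resource; return an opaque integer handle, or -1 if full. */
int res_open(int initial_value) {
    for (int h = 0; h < MAX_RESOURCES; h++) {
        if (!resources[h].in_use) {
            resources[h].in_use = 1;
            resources[h].value = initial_value;
            return h;
        }
    }
    return -1;
}

/* Read the value behind a handle. Returns 0 on success, -1 on a bad handle. */
int res_get(int handle, int *out) {
    if (handle < 0 || handle >= MAX_RESOURCES || !resources[handle].in_use)
        return -1;
    *out = resources[handle].value;
    return 0;
}

/* Release a handle. Returns 0 on success, -1 on a bad handle. */
int res_close(int handle) {
    if (handle < 0 || handle >= MAX_RESOURCES || !resources[handle].in_use)
        return -1;
    resources[handle].in_use = 0;
    return 0;
}
```

A bad or stale handle is rejected instead of corrupting memory, which is the safety advantage listed above. The cost is an extra function call and a check on every access.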
File Descriptors
A file descriptor is an index into a kernel data
structure that contains the details of all open files.
This data structure (in POSIX) is called a file
descriptor table. We can imagine each process having its
own file descriptor table. (Consequently, each process
can have multiple files open at once, and the same
descriptor number can refer to different files in
different processes.) In order for an application to
access a file, it must do so through a system call. This
allows the kernel to access the requested file on behalf
of the application. This provides a layer of protection,
as an application cannot read from or write to the file
descriptor table directly.
Example of a system call:
int fd = open("path", O_RDONLY, 0640);
// open returns the file descriptor of the open file if
// opened successfully
// In this case, 'fd' is the opaque identifier
// Say fd = 17 after system call
Let's see how a process can access a file
descriptor.
*Note: memory is invisible to application*
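To make the "opaque identifier" idea concrete, here is a small sketch, assuming a POSIX system where /dev/null exists. The function name `fd_demo` is made up for this example. It shows that the application only holds an integer: once the kernel's table entry is closed, the same integer is rejected with EBADF.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 0 if the kernel rejects a closed descriptor with EBADF, else -1. */
int fd_demo(void) {
    char buf[1];
    int fd = open("/dev/null", O_RDONLY);
    if (fd < 0)
        return -1;

    /* While the descriptor is open, the kernel reads on our behalf.
       /dev/null is always at end of file, so read returns 0. */
    ssize_t n = read(fd, buf, sizeof buf);
    if (close(fd) != 0 || n != 0)
        return -1;

    /* fd is now a stale integer; the kernel has no table entry for it. */
    errno = 0;
    if (read(fd, buf, sizeof buf) == -1 && errno == EBADF)
        return 0;
    return -1;
}
```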
File Descriptor System Calls
Functions that use file descriptors (fd) | Purpose
open() | Opens a file
read() | Reads from an open file
write() | Writes to an open file
close() | Closes an open file
lseek() | Changes current file offset to a new position
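The five calls in the table can be seen working together in a short sketch. The function name `roundtrip` and its parameters are made up for this example, and it assumes the caller passes the path of a scratch file that is safe to create and delete.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Writes msg to path, rewinds, and reads it back into out (NUL-terminated).
   Returns the number of bytes read back, or -1 on any failure. */
ssize_t roundtrip(const char *path, const char *msg, char *out, size_t outsize) {
    size_t len = strlen(msg);
    if (outsize == 0 || len > outsize - 1)
        return -1;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    ssize_t n = -1;
    if (write(fd, msg, len) == (ssize_t) len   /* file offset is now len */
        && lseek(fd, 0, SEEK_SET) == 0)        /* move the offset back to 0 */
        n = read(fd, out, outsize - 1);        /* read starting at offset 0 */
    if (n >= 0)
        out[n] = '\0';

    close(fd);
    unlink(path);                              /* remove the scratch file */
    return n;
}
```

Without the lseek() call, the read() would start at the offset left by write(), which is the end of the file, and would return 0 bytes.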
System calls are a good example of hard modularity. It
may seem that we have objects, but these "objects" are
hidden from the view of user programs. The system calls
transfer control to the kernel, which runs in its own
protected memory, in order to access files on the
program's behalf.
eg: using the open() system call:
int open(const char* pathname, int flags, mode_t mode);
- Flags
- O_RDONLY => read only
- O_WRONLY => write only
- O_RDWR => read and write
- O_CREAT => create file if it doesn't
exist
- O_TRUNC => if the file exists, make it empty
(it does not create the file; combine it with O_CREAT
for that)
- O_APPEND => all writes go to end of
file
- You can use multiple flags by combining them
with the bitwise OR operator (|, the "pipe"
character). The exception is that you can only
specify one of the first three.
- int fd = open("path", O_WRONLY | O_CREAT,
0640);
- Eggert flags (flags that Eggert wants but don't
exist)
- O_EXEC => get executable access to a
file
- O_NONE => open a file without any access
- This is so you can use fstat()
- Modes
- Used for permissions of a file you create. Only
used when you have the O_CREAT flag, so this
parameter is optional
- Variable number of arguments
- int open(char const* name, int flags, ...)
- You can pass extra arguments
- Use #include <stdarg.h> to decode extra
arguments
- umask
- umask() sets the calling process's file mode
creation mask
- Every process has a umask (of type mode_t) in its
process descriptor
- Bits set in the umask are cleared from the mode
passed to open(), so a new file gets the
permissions (mode & ~umask)
- A mode of 0666 requests rw-rw-rw-
- A mode of 0777 requests rwxrwxrwx
- r => read
- w => write
- x => executable
- To get the current umask, use the umask system call
- old_umask = umask(new_umask)
- If you only want to read the current umask,
call umask() again afterwards to change the mask
back to the old one
- If a system call fails, you can find out why by
checking errno (declared in <errno.h>)
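The mode, umask, and errno points above can be tried out with a short sketch. The function names `get_umask`, `create_with_mask`, and `open_errno` are invented for this example, and it assumes the path given is a scratch file on an ordinary filesystem that is safe to delete.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the current umask without leaving it changed. */
mode_t get_umask(void) {
    mode_t old = umask(0);   /* umask returns the previous mask */
    umask(old);              /* so put the previous mask back */
    return old;
}

/* Create path with the requested mode while `mask` is the umask.
   Returns the permission bits the file actually got, or (mode_t) -1. */
mode_t create_with_mask(const char *path, mode_t requested, mode_t mask) {
    struct stat st;
    unlink(path);                    /* the mode only applies to a new file */
    mode_t saved = umask(mask);
    int fd = open(path, O_WRONLY | O_CREAT, requested);
    umask(saved);                    /* restore the caller's umask */
    if (fd < 0)
        return (mode_t) -1;
    int rc = fstat(fd, &st);
    close(fd);
    unlink(path);
    return rc == 0 ? (st.st_mode & 0777) : (mode_t) -1;
}

/* Returns the errno left behind by a failed open, or 0 if it succeeded. */
int open_errno(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}
```

With a umask of 022, a request for 0666 should produce a file with mode 0644, since the write bits for group and others are masked off. Opening a path that does not exist should leave ENOENT in errno.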
Process Functions
- fork() => create a process
- fork() will clone the current process except
for the values of the process ID, the parent's
process ID, file descriptions (shared), accumulated
execution times, file locks, and pending signals
- Returns -1 => fork failed (reason specified
in errno)
- ENOMEM => no memory
- EAGAIN => error; try again
- Returns 0 => fork succeeded, and you are
running in the child process
- Returns >0 => fork succeeded, and you are
running in the parent process. The return value is
the child's process ID (type pid_t)
- _exit(n) => destroys a process
- Input parameter is the exit status, which is
put into the process descriptor in case any other
process wants to know the exit status
- exit(n) => destroys a process
- This differs from _exit(n) in that it cleans
up (eg: flushes output buffers) before it exits.
This cleanup may cause a hang.
- getpid(void) => returns process ID of current
process (type pid_t)
- getppid(void) => returns parent's process ID
(type pid_t)
- waitpid(pid_t p, int* status, int options)
- Waits for one of your children to finish (only
works with YOUR children so there will be no
deadlocks)
- Parameter p is the process ID of the child
- Status is the pointer to memory that stores the
exit status of the child
- Options indicate how long you're willing to
wait
- 0 => wait forever
- WNOHANG => don't wait at all
- Returns the process ID of the child that
finished (type pid_t)
- Returns -1 if failed (eg: tried to wait on
a process that isn't your child)
- execvp(char const* file, char* const* argv)
- This system call allows a process to run any
program file, whether a binary executable or
a shell script
- Calling this will destroy the program running in
the current process and everything associated with
it except the process descriptor. It then starts
the specified program running in the current
process
- Parameter file is a pointer to a character
string that contains the name of a file to be
executed
- Parameter argv is a pointer to an array of
character strings. You can think of its type as
(char**), which is identical to the argv array used
in the main program
- int main(int argc, char** argv)
- Returns -1 (always) because if it succeeds, it
will never return
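The process functions above fit together as in the following sketch. The function name `child_exit_status` is made up for this example. It forks a child, has the child check its parent with getppid() and leave with _exit(), and has the parent collect the exit status with waitpid().

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that exits with status_to_report (0 to 254).
   Returns the status the parent sees through waitpid, or -1 on error. */
int child_exit_status(int status_to_report) {
    pid_t parent = getpid();
    pid_t p = fork();
    if (p < 0)
        return -1;                  /* fork failed; errno says why */
    if (p == 0) {
        /* Child: fork returned 0. Our parent is the process that forked. */
        if (getppid() != parent)
            _exit(255);
        _exit(status_to_report);    /* _exit skips stdio cleanup */
    }
    /* Parent: fork returned the child's process ID. */
    int status;
    if (waitpid(p, &status, 0) != p)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child uses _exit() rather than exit() so that any output buffered by the parent before the fork is not flushed a second time by the child.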
System Call Example
Imagine you want to write a function that reads lines
from stdin and writes the same lines to stdout, but in
sorted order. We will do a prototype of this "sortIO"
function.
#include <unistd.h>

int sortIO(void) {
    execvp("/bin/sort", (char*[]) {"sort", NULL} );
    return -1;  // reached only if execvp failed
}
First we try to just call execvp on the sort program.
The problem with this code is that it will blow away the
whole program. Recall that execvp destroys the program
running in the current process. So how do we get past
this? The solution is to fork and have the child do the
execvp call. The parent can continue normally because
only the child will be affected by execvp.
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int sortIO(void) {
    int status;
    pid_t p = fork();
    switch (p) {
    case 0:
        // child: replace this process's program with sort
        execvp("/bin/sort", (char*[]) {"sort", NULL} );
        _exit(1);  // reached only if execvp failed
    case -1:
        return -1;  // fork failed
    default:
        // parent: wait for the child to finish
        if (waitpid(p, &status, 0) < 0)
            return -1;
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            return -1;
        return 0;
    }
}
Firstly, notice that we included sys/wait.h. This is
so we can access waitpid, WIFEXITED (did it exit?) and
WEXITSTATUS (what is the exit status?). The status
variable is declared before the switch because a
declaration placed directly after the default: label
does not compile with many C compilers. We start off by
doing a fork, which creates a child process. Recall that
the child goes to case 0. It then runs an execvp (because
it is fine to replace the child process). If the execvp
fails, the child calls _exit(1). If p is -1, that means
the fork was unsuccessful. Otherwise, we are in the
default case, used for the parent process. The parent
waits for its child to complete execution (the waitpid
call) and checks that the wait was successful.
Afterwards, we can check WIFEXITED and WEXITSTATUS.
WIFEXITED returns nonzero if the process exited normally.
WEXITSTATUS (which should only be called after checking
WIFEXITED) gets the exit status of the process which
exited. We make sure that the child exited properly, then
complete the function.
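The same fork, execvp, and waitpid pattern works for any program, not just sort. Here is a sketch of a generalized version. The name `run_and_wait` is made up for this example, and the usage below assumes the standard `true` and `false` utilities can be found through PATH.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv[0] (looked up through PATH) in a child and waits for it.
   Returns 0 if the child exited with status 0, and -1 otherwise. */
int run_and_wait(char *const argv[]) {
    int status;
    pid_t p = fork();
    switch (p) {
    case -1:
        return -1;              /* fork failed */
    case 0:
        execvp(argv[0], argv);
        _exit(127);             /* reached only if execvp failed */
    default:
        if (waitpid(p, &status, 0) < 0)
            return -1;
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            return -1;
        return 0;
    }
}
```

Calling run_and_wait with the argument array {"true", NULL} should return 0, while {"false", NULL} should return -1 because the child exits with a nonzero status. A program name that cannot be found also gives -1, since execvp fails and the child calls _exit(127).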
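*Note: there are also spawn-style calls, such as POSIX's
posix_spawnp(), which do the work of fork AND exec in a
single call. Keeping fork and exec as separate calls is
an example of orthogonality: fork and exec are
independent of each other. This is similar to read vs. a
combined read + seek. POSIX supports both the combined
spawn call and fork + exec. Windows on the other hand
only has the spawn style (eg: _spawnvp() in its C
runtime) and has no fork. As a result, when it emulates
UNIX programs, performance stinks.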
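For comparison with fork + exec, here is a sketch of the combined style using posix_spawnp(), the POSIX function that starts a new program in a new process with one call. The function name `spawn_and_wait` is made up for this example, and the NULL arguments simply ask for default file actions and attributes.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;   /* the current environment, passed on to the child */

/* Starts argv[0] (looked up through PATH) and waits for it to finish.
   Returns the child's exit status, or -1 on failure. */
int spawn_and_wait(char *const argv[]) {
    pid_t child;
    int status;
    /* posix_spawnp returns an error number instead of setting errno. */
    if (posix_spawnp(&child, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;
    if (waitpid(child, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The single call is shorter, but there is no point between "fork" and "exec" where the child can run arbitrary code of its own. That is the flexibility that the separate fork and exec calls provide.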