Operating Systems CS111 Lecture 4 (Oct 7th, 2008)

By: Gene Auyeung, Yanshu Fan, Albert Hsieh, Matt Schuerman

An Operating System's Application Programming Interface

To implement hard modularity, part of the CPU's instruction set is restricted: instructions such as HALT or INB can be executed only when a supervision bit is turned on, because they can be misused. To prevent applications and users (whom we don't trust) from doing any damage, applications run with the supervision bit turned off. The operating system (which we trust), on the other hand, has control of the computer and runs with full privileges. When applications need something done that requires executing restricted instructions, they have to do so through a system call.

App-OS-Hardware layers

To do a system call, the application executes the INT (interrupt) instruction. The INT instruction takes an operand that tells the OS what system call it is asking for. The INT instruction causes a trap: the application loses control of the Program Counter, the Program Counter jumps to an address determined by the INT operand and the Interrupt Address Table, the supervision bit is turned on, and the OS is now in control. The OS saves the registers before it does anything so it can restore them later, before returning control to the application. It pushes the following (among other things) onto the Kernel Stack:

The result of the system call (if any) will be put into a pre-agreed-upon register. Other registers are restored to the way they were before the interrupt. In a sense, from the application's point of view, system calls extend the instruction set of the machine.
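As a rough illustration of "system calls extend the instruction set", the sketch below asks the kernel for the process ID by system call number instead of going through the usual library wrapper. It assumes a Linux-like system where syscall() and SYS_getpid are available; the name raw_getpid is made up for this example. On modern hardware the trap may be a dedicated instruction rather than INT, but the idea is the same: the call number and arguments go into agreed-upon registers, and the result comes back in one.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel for our process ID by system call number, bypassing
 * the usual getpid() wrapper. syscall() places the number and any
 * arguments where the kernel expects them and executes the trap. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}
```

From the application's point of view raw_getpid() behaves like one slow instruction: control leaves the program, the kernel runs, and the answer appears as the return value.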

Rings of Protection

The rings represent a hierarchy of privileges, numbered in ascending order from 0 (the most trusted) to 3 (the least trusted).

Rings of Protection

Typically, the kernel of an operating system runs at level 0. Because of this, the kernel should be bug-free; to reduce the complexity of the kernel (and the number of bugs), the kernel typically only manages memory and threading. Device drivers run at level 1. Standard libraries run at level 2. User code runs at level 3.

The level is part of each entry in the Interrupt Address Table, which tells the processor what privileges the code should run with.

Having this many levels of protection offers fault tolerance and security, but Unix uses only two levels. Level 0 includes kernel code and device drivers, while level 3 includes libraries and user code. The reason for this model is mostly performance. Every time an application needs to access a resource outside its ring, it has to execute an interrupt, which saves registers to main memory and later loads them back into the CPU. Fewer levels mean fewer of these costly crossings.

In the typical model, when code running at a level fails (e.g. throws an exception), it affects the levels above it (the less privileged code that depends on it), but not the levels below. But another reason for the Unix model is that any major fault, recoverable or not, will result in a much-degraded system for the user.

How may an unreliable program break systems?

  • Dangerous instructions, e.g. HALT / INB / RETI etc.
    • Solution: trap (report illegal instruction)
  • Overflow the stack (access memory that the program is not supposed to)
    • Solution: memory protection
  • Access a device (via INB)
    • Solution: programs can only access I/O with system calls
  • Loop
    • Solution: clock interrupt; the kernel takes over control periodically
  • Interrupt overflow
    • Solution: priorities in interrupts
  • Kernel bug
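
The memory protection item in the list above can be observed from user code. The sketch below is only an illustration under a few assumptions: a POSIX system with mmap() and fork(), a 4096-byte page, and a made-up function name. A child process stores into a page it has no rights to; the hardware traps, the kernel kills the child, and the parent carries on unharmed.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a misbehaving child and return the signal that killed it,
 * 0 if it exited normally, or -1 if the setup failed. */
int signal_from_bad_store(void)
{
    /* A page with no access rights: any load or store to it traps. */
    volatile char *page = mmap(NULL, 4096, PROT_NONE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        munmap((void *)page, 4096);
        return -1;
    }
    if (pid == 0) {
        page[0] = 1;   /* forbidden store: the hardware traps to the kernel */
        _exit(0);      /* not reached if protection works */
    }

    int status;
    pid_t done = waitpid(pid, &status, 0);
    munmap((void *)page, 4096);
    if (done < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On most systems the reported signal is SIGSEGV (some report SIGBUS). Either way the fault stays inside the child.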

Two ways to implement system calls

  • Method #1. Function calls
    + Faster
    + Less power consumption
    - No protection
  • Method #2. Traps / system take over
    + Protection
    - Slower

OS organization using virtualization

OS organization pic

Process Descriptor Table

The process descriptor table is an array stored in the kernel memory that essentially holds the virtual registers of all processes not currently running. This data includes the stack pointer (%esp), the instruction pointer (%eip) and the pointer to the virtual memory table. Seen below is a diagram showing how the process descriptors are stored. Each process is stored in order based on its process id (pid_t).

Process Descriptor Table Diagram
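
A minimal sketch of such a table in C follows. The struct layout, the table size, and the function names are all hypothetical; a real kernel stores far more per process. The sketch only shows the idea of an array indexed by pid that holds each suspended process's saved registers.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PROCS 64   /* hypothetical table size */

/* Hypothetical descriptor: the saved "virtual registers" of one process. */
struct proc_desc {
    uint32_t esp;       /* saved stack pointer */
    uint32_t eip;       /* saved instruction pointer */
    void    *vm_table;  /* pointer to the process's virtual memory table */
    int      in_use;    /* nonzero if this slot describes a process */
};

static struct proc_desc proc_table[MAX_PROCS];  /* indexed by pid */

/* Record the registers of a process that is being switched out. */
int save_context(int pid, uint32_t esp, uint32_t eip, void *vm_table)
{
    if (pid < 0 || pid >= MAX_PROCS)
        return -1;
    proc_table[pid] = (struct proc_desc){ esp, eip, vm_table, 1 };
    return 0;
}

/* Look up the saved registers of a process; NULL if there is none. */
const struct proc_desc *lookup_context(int pid)
{
    if (pid < 0 || pid >= MAX_PROCS || !proc_table[pid].in_use)
        return NULL;
    return &proc_table[pid];
}
```

When the kernel resumes a process, it would load esp, eip, and the virtual memory table pointer back from that process's slot.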

Virtual Memory

Virtual memory allows a process to see a contiguous segment of memory that is actually spread out around the physical memory. Each process owns its own virtual memory table (whose pointer is stored in the process descriptor table) that maps a virtual address to an actual physical address. The operating system divides each process's code, data, heap, and stack into smaller pieces called "pages", which are then stored separately in physical memory. In this way, multiple processes can be stored throughout the physical memory without each having to be stored contiguously. Seen below is a diagram showing the layout of processes in relation to physical memory.

Virtual Memory
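
The address translation described above can be sketched in a few lines of C. This is a toy model: the page size, the eight-page address space, and all names are assumptions for illustration, and real hardware does this lookup itself rather than in software.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u       /* assumed page size */
#define NUM_PAGES 8u          /* tiny hypothetical address space */
#define NO_FRAME  UINT32_MAX  /* marks a virtual page with no mapping */

/* One process's virtual memory table: virtual page -> physical frame. */
struct vm_table {
    uint32_t frame[NUM_PAGES];
};

/* Translate a virtual address. Returns 0 on success and stores the
 * physical address in *paddr; returns -1 if the page is not mapped. */
int translate(const struct vm_table *t, uint32_t vaddr, uint32_t *paddr)
{
    uint32_t page   = vaddr / PAGE_SIZE;
    uint32_t offset = vaddr % PAGE_SIZE;

    if (page >= NUM_PAGES || t->frame[page] == NO_FRAME)
        return -1;
    *paddr = t->frame[page] * PAGE_SIZE + offset;
    return 0;
}
```

If virtual pages 0 and 1 map to physical frames 5 and 2, the process sees two adjacent pages even though they sit in unrelated places in physical memory.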


Claim 1: Use syscalls for all input/output
Counterexample: Graphics need to access display without having to constantly use system calls. So: use syscalls to set up access.
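
One way to "use syscalls to set up access" is memory mapping. The sketch below is an assumption-laden stand-in for a display: it maps a scratch file rather than a real device, and the function name and the /tmp path are made up. The setup costs a few system calls, but afterwards the program updates the mapped bytes with ordinary stores.

```c
#define _DEFAULT_SOURCE
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map a scratch file, write through the mapping with plain stores,
 * then read the file back with pread(). Returns 0 if the data match. */
int mmap_demo(void)
{
    char path[] = "/tmp/mmap_demo_XXXXXX";
    const size_t len = 4096;
    char check[5] = {0};
    int result = -1;

    int fd = mkstemp(path);            /* setup: system calls */
    if (fd < 0)
        return -1;
    unlink(path);                      /* file vanishes once closed */

    if (ftruncate(fd, (off_t)len) == 0) {
        char *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        if (mem != MAP_FAILED) {
            memcpy(mem, "pixel", 5);   /* ordinary stores, no system call */
            /* msync() and pread() are used here only to verify the result */
            if (msync(mem, len, MS_SYNC) == 0 &&
                pread(fd, check, 5, 0) == 5 &&
                memcmp(check, "pixel", 5) == 0)
                result = 0;
            munmap(mem, len);
        }
    }
    close(fd);
    return result;
}
```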

Claim 2: Single interface works well for all devices because "they're all alike" at some level


Unix's Big Idea: One Interface For All Devices

One of the most fundamental functions an operating system (OS) must perform is to give applications access to the resources of the computer. However, different computers can have quite different hardware, so it becomes a challenging engineering problem to give applications access to these resources in a uniform fashion. Unix's big idea is to provide access to all hardware devices using the same interface it uses for files: file descriptors.

File Descriptors

Unix uses file descriptors as an abstraction for doing I/O operations on files. When a file is created or opened, Unix returns a file descriptor associated with that file. While the file remains open, manipulating this file descriptor becomes equivalent to manipulating the file itself. File descriptors provide a simple interface for reading data from files, writing data to them, or moving about within them. The Unix application programming interface (API) provides 3 main functions to perform these tasks:

  1. ssize_t read(int fd, void *buffer, size_t sizeBuf)

    This function reads data from the file descriptor fd and places it in the memory pointed to by buffer. The sizeBuf argument specifies the maximum amount of data which can be written to memory at buffer. If the function executes correctly, the number of bytes read is returned; this may be less than sizeBuf, and 0 indicates the end of the file.

  2. ssize_t write(int fd, const void *buffer, size_t sizeBuf)

    write() writes data to the file descriptor fd. The function reads up to sizeBuf bytes from the memory pointed to by buffer and writes that data to fd. If no error occurred, the function returns the number of bytes written, which may be less than sizeBuf.

  3. off_t lseek(int fd, off_t offset, int flags)

    This function changes the file descriptor fd's "position" in the file. Both read() and write() require a position within the file to work. If the file is longer than sizeBuf bytes, read() must know which sizeBuf bytes to read. Similarly, write() must know where in the file to write its data. lseek() allows one to shift the current position in the file by offset bytes. The flags argument specifies the starting point of the shift. There are 3 possible values:

      • SEEK_SET: offset is measured from the beginning of the file
      • SEEK_CUR: offset is measured from the current position
      • SEEK_END: offset is measured from the end of the file

    If no errors occur, the function returns the new position, measured in bytes from the beginning of the file. The position may be set beyond the end of the file; doing so does not by itself change the file's size.

All 3 of these functions return -1 and modify the value of errno if an error occurred during execution. The errno value indicates the nature of the error and the values it takes are standardized in the Unix API.
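
The three calls fit together as in the sketch below. It is an illustration only: the scratch file under /tmp and the function name are made up, and error handling is reduced to returning -1. The function writes 11 bytes, moves the position back to byte 6 with lseek(), and reads the last 5 bytes.

```c
#define _DEFAULT_SOURCE
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Write "hello world" to a scratch file, seek back to byte 6, and
 * read 5 bytes into out (which must hold at least 6 bytes).
 * Returns the number of bytes read, or -1 on error. */
ssize_t write_seek_read(char *out)
{
    char path[] = "/tmp/fd_demo_XXXXXX";
    int fd = mkstemp(path);     /* creates the file and returns a descriptor */
    if (fd < 0)
        return -1;
    unlink(path);               /* the file disappears once fd is closed */

    ssize_t n = -1;
    if (write(fd, "hello world", 11) == 11 &&   /* position is now 11 */
        lseek(fd, 6, SEEK_SET) == 6) {          /* position is now 6 */
        n = read(fd, out, 5);
        if (n >= 0)
            out[n] = '\0';
    }
    close(fd);
    return n;
}
```

After a successful call, out holds the string "world".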

Models for OS Resources Within Applications

Most hardware devices exist to read or write data in a more human-friendly format, so extending the file descriptor API to cover hardware as well seems like a logical and elegant solution to the problem of giving applications standardized access to operating system resources. Using the file descriptor abstraction, one can now simply think of the keyboard, mouse, CD-ROM drive, monitor, etc. as special files.

However, there is still the question of how much of an abstraction file descriptors should be. A file descriptor could simply be a memory address: an application writes to that address and something happens. Such low-level access would have very little overhead for Unix and be extremely simple, so much so that the above file descriptor API might not even be needed. However, it would also give little security and could cause race conditions if multiple applications were using the same file descriptor. A better solution is to make file descriptors opaque handles. Another object, a file object, does all the actual work of manipulating the file, but all an application sees is an opaque reference to the file object: a file descriptor. An application is constrained to use the Unix API with the file descriptor and is always separated from the file object itself. While this requires more overhead, it gives greater security and guards against low-level race conditions if multiple applications use the same file or device. For these reasons Unix uses the opaque file descriptor model.

Unix File Descriptor Conventions

File descriptors can be implemented in many ways. This section covers the way that Unix chose to implement its file descriptors. It first provides an overview of how Unix keeps track of file descriptors internally and then moves on to outline several important Unix conventions regarding file descriptors.

Unix Internal Tracking of File Descriptors

Unix file descriptors have type int. The int that Unix uses as a file descriptor is an index into the file descriptor table. By convention, index 0 is initially stdin, 1 is stdout, and 2 is stderr. The file descriptor table itself contains pointers to the file objects, which do all the resource handling. Each process (a process being an instance of an application) has its own file descriptor table, which is pointed to by its entry in the process descriptor table. If a process calls fork(), the child's file descriptor table is copied from the parent's file descriptor table, including the first 3 indexes (stdin, stdout, stderr). If a process calls execvp(), the process retains its file descriptor table.

File Descriptor Table Diagram
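
The copying of the descriptor table across fork() can be seen with a pipe. In the sketch below (the function name is made up; pipe(), fork(), and waitpid() are standard POSIX calls), the parent opens the pipe before forking, so the child finds the same descriptors at the same indexes in its own table.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The parent creates a pipe and then forks. The child writes through
 * its copy of the descriptor; the parent reads through its own copy.
 * Returns the number of bytes the parent read, or -1 on error. */
ssize_t fork_shares_pipe(char *out, size_t cap)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        /* fds[1] is the same table index here as it was in the parent */
        _exit(write(fds[1], "hi", 2) == 2 ? 0 : 1);
    }

    close(fds[1]);                 /* the parent keeps only the read end */
    ssize_t n = read(fds[0], out, cap);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n;
}
```

The child never called pipe() itself. It can write only because its table was copied from the parent's and points at the same file objects.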

Streaming vs. Non-Streaming

File descriptors were initially designed to work with files, which are non-streaming. However, many hardware devices such as keyboards, trackballs, and network connections are streaming: data arrives continuously and it is impractical to save more than a small piece of it (the data which just came in) at a time. For such devices the lseek() function makes little sense. By convention Unix does not prevent an application from calling lseek() on such devices, but the call will always return -1. A good test to see if a file descriptor is associated with a streaming device is to call lseek() on it and see if it fails for no other apparent reason (for pipes, FIFOs, and sockets, errno is set to ESPIPE).
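
The lseek() test for streaming descriptors might look like the sketch below. The helper name is made up; the ESPIPE error code is what POSIX specifies for pipes, FIFOs, and sockets, while other kinds of device may behave differently.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if fd supports seeking, 0 if it is a stream (pipe, FIFO,
 * or socket), and -1 on any other error. Seeking by 0 bytes from the
 * current position leaves the position where it was. */
int is_seekable(int fd)
{
    if (lseek(fd, 0, SEEK_CUR) != (off_t)-1)
        return 1;
    return errno == ESPIPE ? 0 : -1;
}
```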

File Descriptor File Position Tracking

In Unix several processes can access the same file object at once or in sequence using different file descriptors. If all those processes wish to do reads and writes in a synchronized manner, a convention must be chosen: either the file descriptor can keep track of the process's position in the file or the file object can keep track of the process's position in the file. Both ways have their advantages. If the file descriptor keeps track of the process's position in the file, then several processes can read from different places in the file at once (ignoring the race condition of several processes writing at once). However, once a process has finished and its file descriptors are freed, the position information is lost. If the file object keeps the current position in the file, then all the processes have to share that position, but the position stays the same if any one process exits. Unix goes with the latter convention because functions like fork() and execvp() make more sense if they begin reading or writing information where the parent process left off.
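
The shared position can be checked with a short experiment. In the sketch below (the function name and scratch file are made up), the parent opens a file and never writes to it, a forked child writes 5 bytes through the inherited descriptor, and the parent then asks where its own position is.

```c
#define _DEFAULT_SOURCE
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Open a scratch file, let a forked child write 5 bytes through the
 * inherited descriptor, then report the parent's position in the file.
 * Returns that position, or -1 on error. */
off_t position_after_child_write(void)
{
    char path[] = "/tmp/pos_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0)
        _exit(write(fd, "child", 5) == 5 ? 0 : 1);

    off_t pos = -1;
    int status;
    if (waitpid(pid, &status, 0) == pid &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0)
        pos = lseek(fd, 0, SEEK_CUR);   /* the parent itself wrote nothing */
    close(fd);
    return pos;
}
```

The parent's position comes back as 5, not 0, because the position lives in the shared file object rather than in either process's descriptor table.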