CS 111 Lecture 4 Scribe Notes (Winter 2013)

Prepared by Henry Yu and Phillip Chen


Table of Contents
Protected Transfer of Control
Write an Interpreter
Exceptions
Instruction Types
Potential Problems and How to Fix
Virtual Processor
Abstraction with Examples: Reading Memory

Protected Transfer of Control

Write an Interpreter
Pros and Cons
+It can gather performance statistics and check correctness
+Might be for a new or different machine
-It is slow and is a CPU hog
-Memory Cost
-It might be buggy
-Its timing characteristics can differ from the real machine's. This can introduce bugs or hide them

All we want is the ability to add instructions. For example:

Calling unlink (our instruction):

pushl filename

call unlink

addl $4, %esp
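
The idea of adding an instruction through an interpreter can be sketched in C. This is only a minimal sketch of a made-up stack machine, not the lecture's design: the opcodes OP_PUSH, OP_ADD, OP_UNLINK, and OP_HALT, the struct vm, and the function interpret are all hypothetical names. The added unlink instruction merely counts requests instead of deleting a file, and stack bounds are not checked.

```c
#include <stddef.h>

/* Hypothetical opcodes for a toy stack machine (not from the lecture). */
enum { OP_PUSH, OP_ADD, OP_UNLINK, OP_HALT };

struct vm {
    int stack[16];
    int sp;            /* number of values currently on the stack */
    int unlink_calls;  /* how many times the added instruction ran */
};

/* Interpret code until OP_HALT; returns the top of stack (0 if empty). */
int interpret(struct vm *vm, const int *code)
{
    size_t pc = 0;
    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH:               /* the operand follows the opcode */
            vm->stack[vm->sp++] = code[pc++];
            break;
        case OP_ADD: {
            int b = vm->stack[--vm->sp];
            int a = vm->stack[--vm->sp];
            vm->stack[vm->sp++] = a + b;
            break;
        }
        case OP_UNLINK:
            /* The added instruction: the interpreter, not the guest
               program, decides what it does. Here it pops its argument
               and records the request. */
            vm->sp--;
            vm->unlink_calls++;
            break;
        case OP_HALT:
        default:
            return vm->sp > 0 ? vm->stack[vm->sp - 1] : 0;
        }
    }
}
```

Because every guest instruction passes through the switch, this is also where an interpreter could gather statistics, at the cost of the slowdown listed among the cons.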

Exceptions

What we want is an unlink instruction that transfers control to the kernel. We can get this by executing an instruction that is invalid for ordinary code, as in the example below.

pushl filename

int 0x80

addl $4, %esp

Interrupt Table

-The Processor has a privileged bit. When a trap occurs, this bit is turned on as seen in the picture above.

-The invalid instruction int 0x80 causes a trap, which causes the protected transfer of control that we want.

-Interrupt Service Routine: Before it runs, the hardware pushes the stack segment, %esp (stack pointer), flags, code segment, %eip (instruction pointer), and an error code

-The kernel code then executes and finishes with reti (return from interrupt; the x86 mnemonic is iret). Reti restores the stack and the saved registers to how they were before the interrupt.
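
The trap mechanism described above can be modeled in a few lines of C. This is a simulation under stated assumptions, not real kernel code: struct cpu, trap, install_handlers, and syscall_handler are hypothetical names, and the saved state is reduced to the privileged bit alone.

```c
#include <stdbool.h>

#define NTRAPS 256

struct cpu {
    bool privileged;   /* the privileged bit */
    int  eax;          /* request number in, result out */
};

typedef void (*handler_t)(struct cpu *);

/* The interrupt table: one kernel entry point per trap number. */
static handler_t interrupt_table[NTRAPS];

/* Hypothetical kernel handler for trap 0x80. It reports 0 if it was
   entered with the privileged bit on, -1 otherwise. */
static void syscall_handler(struct cpu *cpu)
{
    cpu->eax = cpu->privileged ? 0 : -1;
}

/* What the hardware does for a trap: save state, turn the privileged
   bit on, jump through the table, then restore state (the reti step). */
void trap(struct cpu *cpu, int n)
{
    bool saved = cpu->privileged;
    cpu->privileged = true;
    if (n >= 0 && n < NTRAPS && interrupt_table[n])
        interrupt_table[n](cpu);
    cpu->privileged = saved;
}

/* Done once by the kernel at boot; applications cannot call this. */
void install_handlers(void)
{
    interrupt_table[0x80] = syscall_handler;
}
```

The application never chooses the address it jumps to. It only supplies a trap number, and the table decides where control goes.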

Instruction Types

-Safe Instructions: Instructions that any code may run. Examples: addl, mull, call, ret

-Privileged Instructions: If an application tries to run one of these, the hardware treats it as an invalid instruction and traps. Examples: insl, outb, inb, and int

Protecting the kernel from the application (one-way protection) has trade-offs:
-slower than function call
-more stuff needs to be saved
-CPU needs to work harder
-bigger hassle to use

Potential Problems & How to Fix

-An application could use movl to write into the interrupt table or into the code that will be run. Therefore the interrupt table and the code to be run are read-only in safe mode.

Stack

-What if the kernel itself traps? This could cause an infinite loop, so we do not allow it.

Virtual Processor

-One component of a virtualizable processor is that some instructions don't work for applications or are reserved for the OS.

-Another component is that some memory locations are reserved. The reason is that we don't want applications to be able to stomp on the kernel.

Layered System

Ring Structured OS

Every time control crosses a layer boundary, we need a trap.

Mechanism to Make System Calls Work

We want to make a system call to read stuff off a disk. Below is an example of how we could do this.

char buf[1024];

insl(192, buf, 256);

Here is the assembly

movl $192, %ebx

movl $39, %eax

leal buf, %ecx

movl $256, %edx

int 0x80

Disadvantages of this method
1.) It lets the application snoop other people's stuff
2.) It's too low level; it is not what application writers want.
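
The register-loading sequence shown earlier in this section is what a C library's system call wrapper hides. As a hedged illustration, assuming a Linux system with glibc (the notes do not name a platform here), the syscall() library function takes a system call number and traps into the kernel on the caller's behalf. The helper name raw_getpid is made up for this sketch, and getpid is used only because it is harmless and needs no arguments.

```c
#define _DEFAULT_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel for our process ID by system call number, much as the
   assembly loads a number into %eax before trapping. */
pid_t raw_getpid(void)
{
    return (pid_t) syscall(SYS_getpid);
}
```

The result should match the ordinary getpid() wrapper, which is the higher-level interface application writers actually want.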

Abstraction with Examples: Reading Memory

There are different approaches to reading memory. Let us consider a readline function that simply reads one line from a specified file. This simple example will illustrate some of the issues between the kernel and application, as well as some details we may encounter.

A readline function, as a C prototype:

char *readline(FILE *f);

Machine Level:

movl $119, %eax

movl f, %ebx

int 0x80

In this sample, $119 is the system call number that selects readline. Some definitions follow.

File: A high level concept represented via low level machine words.

Line: A sequence of bytes terminated by a '\n' character.

File handle/descriptor: A low level representation of a high level object.

From File's definition above, we want a way to designate which file to read in machine code. We can choose either of the two implementations listed below.

Two choices of file implementation:

A.) A pointer to the actual object.
B.) An integer handed out by the operating system.

With approach A.), the application could inspect (or corrupt) the internals of the OS object. To prevent this, we will use approach B.), an integer handed out by the OS.
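
A minimal sketch of choice B.): the OS keeps the real objects in a private table and gives the application only an index into it. The names struct os_file, handle_open, handle_close, and lookup are hypothetical, and a real kernel would keep one such table per process.

```c
#include <stddef.h>

#define MAX_HANDLES 8

/* Hypothetical kernel-side object; applications never see this struct. */
struct os_file {
    int  in_use;
    long offset;   /* current position in the file */
};

static struct os_file table[MAX_HANDLES];

/* Hand out the lowest free index as the handle, or -1 if the table is full. */
int handle_open(void)
{
    for (int h = 0; h < MAX_HANDLES; h++) {
        if (!table[h].in_use) {
            table[h].in_use = 1;
            table[h].offset = 0;
            return h;
        }
    }
    return -1;
}

/* Validate an integer from the application before touching kernel state. */
static struct os_file *lookup(int h)
{
    if (h < 0 || h >= MAX_HANDLES || !table[h].in_use)
        return NULL;
    return &table[h];
}

/* Release a handle; returns 0 on success, -1 if the handle is invalid. */
int handle_close(int h)
{
    struct os_file *f = lookup(h);
    if (!f)
        return -1;
    f->in_use = 0;
    return 0;
}
```

A bogus integer is caught by a range check, whereas a bogus pointer would have to be dereferenced before the OS could learn it was bad.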

However, our current implementation of readline might be too high level, and there are numerous obstacles that we have to deal with in our implementation.

Concerns with our readline function:

A.) The OS must record your current location in file.

This is solved if we maintain a pointer.

B.) You might want to read a partial line.

Unsolved. Let the user worry about this.

C.) The line might be too lengthy to handle.

Return 0 and set errno to E2BIG.

D.) Nonexistent '\n' in line.

Return 0 and set errno to EEOFI.

E.) It assumes the kernel manages the memory for the returned line, when the process should take on this responsibility (as is the case in Linux).
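
To make concerns C.) through E.) concrete, here is a hedged sketch of a line reader built in the application on top of the POSIX read call, with the caller supplying the buffer. The name readline_into is hypothetical. Unlike the design above, it reports a missing final '\n' by returning the partial line rather than an error, and it signals failure with -1 rather than 0.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Read one line (up to and including '\n') from fd into the caller's
   buffer. Returns the line length, 0 at end of file, or -1 with errno
   set (E2BIG if the line does not fit in cap bytes). */
ssize_t readline_into(int fd, char *buf, size_t cap)
{
    size_t len = 0;
    while (len < cap) {
        ssize_t n = read(fd, buf + len, 1);   /* one byte at a time: simple but slow */
        if (n < 0)
            return -1;                        /* read error; errno set by read */
        if (n == 0)
            return (ssize_t) len;             /* end of file, possibly a partial line */
        if (buf[len++] == '\n')
            return (ssize_t) len;             /* complete line */
    }
    errno = E2BIG;                            /* buffer filled with no '\n' seen */
    return -1;
}
```

Because the caller owns buf, the kernel never has to allocate memory on the application's behalf.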

For these reasons, let us consider another approach for reading memory, this time at a lower level.

read_sector(FILE f, char *buf, int num_of_sectors);

/*We use FILE over devno because device numbers are not very portable.*/

read(int file_handle, char *buf, int nbytes);

/*In this case, the operating system needs to buffer, which slows it down, but the tradeoff is more generality.*/

pread(int file_handle, int offset, char *buf, int nbytes);

/*Similar to read, but reads from a specified position without modifying the file pointer.*/

pread vs. read Comparison:

A.) Many applications only read files sequentially, and for them read is both faster and simpler.
B.) Programs that start reading at varying offsets, such as data analysis tools, benefit from pread.

We also have other options at our disposal. For instance, an lseek followed by read is functionally equivalent to pread.

lseek(int file_handle, int offset, int flag);


/*This function repositions the virtual file pointer.*/

/*Flags:
*0 = SEEK_SET: relative to start
*1 = SEEK_CUR: relative to current location
*2 = SEEK_END: relative to end
*/
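
The equivalence between lseek followed by read and a single pread can be checked with the real POSIX calls. This is a sketch under stated assumptions: the function name lseek_read_matches_pread and the scratch path are made up, and note that POSIX pread takes its arguments in the order (fd, buf, nbytes, offset), which differs from the order in the lecture's prototype.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if both approaches read the same bytes, 0 if they differ,
   and -1 on any I/O error. */
int lseek_read_matches_pread(void)
{
    char path[] = "/tmp/scribe-demo-XXXXXX";   /* hypothetical scratch path */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                              /* file goes away when fd is closed */

    const char data[] = "hello, world";        /* 12 bytes; "world" starts at offset 7 */
    char a[5], b[5];
    int result = -1;

    if (write(fd, data, strlen(data)) == (ssize_t) strlen(data)
        /* Approach 1: move the file pointer, then read from it. */
        && lseek(fd, 7, SEEK_SET) == 7
        && read(fd, a, sizeof a) == (ssize_t) sizeof a
        /* Approach 2: pread takes the offset directly. */
        && pread(fd, b, sizeof b, 7) == (ssize_t) sizeof b
        /* pread left the file pointer where read put it: 7 + 5. */
        && lseek(fd, 0, SEEK_CUR) == 12)
        result = memcmp(a, b, sizeof a) == 0;

    close(fd);
    return result;
}
```

The last lseek call doubles as a check that pread does not move the file pointer.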

lseek and read vs. pread Comparison:

UNIX originally used the former combination (lseek, then read). However, Oracle and other database users wanted a pread function for more convenient file analyses. Today, UNIX, Linux, and other POSIX systems all implement pread.
Issue: This API assumes the device will work. For instance, the disk could fail midway through reading a line.

Solution: read returns # of bytes read, N; lseek returns the updated offset.
N = -1: Read error
N = 0: Reached end immediately
N > 0: # of bytes read
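
The three cases for N translate directly into the loop most callers write around read. The helper name read_fully is hypothetical; the calls and the ssize_t return type are the standard POSIX ones, and retrying on EINTR is a common convention rather than something the notes specify.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to cap bytes from fd into buf, looping because a single read
   may return fewer bytes than requested. Returns the total number of
   bytes read (short only at end of file), or -1 on error. */
ssize_t read_fully(int fd, char *buf, size_t cap)
{
    size_t total = 0;
    while (total < cap) {
        ssize_t n = read(fd, buf + total, cap - total);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted by a signal: retry */
            return -1;         /* N = -1: read error */
        }
        if (n == 0)
            break;             /* N = 0: reached the end */
        total += (size_t) n;   /* N > 0: n more bytes arrived */
    }
    return (ssize_t) total;
}
```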
Issue: If we use integers, then we can only access 2 GB (-2^31 to 2^31 - 1).

int whereami = lseek(fd, 0, SEEK_CUR); ?

Possible Solution: replace int with double?

A double has a 53-bit significand, so it can count about 2^53 bytes exactly, but it still cannot reach 2^63 - 1.
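
The limit of a double as a file offset can be demonstrated directly. This sketch assumes the usual IEEE 754 double, in which integers are exact only up to 2^53; the function name survives_double is made up.

```c
#include <stdint.h>

/* Returns 1 if converting v to double and back yields v again. */
int survives_double(int64_t v)
{
    double d = (double) v;
    /* Guard: converting a double of 2^63 or more back to int64_t is
       undefined behavior, so treat it as a failed round trip. */
    if (d >= 9223372036854775808.0)
        return 0;
    return (int64_t) d == v;
}
```

Under that assumption, an offset of 2^53 survives the round trip but 2^53 + 1 does not, so a double would silently point at the wrong byte in a large enough file.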

Possible Solution: replace int with long?

On an x86-64 system, a long is 64 bits and is sufficient. This is not the case on other systems, where a long may be only 32 bits.

Solution: replace int with off_t.

off_t is defined in sys/types.h.

/*Usage of off_t.*/

off_t lseek(int fd, off_t offset, int flag);

off_t whereami = lseek(fd, 0, SEEK_CUR);

/*Inside sys/types.h on lnxsrv.seas.ucla.edu.*/

/*gcc -E types.h | grep off_t
*typedef long int __off_t;
*...
*/

/*gcc -E types.h | grep size_t
*...
*typedef long unsigned int size_t;
*...
*/

Issue: size_t?

size_t read(fd, buf, nbytes);

/*Incorrect usage because size_t is unsigned and read can return -1.*/

Solution: replace size_t with ssize_t (signed size_t).

ssize_t read(int fd, char *buf, size_t size);

/*Correct usage.*/
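
The difference between the two return types shows up as soon as an error value of -1 is stored in each. The helper names as_size_t and as_ssize_t are made up for this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Store a read-style result in the unsigned type: -1 wraps around to
   the largest possible size, which looks like a huge successful read. */
size_t as_size_t(long result)
{
    return (size_t) result;
}

/* Store it in the signed type: -1 stays negative, so the usual
   error test (n < 0) works. */
ssize_t as_ssize_t(long result)
{
    return (ssize_t) result;
}
```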

Unresolved Issue: What if a single buffer is larger than 2 GB?

-read only reads 2 GB at a time, even if the file is larger.
-Some systems use size_t = 32 bits, ssize_t = 64 bits.

Miscellaneous Issues:

A.) Race conditions? How do we handle multiple applications' concurrent I/O?

In principle, the operating system serializes the requests, and then executes them in order.

B.) Bad pointer passed to read?

read returns -1, with errno set to EFAULT.

C.) Our focus has been disks, but read should be universal.

Let a file handle refer to any of the following:
-File on disk
-Device
-Network connection

Question to answer:

If there is a network connection but no data to read yet, should the system just hang? What should it do?

Hanging in this way is called synchronous I/O (or blocking I/O): the process sits idle until the network delivers the requested data. In contrast, asynchronous I/O lets other work proceed while the transmission continues. Processes that depend on the data are still blocked, as with synchronous I/O, but unrelated processes are permitted to run.
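
One related mechanism on POSIX systems is a non-blocking read, in which read returns immediately with an error instead of hanging. It is not the asynchronous I/O described above and is not covered in these notes; it is shown only as a hedged sketch of what not hanging can look like from the application's side. The function try_read and its -2 return value are hypothetical, while O_NONBLOCK, fcntl, and EAGAIN are standard.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Try to read from fd without waiting. Returns the number of bytes
   read, 0 at end of file, -1 on error, or -2 if no data is available
   yet. Leaves fd in non-blocking mode. */
ssize_t try_read(int fd, char *buf, size_t cap)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;
    ssize_t n = read(fd, buf, cap);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;    /* nothing yet: the caller can do other work */
    return n;
}
```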