# CS 111 Lecture 3 Scribe Notes

Notes by James Eldredge

4/10/12

Overview: This lecture continues Lecture 2's scenario of the paranoid programmer determined to write a standalone program to count the words in a file. This lecture covers sample code for the standalone word counting program, the disadvantages of writing such a standalone program, and how to use modularity and abstraction to share common functions.

The Standalone Word Counting Program: In Lecture 2, we discussed how the paranoid programmer would get his word counting program to run after booting the computer, now we take a look at some sample code for the cord counting program itself:

```1  void main(){
2       unsigned long word = 0;
3       bool in_word = false'
4       int s = 100000;               //1000000 is the sector in memory where the file is stored
5       for(;;s++){
6               char buf [512];
8               for(int j = 0; j < sizeof(buf); j++){
9                       if(!buf[j]){
10                              write_out(words);
11                              return;
12                      }
13                      bool this_alpha = isalpha(buf[j])!= 0;   //the "!= 0" is redundant
14                      words += this_alpha &~in_word;              // the "&~" is equivalent to "&!"
15                      in_word = isalpha;
16              }
17      }
18  }
```
But, what about the write_out function that is called on line 10 to display the word count, how does it get the number to the screen?

Screen
There is a region in memory for writing output to the screen located at 0xb8000. The screen is represented as a 80x25 grid of two byte segments. You can write to a character to a certain part of the screen by calculating where in the 80x25 grid you want to output your character and writing to the corresponding two bytes. The low byte is an ASCII character and the high byte can control the color or even make the character blink. This method of writing to the screen is used by all computers when booting.

So, what would the write_out function look like?
It would look like this:

```1  void write_out(unsigned long long n){
2       char *p = (char*)0xb800 + 80*25-20;  //p is a pointer to the memory for writing out to the screen
3       do{
4               int lsd = n % 10;
5               *--p = 7;
6               *--p = lsd + '0';
7               n /= 10;
8       }while(n);
9  }
```
Using this code, the paranoid programmer would be able to count the words in a file using a standalone program. It will work, but what if we wanted to improve it? How would we do so?

Ways to Improve
1. Performance: speed it up
2. Engineering Internals: make it easier to maintain, read.

Performance
We may boot fast, but the code itself is not as fast as it could be. As we have set it up, the program first does I/0: fills up the buffer (the hard drive has to seek and wait). Then, the program does some computation. Then, it fills up the buffer again and the whole process starts again.
There are several ways to make this process faster:

A. Bigger Blocks: By reading in bigger blocks of memory at once, the ratio of seek and wait time to bits returned is reduced. You have to do fewer seek and waits to get the same amount of data, and because you are reading continuous regions of memory, the individual seek and wait times are not greatly increased. You can also read in from more than one sector. This would involve changing the read_ide_sector function in our code.

B. Overlap I/O and Computation (Double Buffering): Create two buffers. Read into the first. Then, read into the second while you do computation on the first. And so on. This gives a significant increase in performance only if the I/O and Computation times are similar. Otherwise, the slower of the two will limit how much improvement you get.

C. Direct Memory Access (DMA): With insl, the CPU acts as an intermediary between the disk controller and the RAM. The disk controller would contact the CPU which would in turn contact the RAM. But, with Direct Memory Access, the disk controller can communicate directly with the RAM, without involving the CPU. This reduces the time that it will take and leaves the CPU free to perform other tasks.

D. Multicore applications: Multicore Applications can perform more computations.

-Great for paranoid programmers.

-Changes ripple through the whole application
-Hard to reuse code in other applications
-Hard to run two programs at the same time
-Hard to recover from failures

Example: Take the function read_ide_sector that we used in the word counting program. Every application would have to have its own copy of the function read_ide_sector. We realize this, so how can we share it across multiple applications? Well, we could put it in the read only BIOS, but this would be like turning software into hardware. We would have to update the hardware to make changes to the read_ide_sector function. Plus, it is too inflexible. Every operating system would be stuck with the same block size (current operating systems only use BIOS routines when booting).
So, we want an operating system specific reader.

We need:
1. Modularity: separate everything into modules
2. Abstraction: modularity that simplifies. Make the path between modules as small, simple and general as possible

In our word counting example, modularity would mean separating the I/O and the computation. Abstraction would mean making a small, simple, and general way for them to communicate.

Abstraction
A good example of abstraction is reading memory:

```read_ide_sector(int s, char *p);
```
If we want to make it more general, we should not limit ourselves to characters, so how about:
```read_ide_sector(int s, void *p);
```
An integer may not be big enough, so let's try:
```Read_ide_sector(off_t s, void *p) ;     //off_t is a 64 bit type
```
We should return an error code and the sector size:
```ssize_t read_ide_sector(off_t s, void *p, int secSize);  //ssize_t is a signed 32 bit or 64 bit type
```
Why not list the file that we want to read and instead of a sector, how about a buffer (can separate up into blocks at a lower level):
```ssize_t read(int fd, off_t s, void *p, size_t bufsize),  // where fd is the file to read
```
This way the code becomes simpler, smaller, and more general. Applications calling for a memory read have to know less about what is going on at a lower level. It would be easier for many different applications to share a function like this. This is exactly what is done in linux. The function is called pread. In unix, they have two functions:
```ssize_t read(int fd, void *p, size_t bufsize); // this function just picks up the offset from where it left off
off_t lseek(intfd, off_t s, int flag); //this function allows the application to specify the offset if it wishes.
```

Modularity
For Modularity, we know we want to draw a line between the application and the operating system, the question is where to draw the line between the operation system and the operating system. Typically, this is done with function call modularity.

The Stack
RAM is split up into regions that are used for various purposes; one of these regions is the stack. The stack grows negatively. A stack pointer is maintained that points to the lowest used memory on the stack.

To look at the stack, we are going to look at a recursive function. The classic example of a recursive function is factorial. In actuality, it would be more efficient to use a loop, or even better a look up table. But, we are going to ignore this and use a recursive function:

```1  int fact(int n){
2       if(!n)
3               return 1;
4       return n*fact(n-1); //recursive because it calls itself
5  }

This c code gets transferred to x86 assembly code that looks like this:

caller:
pushl \$5                        //push 5 onto the stack
call fact                       //call the factorial function
addl \$4,%esp            //upsate the stack pointer

fact:
pushl %ebp                      //push old base pointer
movl \$1, %eax           //move 1 to eax register
movl %esp, %ebp         //base pointer = stack pointer
subl \$8, %esp           //subtract two bytes from the stack pointer
movl %ebx, -4(%ebp)     //store value of ebx one byte below ebp
movl 8(%ebp), %ebx      //ebx = two bytes above the base pointer (this is where n is stored)
testl %ebx, %ebx                //if ebx = 0
ne .L5                  // then go to .L5
.L1:    movl -4(%ebp), %ebx     //store value
movl %ebp, %esp         //set the stack pointer to the base pointer
popl %ebp                       //pop the old base pointer
movl %eax, (%esp)               //move stack pointer
call fact                       //recursively call factorial
imul %ebx; %eax         //when it returns, multiply n*return value