CS 111

Scribe Notes for 4/10/12

by Jonathan Nguy

-----------------------------------

Continuing the code from Lecture 2 (as a paranoid programmer), we get the following


1  void main (void) {
2      unsigned long long words = 0;
3      bool in_word = false;
4      int s = 100000;
5      for (;; s++){
6           char buf[512];
7           read_ide_sector(s,buf);
8           for (int j = 0; j < sizeofbuf; j++) {
9                if (!buf[j]){
10                     write_out (words);
11                    return;  // need to get out of main
12               }
13               // checks word is alpha
14               bool this_alpha = isalpha(but[j]) !=0;
15                //false if bool is 0, true otherwise 
16               words += this_alpha & ~in_word; // we could do !in_word (which is 0, 1 as well)
17               in_word = is_alpha;
18               //~in_word : -1 or -2 (invert 0 or 1, respectively)
19               // this_alpha is 0 or 1
20          }
21     }
22     void write_out(unsigned long long n) {
23          char *p = (char*) 0xb8000 + 80 * 25 * 2/2-20;
24          do {
25               int lsd = n%10; //least significant digit of n
26               //imagine pointer to the screen memory address
27               // p[0] = lsd+'0';
28               //  p[1] = 7; // grey on black
29               *--p = 7;
30               *--p = lsd + '0';
31               n /= 10;
32          } while (n);
33     }
34  }

Memory


This segment is what we need to worry about for our program:

The memory addres 0xB8000 is specific to the screen, so if you write to this part in memory, it will display on the screen.


In general, it is a 80 x 25 grid (80 columns, 25 rows)
Represented in alphanumeric values
Each entry is represented by 2 bytes:
  • Lower order byte = ASCII character
  • Higher order byte = pick colors, blink (don't EVER use blinking text, says Eggert!)

-----------------------------------

Things we might want to do to help our program:

  • 1. performance (speed) - we will focus on this mostly
  • 2. engineering internals (easier to maintain, easier for others to understand)

Performance


For our current program, we boot fast, but we have the following problem:
  • I/O and computation (goes back and forth)
Here is the current situation:

Here are few solutions to fix that problem:

  • A. Do bigger blocks of I/O (this will be fast because there is a lot of time used for seeking and waiting)
  • B. Overlap I/O and computation
  • C. DMA
  • D. Multicore

A. Bigger blocks of I/O

- Easiest to do
- change 512 to something else (like 2048)
- go to read_ide_sector from 1 to 4

B. Overlap I/O and computation
- we have to replace read_ide_sector to read_ide_sector_start() and read_ide_sector_ready()

C. DMA - Direct Memory Access

- insl
- goes from disk -> CPU -> RAM
// wastes time of CPU to transfer bytes
- DMA
- goes from disk -> RAM (skips CPU)
// saves CPU resources

D. Multicore We did not go into depth on multicore.

Advantages of standalone programs

-great for paranoid developers
things like security software, breaks in car (very important...)

Disadvantages of standalone programs

- changes ripple through whole application
- too much of a pain to change
- hard to reuse code in other applications
- or to run two programs at the same time
- single application that is in charge
- or to recover from failures

-----------------------------------

Suppose we want to share read_ide_sector:



- The right 3 sections forms the BIOS
This works, but:
  • Must flash BIOS to upgrade software
  • Too inflexible -> all OSes are stuck with same block size

We wait an OS-specific reader (or possibly application-specific)

We are trying to build an interface between hardware and software in a way that gives us two things that we need:

  • 1. Modularity -> the wall between the SW and HW
  • 2. Abstraction -> modularity that simplifies

Modularity


WHERE we draw the line between the application vs OS:

We can make the following changes:

1           //OLD 
2           void read_ide_sector(int s, char *p);
3 
4           //NEW 
5           void read_ide_sector(off_t s, void *p); 
6                // makes interface a little more general using void* 
7                // integer = about 2 billion (which is about a TB) 
8                // off_t = 64-bit type  
9
10          ssize_t read_sector(off_t s, void *p, size_t bufsize);
11               // goes off with IDE drives 
12               // off_t is 64-bit signed 
13               // size_t is 32/64 bit unsigned 
14               // ssize_t is 32/64 bit signed… will be negative if we have an error 
15
16          ssize_t pread(int fd, off_t s, void *p, size_t bufsize);
17               // fd is an integer determine which file to read; file descriptor 
18
19          ssize_t read(int fd, void *p, size_t bufsize);
20          off_t lseek(int fd, off_t s, int flag);
21               // s = byte locations 
22               // flags: 
23                    // SEEK_START => 0 = relative to file start 
24                    // SEEK_CUR => 1 = relative to current location 
25                    // SEEK_END => 2 = relative to file end 
26               // lseek(27, -1, 1); 
27                    // go back 1 byte 
28               // returns new offset (-1 if error) 
29
30          int seek(int fd, int secno, int flag);
31              // bad API, fixed with sleek 
Many of these functions are actually being used in Linux today.
HOW to draw the line between app vs OS
RAM:

Here's a picture of a typical RAM. (The addresses are arbitrary here.)


The idea: We partition memory into regions, and then execute those regions.

Beginning of the stack is unused
- stack pointer points to the top of the stack
- stack shrinks to the right

Consider the following implementation of factorial
- (Factorial is terrible for recursion)

1      int fact( int n ) {
2          if (!n)
3               return 1;
4          return n*fact(n-1);
5     }
Here is the machine instructions for this:

1     fact:
2          pushl     %ebp
3          movl      $1, %eax
4          movl      %esp, %ebp
5          subl      $8, %esp
6          movl     %ebx, -4(%ebp) 
7          movl     8(%ebp), %ebx
8          testl     %ebx, %ebx
9          jne        .L5
10
11     .L1
12          movl      -4(%ebp), %ebx
13          movl      %ebp, %esp
14          popl       %ebp
15          ret    # returns 1
16
17     .L5
18          leal      -1(%ebx), %eax
19          movl     %eax,(%esp)
20          call      fact
21          imvll     %ebx, %eax
22          jmp      .L1
23     
24     caller:
25          pushl      $5
26          call        fact
27          addl       $4, %esp
28          result is in %eax

The stack should look similar to this:

Things that can go wrong
1. exhaust the stack
(to fix: dont use recursion!)
very rare because OS's normally fix this
2. unreliable caller --> system crashes
3. unreliable callee --> system crash

We have SOFT modularity here.
- only works because all the modules stay on their side of the boundary
don't have to worry about interference
We want HARD modularity.
- will work even if caller/callee are unreliable

---------------------END OF LECTURE---------------------










CS 111, Spring '12. Paul Eggert. April 15, 2012.