CS 111 Spring 2007

How NOT to enforce modularity

Typical boot sequence:

|---------------|
|---------------|
|---------------|
|  Boot Sector  | <- 7C00  loads the Volume Boot Record (VBR)
|---------------|
|---------------|
|  Vol Boot R.  | <- 8D00  the VBR, in turn, loads the kernel, which loads the program
|---------------|
|---------------|
|    Kernel     | <- C000
|---------------|

Once the program is loaded, memory looks like this:

|   Reserved    | <- reserved for booting: I/O registers, boot program, kernel
|---------------|
|---------------|
|****Program****| <- 8000
|***************|
|***************|
|***************|
|---------------|
|    Kernel     | <- C000
|---------------|

x86 machine code, shown here as pseudocode:

for (i = 0; i < volume_boot_record_size; i++) {
    // read_sector(sector number, memory address); each sector is 512 bytes.
    // This loop reads the VBR into main memory, one sector at a time.
    read_sector(volume_origin + i, 0x80000000 + i*512);
}
goto 0x80000000;   // then run the VBR code by jumping to where it was loaded

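Who needs to know how to read a disk sector? The boot sector, the VBR, the kernel, the application, and the BIOS in EEPROM.

Each of these has its own copy of the read_sector implementation: the same code is duplicated instead of being shared through one module.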
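The loading loop above can be tried on an ordinary computer if the disk and RAM are simulated. In this sketch fake_disk, fake_mem, load_vbr and this read_sector are hypothetical stand-ins: read_sector just copies 512 bytes between arrays, whereas a real one programs the disk controller directly. The load address is also scaled down so that it fits in the fake memory.

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512
#define DISK_SECTORS 64
#define MEM_SIZE (64 * 1024)

/* Hypothetical stand-ins for the disk and for physical memory. */
static uint8_t fake_disk[DISK_SECTORS][SECTOR_SIZE];
static uint8_t fake_mem[MEM_SIZE];

/* Hypothetical read_sector: copy one whole sector to memory address addr.
   Real boot code would talk to the disk controller here instead. */
static void read_sector(uint32_t sector, uint32_t addr)
{
    memcpy(&fake_mem[addr], fake_disk[sector], SECTOR_SIZE);
}

/* The loop from the notes: sector volume_origin + i lands at base + i*512.
   Returns the address that real boot code would then "goto". */
static uint32_t load_vbr(uint32_t volume_origin, uint32_t vbr_size, uint32_t base)
{
    for (uint32_t i = 0; i < vbr_size; i++)
        read_sector(volume_origin + i, base + i * SECTOR_SIZE);
    return base;
}
```

For example, load_vbr(10, 4, 0x1000) copies sectors 10 through 13 into fake_mem[0x1000] onward and returns 0x1000.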

Example Code

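#include <stdint.h>

// Takes a number and writes it onto the console, digit by digit, right to left
void write_int_to_console(int n)
{
    uint16_t *p = (uint16_t *) 0xb8014;   // a cell in VGA text-mode video memory (starts at 0xb8000)
    do {
        // low byte: ASCII digit; high byte: attribute 7 (light grey on black)
        *p-- = (7 << 8) | ('0' + n % 10);
        n /= 10;
    } while (n != 0);                      // do/while so that 0 is printed as "0"
}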

This is memory-mapped I/O: the program draws on the screen by storing into ordinary memory addresses. Potential bug: negative numbers.
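To run the same algorithm on a normal machine, the sketch below writes into a hypothetical array fake_screen instead of the real video memory at 0xb8000 (storing to that address only works on bare hardware). write_int_to_buffer and fake_screen are names made up for this sketch; cell 10 of the array corresponds to address 0xb8014, since each cell is 2 bytes. It also shows the negative-number bug: in C, n % 10 is negative when n is negative, so the stored character is not a digit.

```c
#include <stdint.h>

/* Hypothetical stand-in for VGA text memory: 80x25 cells, 16 bits each. */
static uint16_t fake_screen[80 * 25];

/* Same algorithm as write_int_to_console, but writing into a buffer.
   Digits come out least significant first, so the pointer moves left. */
static void write_int_to_buffer(uint16_t *last_cell, int n)
{
    uint16_t *p = last_cell;
    do {
        /* high byte: attribute 7, low byte: ASCII character */
        *p-- = (uint16_t)((7 << 8) | ('0' + n % 10));
        n /= 10;
    } while (n != 0);
}
```

Calling write_int_to_buffer(&fake_screen[10], 305) leaves '3', '0', '5' in cells 8, 9 and 10.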

int fact(int n)
{
    if (!n)
        return 1;
    else
        return n * fact(n - 1);
}

Machine-language translation of fact (x86, AT&T syntax):

fact:
    pushl   %ebp               // save the caller's ebp
    movl    $1, %eax           // eax = 1 (the result if n == 0)
    movl    %esp, %ebp         // ebp = esp
    subl    $8, %esp           // allocate 8 bytes on the stack
    movl    %ebx, -4(%ebp)     // save the caller's ebx
    movl    8(%ebp), %ebx      // ebx = n
    testl   %ebx, %ebx         // is n zero?
    jne     .L5                // if not, go compute n * fact(n - 1)
.L1:
    movl    -4(%ebp), %ebx     // restore the caller's ebx
    movl    %ebp, %esp         // restore esp (frees the 8 bytes)
    popl    %ebp               // restore the caller's ebp
    ret                        // pop the return address and jump to it
    ...

What the start of fact does to the stack (three snapshots; in these drawings the stack grows upward):

Before movl %esp, %ebp:

|/////////////////////|
|/////// UNUSED //////|
|/////////////////////| <--- esp
|---- your frame -----|
|---------------------|
|---------------------| <--- ebp
|/////////////////////|
|/////////////////////|
|/////////////////////|

After movl %esp, %ebp (esp and ebp point to the same place):

|/////////////////////|
|/////////////////////|
|/////////////////////| <--- esp <--- ebp
|---- your frame -----|
|---------------------|
|---------------------|
|/////////////////////|
|/////////////////////|
|/////////////////////|

After subl $8, %esp:

|/////////////////////|
|------ 8 bytes ------| <--- esp
|/////////////////////| <--- ebp
|---- your frame -----|
|---------------------|
|---------------------|
|/////////////////////|
|/////////////////////|
|/////////////////////|

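So fact allocates 8 bytes each time it is called: -4(%ebp) holds the saved %ebx, and (%esp) holds the argument for the recursive call.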

caller:
    pushl   $5                 // push the argument n = 5
    call    fact               // push the return address and jump to fact
    addl    $4, %esp           // pop the argument; the result is in %eax

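.L5:
    leal    -1(%ebx), %eax     // eax = ebx - 1 = n - 1
    movl    %eax, (%esp)       // store n - 1 as the argument for the next call
    call    fact               // eax = fact(n - 1)
    imull   %ebx, %eax         // eax = n * fact(n - 1)
    jmp     .L1                // back to the epilogue (restore ebx, esp, ebp; ret)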

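Stack while computing fact(5), after fact(5) has called fact(4) and fact(4) is about to call fact(3) (newest entries at the top; 5000 and 6000 are example return addresses):

|----------3----------|  argument for fact(3)
|---- fact's ebx -----|
|---- fact's ebp -----|
|--------6000---------|  return address into fact
|----------4----------|  argument for fact(4)
|---- main's ebx -----|
|---- main's ebp -----|
|--------5000---------|  return address into main
|----------5----------|  argument for fact(5)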

Caller/callee contract:

Do not modify or use the stack outside of your frame
When you're done, return to the return address (ra)
Put the result in %eax
Don't change other registers (or restore them before returning)

What happens if you don't follow these rules?

The callee can corrupt the caller's state ("mess with the caller's mind")
The callee may never return to the caller, or may jump somewhere else
The callee can overflow the stack or set sp to garbage
The callee can loop forever
The callee can execute the HCF (Halt and Catch Fire) instruction
The callee can have a buffer overflow

This is what we call "soft modularity": it is enforced only by politeness and convention, and relies on cooperation.

What we need is modularity that works regardless of how polite the other party is. This is known as hard modularity: it enforces the boundaries between abstraction layers.

HARD MODULARITY

Client/service: caller and callee run on separate computers and communicate only through messages.

Client (caller):
    send(fd, {"!", 5});        // build a request message; fd is the file/socket descriptor for the link
    receive(fd, response);     // response is the buffer that receives the reply
    if (response.opcode == "ok")
        print(response.val);
    else
        print("error");

Service (callee):
    while (1) {
        receive(fd, request);  // fd identifies the link, request is the buffer
        if (request.opcode == "!") {
            int n = request.val;
            int result = 1;
            for (int i = n; i > 1; i--)   // result = n!
                result *= i;
            response = {"ok", result};
        } else {
            response = {"bad", 0};
        }
        send(fd, response);
    }
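The message exchange above can be approximated on one machine with two processes that share nothing but the link. This is a minimal sketch assuming a POSIX system: socketpair() provides the link and fork() the separate service, while struct msg, service_loop, client_fact and start_service are hypothetical names. A bug in the service can corrupt only the service process's own memory, not the client's.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical fixed-size message: an opcode string plus one integer.
   Assumes each small message arrives in a single read() on the local socket. */
struct msg {
    char opcode[4];   /* "!" = factorial request; "ok" / "bad" in replies */
    int val;
};

/* Service side: answer requests until the client closes its end. */
static void service_loop(int fd)
{
    struct msg request, response;
    while (read(fd, &request, sizeof request) == (ssize_t)sizeof request) {
        memset(&response, 0, sizeof response);
        if (strncmp(request.opcode, "!", sizeof request.opcode) == 0 && request.val >= 0) {
            int result = 1;                      /* no overflow check */
            for (int i = request.val; i > 1; i--)
                result *= i;
            strcpy(response.opcode, "ok");
            response.val = result;
        } else {
            strcpy(response.opcode, "bad");
        }
        if (write(fd, &response, sizeof response) != (ssize_t)sizeof response)
            break;
    }
}

/* Client side: send {"!", n}. Returns 0 and stores n! in *out, or -1 on error. */
static int client_fact(int fd, int n, int *out)
{
    struct msg request, response;
    memset(&request, 0, sizeof request);
    strcpy(request.opcode, "!");
    request.val = n;
    if (write(fd, &request, sizeof request) != (ssize_t)sizeof request)
        return -1;
    if (read(fd, &response, sizeof response) != (ssize_t)sizeof response)
        return -1;
    if (strncmp(response.opcode, "ok", sizeof response.opcode) != 0)
        return -1;
    *out = response.val;
    return 0;
}

/* Run the service in its own process; the socket pair is the only link. */
static pid_t start_service(int *client_fd)
{
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
        return -1;
    pid_t pid = fork();
    if (pid == 0) {              /* child process: the service */
        close(fds[0]);
        service_loop(fds[1]);
        _exit(0);
    }
    close(fds[1]);               /* parent keeps only the client end */
    if (pid < 0) {
        close(fds[0]);
        return -1;
    }
    *client_fd = fds[0];
    return pid;
}
```

Closing the client's descriptor makes the service's read() return 0, so the service loop ends and the child exits.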

What are the benefits to this scheme?
+ uses hard modularity, which means you don't rely on politeness of the user

Disadvantages?
- It assumes the link is reliable
- No recursion
- What if the service becomes overloaded?
- The callee can still loop forever; because of this, the caller's receive should take a timeout: receive(fd, response, timeout)
- It needs more resources (separate computers and a link between them) => not cheap enough!
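The timeout fix suggested above can be sketched with POSIX poll(), which waits at most a given number of milliseconds for a descriptor to become readable. receive_timeout and its return codes are hypothetical; poll() and read() are standard POSIX calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical receive with a timeout. Returns the number of bytes read,
   0 at end of file, -1 on error, or -2 if nothing arrived in time. */
static ssize_t receive_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready = poll(&pfd, 1, timeout_ms);
    if (ready < 0)
        return -1;
    if (ready == 0)
        return -2;   /* timed out: the callee may be looping or dead */
    return read(fd, buf, len);
}
```

The caller can then report an error instead of waiting forever on a callee that never answers.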

HARD MODULARITY APPROACH #2

VIRTUALIZATION (you run the callee in a fake simple computer)

Set up a sandbox for untrusted callees:
write an x86 interpreter (trusted);
to execute an untrusted function:
1. copy it into the interpreter
2. run the interpreter
3. the interpreter traps bad accesses, infinite loops, and HCF (Halt and Catch Fire)

=> Too slow! (Though it can emulate a different kind of computer.) With hardware support the slowdown can be made very small: for instance, the hardware itself performs a "load" operation, instead of the interpreter reading and checking each load.
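The three interpreter steps above can be sketched with a toy interpreter. It is not x86: the instruction set (enum op), struct insn, interpret and the trap codes are all hypothetical, chosen only to show where a trusted interpreter gets to check every memory access, cap the number of steps (a stand-in for catching infinite loops), and refuse HCF. It also makes the cost visible: every instruction passes through a switch and several checks.

```c
#include <stdint.h>

/* Hypothetical toy instruction set (NOT x86), just enough to show the traps. */
enum op { OP_LOADI, OP_LOAD, OP_STORE, OP_ADD, OP_JMP, OP_HALT, OP_HCF };
struct insn { enum op op; int a; int b; };  /* a = register, b = immediate or address */

enum result { RUN_OK, TRAP_BAD_ACCESS, TRAP_TOO_LONG, TRAP_HCF };

#define MEM_WORDS 16
#define NREGS 4

/* Trusted interpreter: every step and every memory access is checked. */
static enum result interpret(const struct insn *prog, int prog_len,
                             int32_t mem[MEM_WORDS], long max_steps)
{
    int32_t reg[NREGS] = {0};
    int pc = 0;
    for (long steps = 0; steps < max_steps; steps++) {
        if (pc < 0 || pc >= prog_len)
            return TRAP_BAD_ACCESS;             /* jumped outside its code */
        struct insn in = prog[pc++];
        if (in.a < 0 || in.a >= NREGS)
            return TRAP_BAD_ACCESS;             /* no such register */
        switch (in.op) {
        case OP_LOADI: reg[in.a] = in.b; break;
        case OP_LOAD:
        case OP_STORE:
            if (in.b < 0 || in.b >= MEM_WORDS)
                return TRAP_BAD_ACCESS;         /* outside the sandbox memory */
            if (in.op == OP_LOAD)
                reg[in.a] = mem[in.b];
            else
                mem[in.b] = reg[in.a];
            break;
        case OP_ADD:   reg[in.a] += in.b; break;
        case OP_JMP:   pc = in.b; break;
        case OP_HALT:  return RUN_OK;
        case OP_HCF:   return TRAP_HCF;         /* refuse to halt the machine */
        default:       return TRAP_BAD_ACCESS;
        }
    }
    return TRAP_TOO_LONG;                       /* step budget used up: probable loop */
}
```

A program that stores outside mem, jumps to itself forever, or executes OP_HCF is stopped by the interpreter, and control returns to the trusted caller.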

Is there any way to solve this slowdown problem? Yes!

VIRTUALIZE PROCESSOR

- Special hooks let the kernel take control when the "emulated" program does something questionable:

HCF (or another privileged instruction)

Loop (timer interrupt)

Bad access
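User programs cannot install real interrupt handlers, but a POSIX interval timer gives a rough user-space analogy of the "Loop (timer interrupt)" hook: a signal handler plays the role of the kernel taking control back from code that never gives it up. run_with_timer, on_timer, finishes_quickly and spins_forever are hypothetical names; sigaction, setitimer and sigsetjmp/siglongjmp are standard POSIX calls.

```c
#define _XOPEN_SOURCE 700
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

static sigjmp_buf regain_control;

/* Plays the role of the kernel's timer-interrupt handler. */
static void on_timer(int sig)
{
    (void)sig;
    siglongjmp(regain_control, 1);       /* abandon the untrusted code */
}

/* Run untrusted() but take control back after timeout_ms milliseconds.
   Returns 0 if it finished on its own, 1 if the timer fired first. */
static int run_with_timer(void (*untrusted)(void), long timeout_ms)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_timer;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, NULL);

    struct itimerval t, off;
    memset(&t, 0, sizeof t);
    memset(&off, 0, sizeof off);
    t.it_value.tv_sec = timeout_ms / 1000;
    t.it_value.tv_usec = (timeout_ms % 1000) * 1000;

    int fired = 0;
    /* savemask = 1, so jumping back here also unblocks SIGALRM */
    if (sigsetjmp(regain_control, 1) == 0) {
        setitimer(ITIMER_REAL, &t, NULL);
        untrusted();
    } else {
        fired = 1;
    }
    setitimer(ITIMER_REAL, &off, NULL);  /* disarm the timer */
    return fired;
}

/* Two example "untrusted" functions. */
static void finishes_quickly(void) { }

static void spins_forever(void)
{
    volatile unsigned long n = 0;
    for (;;)
        n++;
}
```

run_with_timer(spins_forever, 50) returns 1 after about 50 ms instead of hanging.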

Process = a program in execution in an isolated domain. Underneath the isolated domain there is a virtualizable processor (a virtual computer).

|--------------------------------------|
|             Application              |
|    open/read        |    add/mult.   |
|---------------------|                |
|      OS Kernel      |                |
|---------------------|----------------|
|               Hardware               |
|--------------------------------------|

- Together, the boundary between the Application and the OS Kernel and the boundary between the Application and the Hardware form the virtual computer interface.

- Crossing the boundary between the Application and the OS Kernel is expensive, so the number of calls across it is small. The number of calls between the OS Kernel and the Hardware is large, but not as large as the number of calls between the Application and the Hardware (ordinary instructions such as add and mult).

for (;;) {
    char c;
    if (sys_read(0, &c, 1) == EOF)   // very slow: one trip into the kernel (and possibly the disk) per byte
        break;
    process(c);
}

Speed-up approaches:

BUFFER CACHE => cache the most recently read sector (512 bytes, for instance); then one disk read serves 512 one-byte reads, so the loop can be up to about 512 times faster
PREFETCHING => guess where the program will read next and fetch that ahead of time. A 32 kB prefetch serves about 32,000 one-byte reads per disk access, i.e. roughly 30,000 times faster

Speculation: chew up otherwise-unused resources now, in the hope that they'll be needed shortly.
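The buffer-cache idea can be sketched at user level: instead of one system call per character, fill a 512-byte buffer with one read() and hand out characters from memory. struct reader and reader_getc are hypothetical names; read() is the standard POSIX call. On 1000 bytes of input this makes about 3 read() calls instead of the 1001 that one-byte reads would need, and raising BUF_SIZE to 32 kB turns the same code into a crude form of prefetching for sequential reads.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <unistd.h>

#define BUF_SIZE 512   /* one "sector" per system call */

/* Hypothetical buffered reader: one read() fills the buffer, and later
   calls are served from memory until the buffer is used up. */
struct reader {
    int fd;
    char buf[BUF_SIZE];
    ssize_t len;     /* bytes currently in buf */
    ssize_t pos;     /* index of the next byte to hand out */
    long syscalls;   /* how many times read() was actually called */
};

/* Returns the next byte (0..255), or -1 at end of file or on error. */
static int reader_getc(struct reader *r)
{
    if (r->pos >= r->len) {
        r->syscalls++;
        r->len = read(r->fd, r->buf, BUF_SIZE);
        r->pos = 0;
        if (r->len <= 0)
            return -1;
    }
    return (unsigned char)r->buf[r->pos++];
}
```

Usage: initialize with struct reader r = { .fd = 0 }; and replace each sys_read(0, &c, 1) in the loop with a call to reader_getc(&r).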

Problem: Cache Coherence

1. Making sure the cache agrees with the primary copy

2. What to do when they don't agree
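One way to picture both coherence questions is a one-entry cache that remembers which version of the primary copy it holds. This is a sketch under stated assumptions: the per-sector version counter, disk_write, cache_read and the other names are hypothetical, and comparing versions stands in for whatever mechanism tells a real cache that the primary copy changed. The answer chosen here for question 2 is simply to discard the stale copy and re-read.

```c
#include <string.h>

#define SECTOR_SIZE 512
#define NSECTORS 8

/* Hypothetical primary copy (the "disk"), with a version counter per
   sector that is bumped on every write. */
static char disk[NSECTORS][SECTOR_SIZE];
static unsigned disk_version[NSECTORS];
static long disk_reads;   /* how often the primary copy was actually read */

static void disk_write(int s, const char *data)
{
    memcpy(disk[s], data, SECTOR_SIZE);
    disk_version[s]++;
}

/* One-entry cache holding the most recently read sector. */
struct cache {
    int sector;          /* -1 if empty */
    unsigned version;    /* version of the primary copy when it was read */
    char data[SECTOR_SIZE];
};

/* Question 1: is the cached copy still in agreement with the disk?
   Question 2 (policy chosen here): if not, re-read it. */
static const char *cache_read(struct cache *c, int s)
{
    if (c->sector != s || c->version != disk_version[s]) {
        memcpy(c->data, disk[s], SECTOR_SIZE);
        c->sector = s;
        c->version = disk_version[s];
        disk_reads++;
    }
    return c->data;
}
```

A second read of the same unchanged sector is served from the cache; after disk_write changes that sector, the next cache_read notices the new version and fetches the fresh data.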