COM SCI 111
Lecture 2: Abstractions and Bootstrapping

by Victor Lai

Date: Wednesday, January 7, 2015


Current Events

Back to Top

In a recent article, "Adviser Guides Obama Into the Google Age" , which was published by the New York Times on January 3, 2015, a former Google executive Megan Smith tries to resolve a culture clash of the Obama Administration as she works as the CTO (Chief Technology Officer). She attempts to solve the government's technology problems and upgrade still-used, ancient technology such as floppy disks.

Problem

Back to Top

Suppose that you're a paranoid person that wishes to keep secret data safe. How do you keep the data secure?

Solution

Back to Top

Write a simple toy program (wc) that can be trusted.
The program must be secure, so we have to write our own program to ensure that. We're so paranoid to the point that we cannot trust the Linux kernel or any Ubuntu distribution.
In addition, we cannot trust any operating system out there. However, what type of hardware is going to be run on when the following is run?

wc foo.txt

We assume that the machine has the following characteristics.

The user interface works as follows.
  1. Plug the desktop into the wall.
  2. Push the power button.
  3. A 213406-word document appears on the screen.
We assume that the interface runs reasonably fast and that the size of the file is less than 1 TB. We define a word to match the regular expression '[A-Za-z]+', so the word must consist of ASCII characters.
Later, we may want to add extra features including counting the number of lines of the file, editing the file, etc.

ROM (Read-Only Memory)

When the power is turned off, the contents of SDRAM are lost. We resolve this problem by using ROM, which is able to survive power outages.
We'll take a fraction of the 1 MiB of RAM and make it EEPROM (Phoenix software).
We could have the instruction pointer at a jmp instruction to our wc program, but that would be quite expensive.

Instead, we will put the program on hard drive. The instruction pointer will point to an instruction to read the first 40 KiB, containing the program, on disk.

Chain Loading

"Save your skin by adding another level of indirection."
Chain loading basically loads a program that loads another program. The following is something like how it would look like in a shell script.

#/bin/sh
foo = bar

# Loads a different program
exec sed "s/foo/$foo/g"

BIOS (Basic Input/Output System)

The BIOS, located in EEPROM, is responsible for performing hardware self-tests, scanning for devices, looking to see if each device is bootable by looking for a specific pattern in the sector, and performing read and jump instrutions if the device is bootable.

MBR (Master Boot Record)

The diagram below shows an overview of a sector (512 bytes) of the disk.
MBR

If the BIOS cannot find the signature, no booting occurs. The partition sector identifies the starting sector, the number of sectors (size), and flags (type, bootable, etc.).
The MBR looks for VBR (volume boot record). The VBR then reads the kernel and runs the program.
Chain loading: BIOS - > MBR - > VBR - > wc program

Disks and Disk Controller

Disks typically run at 7200 rpm (120 Hz). Drive heads float by seal of disk drive.
(1/120) s/rotation = 8.333 ms/rotation
The seek time is how long it takes to move a read head from one place to another on the disk. It is typically 10 ms.
Average rotational latency = 4.166 ms
Average delay = Average seek time + Average rotational latency = 15 ms approximately
Disk Controller

The disk controller indirectly connects the disk. It has a cache that remembers data written onto disk. The CPU sends disk commands to the disk controller in the form of x86 instructions.
inb - takes in an address as an argument, grabs the byte from the disk controller, and stuffs it into that register.
outb - takes in an address, grabs the byte from the disk controller, and outputs a specified number of bytes.
Memory Layout
The BIOS reads sector into 0x7c00.

The following are the steps to read a sector from disk.

  1. Wait for it to be ready; look at 0x1f7, which is the status register.
  2. Store the number of sectors in 0x1f2.
  3. Store the sector offset - (0x1f3 - 6).
  4. Write into 0x1f7 and store a bit pattern to perform READ.
  5. Read 0x1f7 to wait for it to be ready. (15 ms)
  6. Get the results from the disk controller's cache into RAM.

Actual Code

Back to Top

NOTE: In the code below, technically we do not want the main function to be called since we are bootstrapping. We are only declaring it as a function to satisfy the C compiler.
Many of the ideas and functions used in the code below comes from the disk controller manual.


void main(void){ // boot loader
    for (int i = 0; i < 80; i++) // 40 KiB = 80 sectors
        read_ide_sector(i + 100, 0x20000 + i * 52);
    goto *0x20000;
}

void read_ide_sector(int s, int a){
    wait_for_ready();
    outb(0x1f2, 1);
    for (int i = 0; i < 4; i++)
        outb(0x1f3 + i, (s >> (8 * i)) & 0xff);
    outb(0x1f7, 0x20);
    wait_for_ready();
    insl(0x1f0, a, 128); // Copies data 4 bytes at a time
}

void wait_for_ready(void){
    while ((inb(0x1f7) & 0xc0) != 0x40)
        continue;
}

Main program for wc


void main(void){
    long long int nwords = 0;
    bool inword = false;
    int s = 90000;
    int len;
    do{
        char buf[513];
        buf[512] = 0;
        read_ide_sector(s++, (int) buf);
        len = strlen(buf);
        nwords += cws(buf, len, &inword);
    }while (len == 512);
    display_ans(nwords);
}

int cws(char *buf, int bufsize, bool *inword){
    int w = 0;
    for (int i = 0; i < bufsize; i++){
        bool alpha = isalpha((unsigned char) buf[i]);
        w += alpha & !*inword;
        *inword = alpha;
    }
    return w;
}

void display_ans(long long int nwords){
    char* screen = 0xB8000 + 200;
    do{
        screen[0] = (nwords % 10) + 0;
        screen[1] = 7; // grey on black
        screen -= 2;
    }while ((nwords /= 10) != 0);
}

As an exercise, how can the main program handle a file with all non-null bytes? Perhaps, we can have read_ide_sector return a bool.

Next lecture: How can we make the code better?