Lecture 2 Scribe Notes

UCLA CS 111 - October 2, 2013 - Professor Paul Eggert

Notes by Karen Truong and Gary Ren

Table of Contents

  1. Problem
  2. Solutions
    1. Potential Solutions
    2. Best Solution
  3. Assumptions
    1. File Characteristics
    2. Machine Specs
  4. Implementation
    1. Set-up
    2. Master Boot Record
    3. Boot Loader Program
    4. Word Count Program
  5. Conclusion

Problem

A professor is writing a grant proposal and needs to count the number of words in the proposal. However, the professor is very paranoid and doesn't trust the operating system because it might contain code written by fellow professors to steal his proposal.

Solutions

The professor wants to count the words in his proposal in a simple and safe way.

Potential Solutions

Best Solution

Best solution is to not use the operating system and create a stand-alone program to count words. The program will have a very simple user interface: the user will press the power button to turn on the machine, the machine will boot to the word count program, and the word count will be printed onto the screen.

Assumptions

The following assumptions will be made when implementing the solution.

File Characteristics

Machine Specs

Implementation

A simple diagram of computer architecture is shown below to assist with understanding:

    -------         -------      
    | CPU |         | RAM |       
    -------         -------   
       |               |
============================================================= BUS    
             |                               |
          -------                   -------------------
          | ROM |                   | Disk controller |
          -------                   -------------------			
		

Set-up

Proposal file will be stored on the disk. A disk is split into multiple sectors, each storing 512 bytes. Physically, a sector can be described as an arc on the disk.

In this example the file will be stored at sector 0x10000. The file be read from the disk until EOF.

The word count program, which consists of x86 machine code, is stored at sector 0xc000.

There will also be a program to load the word count program into random access memory (RAM) and execute it. This loader program is stored at sector 0xb000.

In this example, DRAM will used. The RAM will be split up into 512 byte chunks, the same size as the disk sectors. The key idea is to pick part of RAM and making it read only memory (ROM). When the computer is started, the rest of RAM will be set to 0 but the ROM will contain information. This ROM is known as the Basic Input/Output System (BIOS). Software stored in ROM is in general called firmware.

The BIOS performs the following actions:

Master Boot Record

Most drives have a MBR located in the first sector of the entire drive. Recall from before that a disk sector is 512 bytes, so a MBR is 512 bytes as well.

A MBR is split up into three parts:

--------------------------------------------------------------------------
|Boot loader (446 bytes) |Partition table (64 bytes) |Signature (2 bytes)| 
--------------------------------------------------------------------------
		

The boot loader consists of machine code that is executed. The machine code will look for a bootable partition in the partition table and then load that partition's volume boot record (VBR) (aka partition boot record), which is the first sector of a bootable partition. The VBR will then finally load the OS kernel. One point to note is that the MBR is OS independent while the VBR is OS dependent.

The partition table consists of four entries, each 16 bytes. Each partition entry contains information about each partition of the drive, so there can only be up to four partitions for one drive. That information includes: whether the partition is bootable, starting sector of the partition, ending sector of the partition, the file system of the partition, and the total number of sectors in the partition.

The signature consists of two bytes: 0x55AA. The signature serves as a sanity check by validating the MBR's contents. The BIOS will check the signature after the MBR is loaded and output error messages if the signature is not present.

Boot Loader Program

When the BIOS loads a MBR, it copies the MBR to location 0x7c00 in RAM and points the instruction pointer to the MBR.

Going back to the scenario, if the word count program can fit in the 446 bytes of the machine code section of the MBR, then it can be actually be placed in that section and run directly from the MBR. But because the word count program is larger than 446 bytes, the MBR contains code to load the word count program into RAM and jump to it.

The code to load the program is as follows (will usually be machine code, but shown here in C for simplicity):

for(i = 0xc000; i <= 0xc058; i++)
  read_ide_sector(i, 0x100000 + (i - 0xc000) * 512);
goto *0x100000

The word count is program is stored in sectors 0xc000 to 0xc058. Thus, the above code moves the appropriate sectors to memory and then goes to that location in memory to execute the word count program.

The read_ide_sector function reads each sector into memory using the programmed I/O approach, which basically reads from (inb function) and writes to (outb function) registers of the disk controller:

//s is the sector number, a is the address in memory
void read_ide_sector(int s, int a)
{
  //waits for the disk controller to be ready
  wait_for_ready();

  // store the number of sectors (1) into the sector count register (0x1f2)
  outb(0x1f2, 1);

  //stores the offset of the sector (sector number)
  //needs multiple outb calls because only one byte allowed at a time and int contains four bytes
  outb(0x1f3, s&0xff);
  outb(0x1f4, (s>>8)&0xff);
  outb(0x1f5, (s>>16)&0xff);
  outb(0x1f6, (s>>24)&0xff);

  //store the read command (0x20) into the status register (0x1f7)
  outb(0x1f7, 0x20);

  //have to check again because writing to the controller means it's not ready anymore
  wait_for_ready();

  //at this point, disk controller has read the desired sector and stored it in the disk controller cache, but now we need to transfer it into RAM
  //insl instruction reads from cache into RAM
  //first argument is where to read from (0x1f0 is the address of the disk controller)
  //second argument is where to read to (a is the inputted address in memory)
  //third is number of four byte words (512 bytes in a sector)
  insl(0x1f0, a, 512/4);
}

The wait_for_ready function waits for the disk controller to be ready by checking the top two bits of the disk controller's status register:

void wait_for_ready(void)
{
  while((inb(0x1f7) & 0xc0) != 0x40)
  continue;
}

Word Count Program

Once loaded, the actual word count program is as follows:

#include <stdbool.h>
#include <ctype.h>
void main(void)
{
  //number of words
  int nwords = 0;
  //whether or not in middle of a word
  bool inword = 0;
  //recall 0x10000 is the sector number of the proposal file
  int s = 0x10000;
  for(;;)
  {
   char buf[512];
   // read the current sector into buffer and increment the sector number
   read_ide_sector(s++, (int)buf);
   for(int j = 0; j < 512; j++)
   {
    //if the EOF is reached
    if(buf[j] == '\377')
    {
     //print out the number of words
     writeout(nwords);
     return;
    }
    //must cast to unsigned char because that's what isalpha takes an unsigned char from 0 to 255
    bool alpha = isalpha((unsigned char)buf[j]);

    //increment word count only if the current char is a valid letter and not in the middle of a word
    //alpha && !inword does the same thing
    nwords += alpha & ~inword;
    inword = alpha;
   }
  }
}

The writeout function prints out the number of words:

void writeout(int n)
{
  //address of the computer display
  uint16_t p = (unit16_t *)0xb804;
  while(n)
  {
   //print digits from right to left
   *p-- = '0' + n%10;
   n /= 10;
  }
}

Conclusion

The paranoid professor can now count the number of words in his proposal without fear of someone stealing it. However, a huge limitation of the word count program is that it cannot be easily changed. For example, simply changing the address of where to read from will require changing the source code and recompiling.

The inflexibility of this program provides a glimpse of why operating systems are necessary.