James Chin
Michael Policarpio
Corey Harada


CS 111 Lecture Tuesday 9/30/08 Scribe Notes

Write a program to count the number of words in a file, from scratch in machine code, can't use Eclipse or Microsoft.
1) Standalone program, when you boot machine, it boots right into your program.  Assume it is the only program that runs.
2) Must run fast.
3) We use only our own code: no help from strangers.
4) Turn on the PC, the program runs, and it prints out the number on the screen, nothing else.

Assume file is a contiguous block of data sitting somewhere on the disk, our choice of where.

What is a word? Anything matching this regex: [A-Za-z]+

Encoded in ASCII, one byte per character

CPU Architecture: Intel 32-bit x86 API (at least 3 years old)
ATA disk: 200GB (at least 3 years old)
1GB RAM
Filesize: special null byte marks the end of the file, '\0', all bits zero
Future enhancements: need to be able to run other programs too (e.g. print, word processor), also newer, bigger, faster SATA drive, printer, keyboard, mouse.

Issues:
1) bootstrapping
2) disk input
3) screen output

1) Bootstrapping:  x86 boot procedure
physical memory RAM (volatile, gone when power turned off) | ROM (contains BIOS, still exists when powering off)
hardware points at first byte in ROM, starts BIOS
BIOS tests system, looks for devices, looks for a device that contains a Master Boot Record (MBR)

MBR:
512 bytes long, divided into three pieces: 0 - 446 bytes, 64 bytes, 2 bytes (511 and 512).
last 2 bytes must look like this: 0xAA55  (55, then AA, because of little endian)
446 bytes contains x86 machine code, instructions
64 bytes is partition table, divided into four 16-byte entries.  Each entry describes an area of this disk, says status (e.g. bootable), start, size.
Chain loading: BIOS reads MBR into a location 0x7c00 in RAM.  MBR code reads Volume Boot Record (VBR) off a bootable partition & executes it.
BIOS is OS independent.  VBR is OS dependent/specific.

VBR will read our word-counting program into RAM.

Where is the OS (word counter) on disk?  Not MBR, Not VBR.  Let's say sector 1000 of the booted partition through 3047 (adjustable)

Where is OS in RAM?  Not BIOS.  Into ROM at 0xF0000. Not 0x7c00.  Put it at 0x100000 (VBR machine code will put it here, both VBR and OS knows this)

Memory:
0x7c00 is where MBR and VBR live (within 512 bytes)
From F0000 to FFFFF is where the BIOS lives.
From 0x100000 to 0x1100000 contains the OS.

Disk:
MBR partitions disk into four pieces (0, 1, 2, 3), MBR sitting at beginning of disk.
Partition 2, first 1000 bytes set aside, followed by OS.

Send this to Courseweb, don't email to professor.

2) Disk Input:
7200 RPM drive, 120 Hz
Average Seek Time 8 ms
Average Rotational Latency 4 1/6 ms
Average Transfer Rate: 500 Mb/s
Cost: $0.50/GB

Disk controller is attached to the bus.  We talk to the disk controller and give it instructions and returns us data.

Location: CPU can:
0x1F1 read this to find out controller status (store into here to issue a command)
0x1F2 # of 512 byte sectors to read
0x1F3-6 sector offset

VBR code:
for (i = 0; i < 2048; i++)
read_ide_sector(i + 1000, 0x100000 + i*512);

void read_ide_sector(int s, int a) {
while ((inb(0x1F7) & 0xc0) != 0x40)
continue; // waiting for disk controller to be ready
outb(0x1F2, 1);
outb(0x1F3, s & 0xff);
outb(0x1F4, (s >> 8) & 0xff);
outb(0x1F5, (s >> 16) & 0xff);
outb(0x1F6, (s >> 24) & 0xff);
outb(0x1F7, 0x20);
while ((inb(0x1F7) & 0xc0) != 0x40)
continue; // waiting for disk controller to be ready
insl(0x1F0, a, 128); // insert long, copies disk data into our RAM register, via CPU
}

compile this and get the machine code

MAIN OS (w.c.)
#include <stdbool.h>
#include <ctype.h>

int main(void) {
int nwords = 0; // count of the number of words
bool in_word = false;

for(;;) {
char buf[512];
read_ide_sector(s, (int)buf); // convert from char to int
for (int j = 0; j < 512; j++) {
if (buf[j] == '\0') {
nwords += in_word;
write out (nwords); // print the answer out
while (true)
continue; // after answer is printed on the screen, loop forever
}

bool this_alpha = is_alpha((unsigned char)buf[j]);
nwords += ~this_alpha & in_word; //~ is bitwise negation (NOT)
in_word = this_alpha;
}
}

Screen is 80 x 25 characters
Memory mapped I/O for screen, starting at 0xB8000, 25 lines, each contains 80 ASCII character representations
ASCII char, followed by color

void write_out(int n) {
unsigned char * screen = 0x58000 + 40*25*2 + 40; // offset output by 40 chars on the screen to the right
do {
screen[0] = n % 10 +  '0';
screen[1] = 7; // gray color on black
screen -= 2; // working backwards, every char takes two bytes to represent
n /= 10;
} while(n);


Performance Note:  Currently this program reads and counts, reads and counts.  Would be faster to read more into a buffer.
We could use DMA (direct memory access) instead of insert long (insl), to copy data from the disk controller directly to RAM, without going through the CPU.