Lecture 3: Modularity and Virtualization

Presented: Monday, April 7, 2014

Written By: Gautham Badhrinathan, Manoj Thakur

What we did in the last lecture?

In the previous lecture we were working on a program that counts the number of words in a given document (proposal). We built this program to run on a bare-metal system without any operating system.

Improvements in the existing code

Improvements to I/O performance:

We have the following options for improving I/O performance:

Double Buffering

The basic idea behind double buffering is that the program should not sit idle while the disk controller fetches the data it needs. It achieves this by issuing read requests ahead of time: while it processes the current buffer, it asks the disk controller to fill the next one. The program keeps two buffers and processes one of them at a time. As soon as data arrives in the first buffer, the program issues another read request that names the second buffer as the destination, then continues processing the first. The program swaps buffers each time the disk controller completes a request.
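The buffer-swapping loop described above can be sketched in C. This is only an illustration under stated assumptions: the names disk_start_read, disk_wait, and count_words_double_buffered are hypothetical, the 512-byte sector size is assumed, and the "disk" is an in-memory array that completes each request immediately, whereas a real controller would fill the buffer in the background.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512          /* assumed sector size */

/* Hypothetical stand-in for a disk controller: an in-memory "disk". */
static const char *disk_image;   /* simulated disk contents */
static size_t disk_len;          /* length of the simulated disk in bytes */
static size_t pending_len;       /* bytes delivered by the outstanding request */

/* Start reading sector s into buf. A real controller would return at once
   and fill buf in the background; this simulation copies immediately. */
static void disk_start_read(size_t s, char *buf)
{
    size_t off = s * SECTOR_SIZE;
    pending_len = 0;
    if (off < disk_len) {
        pending_len = disk_len - off;
        if (pending_len > SECTOR_SIZE)
            pending_len = SECTOR_SIZE;
        memcpy(buf, disk_image + off, pending_len);
    }
}

/* Wait for the outstanding request; return the number of bytes delivered. */
static size_t disk_wait(void)
{
    return pending_len;
}

/* Count words in one buffer while the other buffer is being filled. */
size_t count_words_double_buffered(const char *image, size_t len)
{
    static char buf[2][SECTOR_SIZE];
    size_t words = 0, sector = 0;
    int cur = 0, in_word = 0;

    disk_image = image;
    disk_len = len;
    disk_start_read(sector++, buf[cur]);           /* prime the first buffer */

    for (;;) {
        size_t n = disk_wait();                    /* current buffer is ready */
        if (n == 0)
            break;                                 /* past the end of the disk */
        disk_start_read(sector++, buf[1 - cur]);   /* request the next sector */
        for (size_t i = 0; i < n; i++) {           /* process the current one */
            if (isspace((unsigned char)buf[cur][i])) {
                in_word = 0;
            } else if (!in_word) {
                in_word = 1;
                words++;
            }
        }
        cur = 1 - cur;                             /* swap buffers */
    }
    return words;
}
```

The in_word flag is kept across iterations so that a word straddling a sector boundary is counted once. With real hardware, the inner loop would overlap with the controller's work on the other buffer, which is where the speedup comes from.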

Where to put the read code?

The improvements mentioned above would require re-engineering the word count program. Moreover, the read function will be tuned to a particular machine, so it should live outside the program. We have the following alternatives:

The parameters in the above function are as follows:

s: sector number
a: destination buffer
ns: number of sectors to be read

The problem is that ns has an upper limit of 255.

void read_sector1(int s, char *a, size_t ns);

On x86 systems, size_t lets us request more than 255 sectors. But hard disks can have different sector sizes, so we add the following improvement:

void read_sector2(int s, char *a, size_t ns, size_t secsize);

In this definition we can explicitly specify the size of each sector using the secsize parameter. But we need to abstract away low-level details like the sector number and sector size. This is achieved by introducing the offset_from_start and nbytes arguments as follows.

void read_sector3(off_t offset_from_start, char *a, size_t nbytes);

The problem with the above definition is that there could be multiple disk drives that we might want to read from. The disk argument lets us specify which disk to read.

void read_sector4(int disk, off_t offset_from_start, char *a, size_t nbytes);

The problem now is that we need to know the exact number of bytes read and also be able to do error handling. The return value in the function below provides the required information.

ssize_t read_sector5(int disk, off_t offset_from_start, char *a, size_t nbytes);

The return value indicates the number of bytes read from the disk. This value is -1 if there is an error while reading.
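To make the final interface concrete, here is a minimal sketch of how a byte-oriented read_sector5 might sit on top of a sector-oriented primitive. Everything below the signature is an assumption for illustration: the in-memory disks array, the constants NDISKS, NSECTORS, and SECTOR_SIZE, and the helper raw_read_sector are all hypothetical. The disk parameter is spelled disk because disk# is not a valid C identifier.

```c
#include <string.h>
#include <sys/types.h>   /* off_t, ssize_t (POSIX) */

#define NDISKS 2         /* assumed number of disks */
#define SECTOR_SIZE 512  /* assumed sector size */
#define NSECTORS 8       /* assumed sectors per disk */

/* Hypothetical in-memory disks standing in for real hardware. */
static char disks[NDISKS][NSECTORS * SECTOR_SIZE];

/* Hypothetical low-level primitive: copy one whole sector into buf.
   Returns 0 on success, -1 if the disk or sector does not exist. */
static int raw_read_sector(int disk, long s, char *buf)
{
    if (disk < 0 || disk >= NDISKS || s < 0 || s >= NSECTORS)
        return -1;
    memcpy(buf, disks[disk] + s * SECTOR_SIZE, SECTOR_SIZE);
    return 0;
}

/* Byte-oriented read that hides sector numbers and sector size.
   Returns the number of bytes read (possibly fewer than nbytes at the
   end of the disk), or -1 on error. */
ssize_t read_sector5(int disk, off_t offset_from_start, char *a, size_t nbytes)
{
    const off_t disk_size = (off_t)NSECTORS * SECTOR_SIZE;
    char sector[SECTOR_SIZE];
    size_t done = 0;

    if (disk < 0 || disk >= NDISKS || offset_from_start < 0)
        return -1;

    while (done < nbytes && offset_from_start < disk_size) {
        long s = (long)(offset_from_start / SECTOR_SIZE);        /* which sector */
        size_t skip = (size_t)(offset_from_start % SECTOR_SIZE); /* offset inside it */
        size_t chunk = SECTOR_SIZE - skip;                       /* bytes left in sector */
        if (chunk > nbytes - done)
            chunk = nbytes - done;
        if (raw_read_sector(disk, s, sector) != 0)
            return -1;
        memcpy(a + done, sector + skip, chunk);
        done += chunk;
        offset_from_start += (off_t)chunk;
    }
    return (ssize_t)done;
}
```

A caller only ever sees byte offsets and byte counts. A request that starts near the end of the disk returns a short count, and a request for a nonexistent disk returns -1, matching the return-value convention described above.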

Comparison with read(2)

Our method uses a disk number, whereas the Unix version uses a file descriptor:

A file descriptor is used mainly because it is a uniform handle that can refer to any device.

Our method uses an extra offset argument:

Unix read doesn’t have an offset parameter mainly because the same call must also be able to read from a streaming device. To seek in a random-access device we can use the lseek function.

Note: There is also a pread function in Unix, which is similar to our read_sector5.
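The two Unix styles of positioned read can be compared side by side. The sketch below uses the real POSIX calls lseek, read, and pread; the wrapper names read_at_with_lseek and read_at_with_pread are hypothetical and exist only to line the calls up against read_sector5's argument order.

```c
#define _XOPEN_SOURCE 700   /* ask for POSIX.1-2008 declarations such as pread */
#include <sys/types.h>
#include <unistd.h>

/* Positioned read in two steps: move the file offset, then read.
   Fails with -1 if fd cannot seek (for example, a pipe). */
ssize_t read_at_with_lseek(int fd, off_t offset, char *a, size_t nbytes)
{
    if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
        return -1;
    return read(fd, a, nbytes);   /* advances the file offset by the bytes read */
}

/* The same request in a single call; pread leaves the file offset unchanged. */
ssize_t read_at_with_pread(int fd, off_t offset, char *a, size_t nbytes)
{
    return pread(fd, a, nbytes, offset);
}
```

Both wrappers return the number of bytes read, or -1 on error. The practical difference is that the lseek version changes the descriptor's file offset as a side effect, while pread does not.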

Modularity

Advantages

Disadvantages

How to enforce modularity?

What can go wrong in Soft Modularity

Example 1

char *readline(int fd);

Example 2

int fact(int n) {
    if (n == 0)
        return 1;
    return n * fact(n - 1);
}

The same in assembly:

    pushl   %ebp
    movl    $1, %eax
    movl    %esp, %ebp
    subl    $8, %esp
    movl    %ebx, -4(%ebp)
    movl    8(%ebp), %ebx
    testl   %ebx, %ebx
    jne     L5
L1: movl    -4(%ebp), %ebx
    movl    %ebp, %esp
    popl    %ebp
    ret
L5: leal    -1(%ebx), %eax
    movl    %eax, (%esp)
    call    fact
    imull   %ebx, %eax
    jmp     L1

What is required here?

What can go wrong?

Basically, soft modularity works when the API has only a small number of functions. For a larger code base, we need hard modularity.

How to implement hard modularity?

There are two ways of implementing hard modularity.