Presented: Monday, April 7, 2014
Written By: Gautham Badhrinathan, Manoj Thakur
In the previous lecture we were working on a program that counts the number of words in a given document (proposal). We built this program to run on a bare-metal system without any operating system.
We have the following options for improving the I/O performance:
The basic idea behind double buffering is to avoid waiting idly for the disk controller to fetch the data the program needs. The program issues read requests ahead of time, asking the disk controller to fill the next buffer while it processes the current one. It uses two buffers and reads from one of them at a time. As soon as data arrives in the first buffer, the program issues another read request that names the second buffer as the destination, and then continues processing the first buffer. The program switches buffers every time the disk controller completes a request.
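The buffer-swapping loop can be sketched as follows. This is only an illustration under stated assumptions: start_read and wait_read are hypothetical stand-ins for a disk controller interface, simulated here with an in-memory array so the example runs on its own, and a word is taken to be a maximal run of alphabetic characters. SECTOR_SIZE is deliberately tiny so that even short inputs span several buffers.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 8   /* tiny on purpose, so short inputs cross many sectors */

/* Hypothetical stand-ins for a disk controller: an in-memory "disk",
   a call that starts a read, and a call that waits for it to finish. */
static const char *disk_data;
static size_t disk_len;
static char *pending_buf;
static size_t pending_off;

static void start_read(size_t sector, char *buf) {
    pending_buf = buf;
    pending_off = sector * SECTOR_SIZE;
}

/* Completes the pending request; returns bytes delivered (0 at end of disk). */
static size_t wait_read(void) {
    if (pending_off >= disk_len)
        return 0;
    size_t n = disk_len - pending_off;
    if (n > SECTOR_SIZE)
        n = SECTOR_SIZE;
    memcpy(pending_buf, disk_data + pending_off, n);
    return n;
}

/* Counts words in data[0..len) using two alternating buffers. */
size_t count_words(const char *data, size_t len) {
    char bufs[2][SECTOR_SIZE];
    size_t words = 0, sector = 0;
    int cur = 0, in_word = 0;

    disk_data = data;
    disk_len = len;
    start_read(sector++, bufs[cur]);          /* prime the first buffer */
    for (;;) {
        size_t n = wait_read();               /* current buffer is now filled */
        if (n == 0)
            break;
        start_read(sector++, bufs[1 - cur]);  /* request next sector, then process */
        for (size_t i = 0; i < n; i++) {
            int is_alpha = isalpha((unsigned char)bufs[cur][i]) != 0;
            if (is_alpha && !in_word)
                words++;
            in_word = is_alpha;               /* carries across buffer boundaries */
        }
        cur = 1 - cur;                        /* swap buffers */
    }
    return words;
}
```

On real hardware, wait_read would poll the controller or take an interrupt, and the processing loop would overlap with the transfer into the other buffer; in this simulation the two steps simply run back to back.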
The improvements mentioned above would require re-engineering the word count program. Moreover, the read function would be tuned to a particular machine, so it should live outside the program. We have the following alternatives:
Perform the read function in software. Linux implements the read function in the kernel itself. The software interface, however, needs to be refined; the iterations are as follows:
void read_sector(int s, char *a, unsigned char ns);
The parameters of the above function are as follows:
s: sector number
a: destination buffer
ns: number of sectors to be read
The problem is that ns has an upper limit of 255, because it is an unsigned char.
void read_sector1(int s, char *a, size_t ns);
On x86 systems, size_t lets us request more than 255 sectors. But hard disks can have different sector sizes, so we add the following improvement:
void read_sector2(int s, char *a, size_t ns, size_t secsize);
In this definition we can explicitly specify the size of each sector using the secsize parameter. But the caller should not have to deal with low-level details like the sector number and sector size. We abstract them away by introducing the offset_from_start and nbytes arguments as follows.
void read_sector3(off_t offset_from_start, char *a, size_t nbytes);
The problem with the above definition is that there could be multiple disk drives that we might want to read from. The disk_no argument lets us specify which disk to read.
void read_sector4(int disk_no, off_t offset_from_start, char *a, size_t nbytes);
The problem now is that the caller needs to know exactly how many bytes were read and needs a way to detect errors. The return value of the function below provides this information.
ssize_t read_sector5(int disk_no, off_t offset_from_start, char *a, size_t nbytes);
The return value is the number of bytes read from the disk, or -1 if there was an error while reading.
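To make the final interface concrete, here is a minimal sketch of a byte-oriented read layered on top of a sector-oriented primitive. Everything in it is an assumption made for illustration: the name read_bytes, the helper read_one_sector, the in-memory disks array that stands in for real hardware, and the constants SECSIZE, NSECTORS and NDISKS. It only shows how sector numbers and sector sizes can be hidden behind a byte offset, a byte count, and a return value that reports progress or failure.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>   /* off_t, ssize_t */

#define SECSIZE  512
#define NSECTORS 4
#define NDISKS   2

/* Hypothetical in-memory "disks" standing in for real hardware. */
static char disks[NDISKS][NSECTORS * SECSIZE];

/* Hypothetical low-level primitive: copy one whole sector into a. */
static int read_one_sector(int disk, size_t s, char *a) {
    if (disk < 0 || disk >= NDISKS || s >= NSECTORS)
        return -1;
    memcpy(a, disks[disk] + s * SECSIZE, SECSIZE);
    return 0;
}

/* Byte-oriented read: returns the number of bytes read (possibly fewer
   than nbytes at the end of the disk), or -1 on error. */
ssize_t read_bytes(int disk, off_t offset_from_start, char *a, size_t nbytes) {
    char sector[SECSIZE];
    size_t done = 0;
    const off_t disk_size = (off_t)NSECTORS * SECSIZE;

    if (disk < 0 || disk >= NDISKS || offset_from_start < 0)
        return -1;
    while (done < nbytes && offset_from_start < disk_size) {
        size_t s = (size_t)(offset_from_start / SECSIZE);     /* sector number */
        size_t skip = (size_t)(offset_from_start % SECSIZE);  /* offset in sector */
        size_t chunk = SECSIZE - skip;
        if (chunk > nbytes - done)
            chunk = nbytes - done;
        if (read_one_sector(disk, s, sector) != 0)
            return -1;
        memcpy(a + done, sector + skip, chunk);
        done += chunk;
        offset_from_start += (off_t)chunk;
    }
    return (ssize_t)done;
}
```

A read that starts near the end of a sector transparently continues into the next one, and a read that runs past the end of the disk returns a short count rather than an error, which is why the caller needs the return value.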
Unix's read takes a file descriptor instead of a disk number, mainly because a file descriptor is a uniform handle that can refer to any device.
Unix's read doesn't have an offset parameter, mainly because the data could come from a streaming device as well. To seek on a random-access device we can use the lseek function.
Note: There is also a pread function in Unix, which is similar to our read_sector5: it takes the offset as an explicit argument.
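The difference between the two styles can be seen in a short sketch using the POSIX calls read, lseek and pread. The function name demo_offsets, the temporary file path, and the sample data are assumptions made up for this illustration.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Writes "hello world" to a temporary file, then reads parts of it back:
   "world" with lseek + read, and "hello" with pread. Stores the file offset
   observed afterwards in *pos_after. Returns 0 on success, -1 on error.
   out_seek and out_pread must each have room for 6 bytes. */
int demo_offsets(char *out_seek, char *out_pread, off_t *pos_after) {
    char path[] = "/tmp/readdemoXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                      /* file disappears once fd is closed */

    int rc = -1;
    if (write(fd, "hello world", 11) != 11)
        goto out;

    /* Stream-style interface: move the file offset, then read from it. */
    if (lseek(fd, 6, SEEK_SET) != 6)
        goto out;
    if (read(fd, out_seek, 5) != 5)
        goto out;
    out_seek[5] = '\0';

    /* pread takes the offset as an argument and leaves the file offset alone. */
    if (pread(fd, out_pread, 5, 0) != 5)
        goto out;
    out_pread[5] = '\0';

    *pos_after = lseek(fd, 0, SEEK_CUR);  /* unchanged by pread */
    rc = 0;
out:
    close(fd);
    return rc;
}
```

After the call, the file offset is still where read left it (just past "world"), even though pread fetched bytes from the start of the file. On a streaming device such as a pipe, lseek and pread fail, while plain read keeps working.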
Robustness: The modular functions will provide fault tolerance and appropriate error handling in case of exceptions.
Lack of assumptions/neutrality/flexibility: The modular functions make as few assumptions as possible, which makes them easier to interface with.
Simplicity: Separation of concerns makes the functions easy to learn and use.
One way is to not enforce modularity and rely only on a contract between the caller and the callee. This is called soft modularity.
The other way is to define an API and handle errors on both sides of it. This is called hard modularity.
char *readline(int fd);
fd might not refer to a text file.
The input might contain an arbitrarily long line (no \n character), and hence memory can be exhausted.
int fact(int n) {
    if (n == 0)
        return 1;
    return n * fact(n - 1);
}
The same in assembly:
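fact: pushl %ebp             # save the caller's frame pointer
      movl $1, %eax          # default return value: 1
      movl %esp, %ebp        # set up our own frame
      subl $8, %esp          # reserve 8 bytes of stack
      movl %ebx, -4(%ebp)    # save the callee-saved register %ebx
      movl 8(%ebp), %ebx     # %ebx = n
      testl %ebx, %ebx       # is n == 0?
      jne L5                 # no: go recurse
L1:   movl -4(%ebp), %ebx    # restore %ebx
      movl %ebp, %esp        # pop our frame
      popl %ebp              # restore the caller's frame pointer
      ret                    # return value is in %eax
L5:   leal -1(%ebx), %eax    # %eax = n - 1
      movl %eax, (%esp)      # pass n - 1 as the argument
      call fact              # %eax = fact(n - 1)
      imull %ebx, %eax       # %eax = n * fact(n - 1)
      jmp L1                 # shared epilogue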
Basically, soft modularity works when the API has only a few functions. For a larger code base, we need hard modularity.
There are two ways of implementing hard modularity.
Client-Server modularity: the caller and the callee are physically separated and reside on different machines. The physical separation insulates both sides from unexpected error cases.
Virtualization: the caller doesn't directly interface with the callee, even though they reside on the same machine. This insulates the callee from errors caused by unexpected or undesirable behavior of the caller. This technique of achieving modularity is cheaper than the client-server model.