Lecture 3 Scribe Notes
by Young-kyu Choi and Mo Xu
Table of Contents
v Coping with complexity in an OS : Modularity.
v Problems in the word-count program
1. Two copies of ‘read_sector’ function
2. CPU and I/O device occupancy problem
v Modifying ‘read_sector’ program
Ø Criteria for determining the modularity of the program
Ø Possible bug scenarios with function call modularity
o Definition of modularity
Breaking complex program into pieces that is relatively independent and structured.
o Economics of modularity
Assume that the number of bugs is proportional to the program size S, and the cost of fixing a bug is also proportional to S. Then, the cost of fixing all bugs will be proportional to S2. If we divide the program into M modules, then the number of bugs will now be proportional to S/M, and the cost of fixing a bug will be proportional to S/M. Then, the cost of fixing all bugs will now be proportional to (S/M)2*M = S2/M. This is a factor of M reduction in debugging effort by modularization. But note that this is rather a simplication: in real applications, many bugs occurs across modules, and it is hard to locate in which module the bug is in.
The problem is that two copies of read_sector exists – one in boot_loader in MBR, and one in word-count (WC) itself.
One solution could be to put read_sector into BIOS. There is a standard which vendors observe so that we can expect similar interface. The problem is that it is designed be reliable. Thus, it is quite slow due to the generic functions. We need a function that has also a good performance.
Another solution could be to put it into MBR. We can make this function device-specific and performance-tuned. However, the problem is that MBR is only 512 bytes. It is impossible to put the whole function into it. We could solve this problem by creating a library in RAM. But this solution also suffers from tight coupling between application library address. A later change may significantly change the structure.
o Naïve implementation :
Consider a scheme where I/O sends data to CPU, and CPU processes it. Thus, both I/O and CPU has a large idling time. This scheme has a device occupancy problem, because either I/O device or the CPU is doing useful work at a time.
o Double buffering :
To solve the above problem, we could make CPU may work on the next chunk of data while the I/O device writes to a memory buffer. In this case, we need two buffers: one for CPU work space, and one for I/O device write. This is called double buffering, and improves the performance by roughly 2 times at the cost of 2 times increase in the memory consumption.
There is a disadvantage to this, though. It complicates the coding of all applications, because all applications have to support such scheme. That is, it sacrifices the simplicity. The alternative solution could be OS to pre-fetch data from DRAM to cache, so that the application can take advantage of the increased performance without knowing the detailed memory transaction scheme.
o Memory access controlled by CPU
When transferring the data from the disk to the main memory, the easiest way would be for CPU to explicitly control the data transfer. However, this scheme requires two memory transactions: disk to CPU and CPU to RAM. This results in waste of bus bandwidth. Furthermore, CPU cannot do anything useful in the memory transaction.
o Direct memory access (DMA)
A better solution is to let the disk controller to write directly to the main memory. It saves the bus traffic, and CPU may work on other useful things when the transaction occurs.
o Too much of a pain to reuse code (library) and make changes accordingly
o Too hard to run multiple programs simultaneously
o Too hard to recover from bugs
o First version
The below is the original 'read_sector' program we used to copy data. The original version has some problems, though. It is too low level – it only works with disks, because it assumes certain sizes of sector. This is not portable with so many different kinds of devices available these days.
o Improved version
This is an improved version of the original code. It uses more generic type size_t so that it can also deal with 64-bit machines. Also, void* is used instead of intptrct. Another improvement is that it returns the size of data it has successfully read. It is possible that the memory device was not able to read all the byte requested.
This version still has some problems. It assumes the target is a random-access device. But if the OS was accessing a stream-type device like a network device, such kind of access will not be possible. The final version addresses this issue.
o Final version
The final version uses the 'seek' function in addition to the 'read' function. This way, the read function is used to work with any device, while seek function is used for random-access devices. Seek function can address memory space with seek_set, which counts offset from the start, seek_cur, from current pointer, and seek_end, from the end of the device.
o Simplicity – easy to learn and easy to use
o Lack of assumptions / flexible / portable – easy to substitute components
o Robustness – tolerate harsh conditions
o Performance – modularity usually hurts performance, but not too much
o C Code:
Consider a simple recursive function that does factorial. Note that the optimization has been turned off to better observe function calls.
o Generated assembly code:
o Caller problem
It may forget to push n into the stack. It may also jump back without restoring the stack pointer.
o Callee problem
It is possible for callee to sets registers that it is not supposed to. Also, it may return to a wrong place and caller will not have control back. Moreover, it may loop forever.
Function calls can give soft modularity, but we want hard modularity. That is, the OS should work even with buggy modules. One way of achieving modularity is to use client–server model. The client code can’t step on server and vice versa. It works well, but very expensive since it would require us to have separate hardware for each application.
More common way is to use virtualization. The program would run on a virtual machine, and be free to run whatever it wants within a code. When it wants to access outside, it has to communicate with OS in a well-defined interface.