Lecture 3: Modularity and virtualization

by Min Suk Oh, Dong Jin Yu, Won Jae Lee, and Jongwoo Kim




Table of Contents



The Standalone Word Counting Program

In lecture 2, we completed a standalone application which counts the number of words in a text file.

Sample code of the standalone word counting program:

1 void main(void){
2 unsigned long nw = 0;
3 bool inw = false;
4 for(;;){
5 char buf [512];
6 int s = 21;
7 read_sector(s,buf);
8 for(int j = 0; j < 512; j++){
9 if(buf[i] == '\0'){
10 finish(nw);
11 return;
12 }
13 bool inw1 = ('a' <= buf[2])&& (buf[i] <= 'z');
14 inw += inw1 & inw;
15 inw = inw1;
16 }
17 }
18 }

Problems in the word counting program
-The word counting program implementation has multiple copies of the read_sector function in different spots of the memory (read_sector function in boot_loader in MBR and in word_count(WC).

Improvements To The Standalone Word Counting Program Performance

1) Read more sectors at once

-A single read sector
Single Read Sector
-Multiple read sectors
Multiple Read Sector
- Since time of grabbing sectors on a single sector wastes most of time in the computation, reading multiple read sectors at once saves time efficiently.

2) DMA (Direct Memory Access)

Memory access controlled by CPU
Memory access controlled by CPU
-Typical way of data transferring is controlled by CPU. The CPU commands the Disk Controller to transfer the data. However, it requires two memory transaction: Disk to CPU and CPU to RAM. In addition to inefficiency of memory transaction, CPU is not able to do work in the memory transaction.

Direct Memory Access (DMA)
Memory access controlled by CPU
-A more efficient way of data transferring is DMA (Directed Memory Access). DMA commands to disk controller to send data directly to RAM and CPU notifies when data transaction is done. This approach saves the bus traffic(once) and CPU can process other useful work during data transaction.

3) Double Buffer

Memory access controlled by CPU
Overlapped I/O & Computation

Double Buffering
-First buffer for CPU work space, and second buffer for I/O device write. Double buffer enables overlapping I/O & Computation simultaneously. Double Buffer improves the performance by roughly 2 times.

Tradeoff
-Three ways of improvement to word counting program requires reengineering WC + Boot loader to modify previously defined functions.


Modifying read_sector program

void read_sector(int s, char* a, unsigned char ns)
read sector illustration
- Too low-level and only works with nearby disks. This is not compatible with so many different devices that are used today.

Solution
- Single copy of read sector
solution picture of read_sector


Steps to make read_sector more modular and achieve desirable API
0. void read_sector(int s, char* a, unsigned char ns)
-Explanation: s is a sector number, a is a buffer to store characters, ns is the number of total sectors A sector size is assumed to be 512B This version is not modular enough and too closely tied to a particular hardware -Problem unsigned char may not be big enough to hold the number of sectors
1. void read_sector1(int s, char* a, size_t ns)
-Explanation: size_t is unsigned int long (holding 32 bits) and thus, can hold the bigger number of sectors. -Problem: A sector size is arbitrarily assumed to be 512B
2. void read_sector2(int s, char* a, size_t ns, size_t secSize)
-Explanation: A sector size is passed as an argument (secSize), so various sector sizes can be handled. -Problem: The function can only read one sector at a time and is not suitable for reading across multiple sectors
3. void read_sector3(int s, char* a, size_t bytes)
-Explanation: Now, s is a byte offset from the start of a disk, and nbytes indicates the total bytes of the disk. -Problem: int may not be big enough to hold the byte number The function can't be used to copy data from one disk to another Because... a) there is no write function is implemented yet b) it assumes there is only one disk c) it lacks hardware error handling
4. void read_sector4(int disk_number, off_t s, char* a, size_t nbytes)
-Explanation: it can handle multiple disks by using disk_number as an argument to target a specific disk off_t, which is a type of long long (holding 64 bits), is used to allow a bigger byte offset
5. int read_sector5(int disk_number, off_t s, char* a, size_t nbytes)
-Explanation: The return value is set as int type in order to signal in case of the hardware failure (typically -1) The return value can be the number of bytes that are successfully read to deal with a partial success -Problem: int may not be big enough to hold all possible return values since a return value can be the total number of bytes at maximum
6. ssize_t read_sector6(int disk_number, off_t s, char* a, size_t nbytes)
-Explanation: ssize_t is used to allow the bigger number for a return value
Difference between read_sector6 and read(fd, addr, size)
Concerns with Modularity
- Good versus bad effects of modularity
  1. Performance: Does modular perform better than unitary app?
    • Modularization generally hurts performance a bit.
    • Programs should run quickly and be memory efficient (Goal is minimize this bit).
  2. Robustness: Good modules should work well in harsh conditions & environments (meaning tolerance of faults and errors).
  3. Lack of Assumptions: Lack of assumptions that related to hardware or other codes.
  4. Neutrality & Flexibility:Components can be used in any way and different applications -> Flexibility is especially helpful when combining with abstraction.
  5. Simplicity: Easy to run and use the program.
  6. Portability: Making easy for reusing codes between programs.
How to enforce Modularity
  1. None->advantage: fast->disadvantage: hard to debug and understrand
  2. Function call Modularity
    • Define an API
    • Use it by calling C functions

When the function is called, we have to store its local variable, return address, and etc.
We use a stack to do it. A stack grows negative which goes from bottom to top. An example of the stack is below
wm
Modularization in action
Recursion is a special case of function call where the caller is also the callee.
We shall examine the code for a factorial function as an example.

Factorial Example (using recursion)

C Version
Int fact (int n) {
If ( n == 0) return 1;
Return n * fact (n-1);
}
Assembly Version
fact:
pushl %ebp
movl $1, %eax
movl %esp, %ebp
subl $8, %esp
movl %ebx, -4(%ebp)
movl 8(%ebp), %ebx
testl %ebx, %ebx
jne .L5
.L1:
movl -4(%ebp), %ebx
movl (%ebp), %esb
popl %ebp
ret
.L5:
leal -1(%ebx), %eax
movl %eax, (%esp)
call fact
imull %ebx, %eax
jmp .L1


From above codes, we can say that
  • %ebx stores n
  • %eax stores return value
  • In the recursion, the factorial function jumps to L5 to compute the next number. However, if n == 0, it will return the base case stored in L1.
First, we jumped into fact
wf0
Click on the image to see the next step!

Caller-callee Contract

There is a caller-callee contract that is required to run the process without security problems and/or bugs In the example of n!(factorial) function, the following contract holds:

If this contract is violated, ANYTHING could go wrong! (e.g. stack overflow)
Errors can easily propagate from callee fact(n-1) to caller fact(n).
This is known as soft modularity.

What can go wrong with the function call modularity?

How can caller mess up?

How can callee mess up?

We want Hard modularity!
Normally, function calls can provide soft modularity. But this works only if every component is trustworthy. Because no components work as expected, this is not scaled to large software systems. Even with unpredictable bugs and errors, the OS should still work. There are two ways to achieve hard modularity:

I. Client-server modularity

-This method insulates the both sides of the client and server. The client code cannot step into the server and vice versa.
-This can be effective but is very expensive since each application would need separate hardware.

II. Virtualization

-This method is more realistic, and it only insulates the callee. When the program is run under virtualization, it is free to run whatever it wants.
-Virtual process communicates with the OS using the well-defined interface between the OS and Virtual Machine(VM) when the virtual process wants to access outside.