The Standalone Word Counting Program
In lecture 2, we completed a standalone application which counts the number of words in a text file.
Sample code of the standalone word counting program:
1 void main(void){
2 unsigned long nw = 0;
3 bool inw = false;
4 for(;;){
5 char buf [512];
6 int s = 21;
7 read_sector(s,buf);
8 for(int j = 0; j < 512; j++){
9 if(buf[i] == '\0'){
10 finish(nw);
11 return;
12 }
13 bool inw1 = ('a' <= buf[2])&& (buf[i] <= 'z');
14 inw += inw1 & inw;
15 inw = inw1;
16 }
17 }
18 }
Problems in the word counting program
-The word counting program implementation has multiple copies of the read_sector function in different spots of the memory (read_sector function in boot_loader in MBR and in word_count(WC).
Improvements To The Standalone Word Counting Program Performance
1) Read more sectors at once
-A
single read sector
-
Multiple read sectors
- Since time of grabbing sectors on a single sector wastes most of time in the computation, reading multiple read sectors at once saves time efficiently.
2) DMA (Direct Memory Access)
Memory access controlled by CPU
-Typical way of data transferring is controlled by CPU. The CPU commands the Disk Controller to transfer the data. However, it requires two memory transaction: Disk to CPU and CPU to RAM. In addition to inefficiency of memory transaction, CPU is not able to do work in the memory transaction.
Direct Memory Access (DMA)
-A more efficient way of data transferring is DMA (Directed Memory Access). DMA commands to disk controller to send data directly to RAM and CPU notifies when data transaction is done. This approach saves the bus traffic(once) and CPU can process other useful work during data transaction.
3) Double Buffer
Overlapped I/O & Computation
Double Buffering
-First buffer for CPU work space, and second buffer for I/O device write. Double buffer enables overlapping I/O & Computation simultaneously. Double Buffer improves the performance by roughly 2 times.
Tradeoff
-Three ways of improvement to word counting program requires reengineering WC + Boot loader to modify previously defined functions.
Modifying read_sector program
void read_sector(int s, char* a, unsigned char ns)
- Too low-level and only works with nearby disks. This is not compatible with so many different devices that are used today.
Solution
- Single copy of read sector
Steps to make read_sector more modular and achieve desirable API
0. void read_sector(int s, char* a, unsigned char ns)
-Explanation:
s is a sector number, a is a buffer to store characters, ns is the number of total sectors
A sector size is assumed to be 512B
This version is not modular enough and too closely tied to a particular hardware
-Problem
unsigned char may not be big enough to hold the number of sectors
1. void read_sector1(int s, char* a, size_t ns)
-Explanation:
size_t is unsigned int long (holding 32 bits) and thus, can hold the bigger number of sectors.
-Problem:
A sector size is arbitrarily assumed to be 512B
2. void read_sector2(int s, char* a, size_t ns, size_t secSize)
-Explanation:
A sector size is passed as an argument (secSize), so various sector sizes can be handled.
-Problem:
The function can only read one sector at a time and is not suitable for reading across multiple sectors
3. void read_sector3(int s, char* a, size_t bytes)
-Explanation:
Now, s is a byte offset from the start of a disk, and nbytes indicates the total bytes of the disk.
-Problem:
int may not be big enough to hold the byte number
The function can't be used to copy data from one disk to another
Because...
a) there is no write function is implemented yet
b) it assumes there is only one disk
c) it lacks hardware error handling
4. void read_sector4(int disk_number, off_t s, char* a, size_t nbytes)
-Explanation:
it can handle multiple disks by using disk_number as an argument to target a specific disk
off_t, which is a type of long long (holding 64 bits), is used to allow a bigger byte offset
5. int read_sector5(int disk_number, off_t s, char* a, size_t nbytes)
-Explanation:
The return value is set as int type in order to signal in case of the hardware failure (typically -1)
The return value can be the number of bytes that are successfully read to deal with a partial success
-Problem:
int may not be big enough to hold all possible return values since a return value can be the total number of bytes at maximum
6. ssize_t read_sector6(int disk_number, off_t s, char* a, size_t nbytes)
-Explanation:
ssize_t is used to allow the bigger number for a return value
Modularization in action
Recursion is a special case of function call where the caller is also the callee.
We shall examine the code for a factorial function as an example.
Factorial Example (using recursion)
C Version
Int fact (int n) {
If ( n == 0) return 1;
Return n * fact (n-1);
}
Assembly Version
fact:
pushl %ebp
movl $1, %eax
movl %esp, %ebp
subl $8, %esp
movl %ebx, -4(%ebp)
movl 8(%ebp), %ebx
testl %ebx, %ebx
jne .L5
.L1:
movl -4(%ebp), %ebx
movl (%ebp), %esb
popl %ebp
ret
.L5:
leal -1(%ebx), %eax
movl %eax, (%esp)
call fact
imull %ebx, %eax
jmp .L1
From above codes, we can say that
- %ebx stores n
- %eax stores return value
- In the recursion, the factorial function jumps to L5 to compute the next number.
However, if n == 0, it will return the base case stored in L1.
First, we jumped into fact
Click on the image to see the next step!
Caller-callee Contract
There is a caller-callee contract that is required to run the process without security problems and/or bugs
In the example of n!(factorial) function, the following contract holds:
- sp[0] = return address
- sp[1] = n (return value)
- At return, %eax holds return value
- %ebx and %ebp are constant (unchangeable)
- %esp = old(%esp) + 4
- Enough stack space must be allocated
- All stack spaces in callee must be unchanged
- Callee must set ip(instruction pointer) = sp[0] without returning to a improper location. Callee is not to be looped forever
If this contract is violated, ANYTHING could go wrong! (e.g. stack overflow)
Errors can easily propagate from callee fact(n-1) to caller fact(n).
This is known as soft modularity.
What can go wrong with the function call modularity?
How can caller mess up?
- Not save caller-save registers?
- It may forget to push arguments
- movl $0, %esp
- jmp fact
How can callee mess up?
- Can trash caller's stack
- It can loop forever
- It can return to some place else other than the location where caller has called from.
This can result in caller not having its control back.
We want
Hard modularity!
Normally, function calls can provide soft modularity. But this works only if every component is trustworthy.
Because no components work as expected, this is not scaled to large software systems.
Even with unpredictable bugs and errors, the OS should still work. There are two ways to achieve hard modularity:
I. Client-server modularity
-This method insulates the both sides of the client and server. The client code cannot step into the server and vice versa.
-This can be effective but is very expensive since each application would need separate hardware.
II. Virtualization
-This method is more realistic, and it only insulates the callee. When the program is run under virtualization, it is free to run whatever it wants.
-Virtual process communicates with the OS using the well-defined interface between the OS and Virtual Machine(VM) when the virtual process wants to access outside.