CS111 Week 2 Lecture (4/9/13)

How to cope with complexity in an operating system

by Jacob Sharf, Calvin Cam, and Ian Chen

A. Modularity

Break things into pieces

Economics of Modularity

Case 1- Say that:
- Bugs are proportional to program size S
- Cost to fix a bug is proportional to size THEREFORE: Cost to fix all bugs is proportional to S^2
Case 2- Say that:
- Program size is S and there are M modules THEREFORE: Each module is size S/M
Say that:
- Number of bugs in one module is proportional to S/M
- Cost to fix a bug is proportional to S/M
- Cost to fix all bugs in one module is proportional to (S/M)^2 THEREFORE: Total cost to fix all bugs in all M modules is proportional to M*(S/M)^2 = S^2/M

In Case 2, we had to make assumptions though.
Q: What are these assumptions?
- Cost to find the module that contains a bug is nothing
- Bugs are isolated to one module and not spread throughout the code
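
The two cases can be compared with a small cost model. This is only a sketch: the function names fix_cost and modular_fix_cost are made up for these notes, the constant of proportionality is taken to be 1, and the modules are assumed to be equal in size.

```c
/* Relative cost of fixing every bug in one piece of code of the given size:
   number of bugs ~ size, cost per bug ~ size, so total ~ size * size. */
static double fix_cost(double size)
{
    return size * size;
}

/* Total cost when a program of size s is split into m equal modules,
   assuming each bug stays inside its module and finding the module is free. */
static double modular_fix_cost(double s, int m)
{
    return m * fix_cost(s / m);
}
```

Under these assumptions, fix_cost(1000) is 1000000 while modular_fix_cost(1000, 10) is 100000: splitting into M modules divides the cost by M.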

B. Abstraction

"Nice" pieces (simpler than original)

There are some problems with the word-count program from the previous lecture.
- Had two copies of the read_sector function
  - Bootloader in MBR
  - word-count itself

Q: How to fix this?

Solution 1: Put commonly used functions in BIOS (read_sector)
- BIOS = Basic Input/Output System
- Flaw
  - BIOS specific
  - Slow
  - Generic (has to be compatible with all O.S. it is installed on)
  - Inflexible
Solution 2: Put it into MBR
- Flaw
  - MBR is only 512 bytes!
Solution 3: Create a library in RAM
- Flaw:
  - start @ 0x100000 read_sector @ 0x102000
  - tight coupling between the application and the library's address
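
The coupling in Solution 3 can be seen in how an application would have to reach the library. This is a rough sketch only: the typedef name, the macro name, and the helper library_read_sector are invented for illustration, and the read_sector signature (sector number, destination address) is an assumption. The pointer is only meaningful on the bare machine from lecture, so this sketch builds it but never calls through it.

```c
#include <stdint.h>

/* Assumed shape of read_sector: sector number and destination address. */
typedef void (*read_sector_fn)(int s, intptr_t a);

/* Address agreed on by hand between the library and every application.
   If the library is rebuilt and read_sector moves, every caller breaks. */
#define READ_SECTOR_ADDR ((uintptr_t)0x102000)

/* Hypothetical helper: turn the hard-coded address into a callable pointer. */
static read_sector_fn library_read_sector(void)
{
    return (read_sector_fn)READ_SECTOR_ADDR;
}
```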

CONCLUSION: Each solution has its own advantages, but each also has serious flaws

TO BE CONTINUED...

BACK TO MODULARITY

wc v. crypto analysis - crypto analysis uses a lot of CPU


I/O  -0-   -1-   -2-
CPU     -0-   -1-   -2-
t ->

inefficient because the CPU isn't used all the time (it sits idle during each I/O)


I/O  -0- -1- -2- -3-
CPU      -0- -1- -2-
t ->

double buffering (~ twice as efficient)

Q: Can we use triple buffer? 100 buffering?
A: No, restricted by I/O (bottleneck)
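
The two timelines above can be turned into a simple timing model. This is a sketch under simplifying assumptions: every buffer takes the same io time units to read and the same cpu time units to process, and the function names are invented for these notes.

```c
/* Time to handle n buffers when each read blocks until it is done
   and the CPU then processes the buffer (no overlap). */
static long single_buffer_time(long n, long io, long cpu)
{
    return n * (io + cpu);
}

/* Time with double buffering: the read of buffer i+1 overlaps the
   processing of buffer i, so the slower of the two sets the pace. */
static long double_buffer_time(long n, long io, long cpu)
{
    long slower = io > cpu ? io : cpu;

    if (n <= 0)
        return 0;
    return io + (n - 1) * slower + cpu;
}
```

With io = cpu = 1 and n = 3 this gives 6 versus 4 time units, matching the diagrams, and the ratio approaches 2 as n grows. When io is much larger than cpu, the total is dominated by n * io in both cases, which is why extra buffers do not help once I/O is the bottleneck.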

read_sector -> start_read_sector, then wait_for_disk
(splitting the call trades complexity for performance)
read_sector |------------| <--- internally, it buffers or reads ahead (cache)
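
Here is a minimal sketch of how the split call might be used for double buffering. The "disk" is a plain in-memory array standing in for real hardware, the signatures of start_read_sector and wait_for_disk are assumptions (the notes give only their names), and sum_disk is a made-up stand-in for the CPU work.

```c
#include <string.h>

enum { SECTOR_SIZE = 512, NUM_SECTORS = 4 };

/* Hypothetical in-memory "disk" standing in for the real device. */
static char disk[NUM_SECTORS][SECTOR_SIZE];
static int pending_sector = -1;
static char *pending_buf;

/* Ask the disk to begin reading sector s into buf; returns immediately. */
static void start_read_sector(int s, char *buf)
{
    pending_sector = s;
    pending_buf = buf;
}

/* Block until the outstanding read (if any) has filled its buffer. */
static void wait_for_disk(void)
{
    if (pending_sector >= 0) {
        memcpy(pending_buf, disk[pending_sector], SECTOR_SIZE);
        pending_sector = -1;
    }
}

/* The old blocking call, rebuilt from the two halves. */
static void read_sector(int s, char *buf)
{
    start_read_sector(s, buf);
    wait_for_disk();
}

/* Double-buffered loop: sum every byte on the disk, starting the read of
   sector i+1 before doing the CPU work on sector i. */
static long sum_disk(void)
{
    static char buf[2][SECTOR_SIZE];
    long sum = 0;

    read_sector(0, buf[0]);
    for (int i = 0; i < NUM_SECTORS; i++) {
        if (i + 1 < NUM_SECTORS)
            start_read_sector(i + 1, buf[(i + 1) % 2]);
        for (int j = 0; j < SECTOR_SIZE; j++)   /* CPU work on sector i */
            sum += (unsigned char)buf[i % 2][j];
        wait_for_disk();                        /* sector i+1 is now ready */
    }
    return sum;
}
```

The caller now has to track two buffers and remember to wait before touching the next one, which is the extra complexity being traded for performance.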

CPU Bus

Briefly summarize problems with simple O.S.'s modularity

too much of a pain to reuse code
too much of a pain to change things
too hard to run multiple programs, especially simultaneously
too hard to recover from faults
read_sector is too low level: it works only with disks or disk-like devices (e.g. it assumes sectors are 512 bytes), so it is not portable, especially in those days


	
void read_sector (int s, intptr_t a) // s is sector number, a is address
	      | generalize read_sector
	      v
int read (int s, intptr_t buf, int bufsize)
	      | improvements:
	      |      change int return type to size_t (sized to fit the machine)
	      |      change int sector number to an off_t offset (64 bit)
	      |      change intptr_t to void* (generic C pointer)
	      |      change int bufsize to size_t
	      v
size_t read(off_t offset, void* buf, size_t bufsize)
	      | potential problem: assumes random access device for offset
	      v
size_t read(void* buf, size_t bufsize) // more general and works w/ any device
	      +
off_t seek(off_t offset) // for random access devices only
	      | Add error checking
	      v
ssize_t read(int fd, void* buf, size_t bufsize) // -1 => error, errno has details
	      +
off_t lseek(int fd, off_t offset, int flag) // 0 => SEEK_SET, 1 => SEEK_CUR, 2 => SEEK_END
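
The signatures at the end of this chain closely resemble the POSIX read and lseek calls. As a sketch of how the earlier "read at an offset" form can be rebuilt from the split pair, assuming a POSIX system (read_at is a made-up helper name, not a standard function):

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to bufsize bytes starting at the given byte offset of the file
   open on fd. Returns the number of bytes read, or -1 with errno set. */
static ssize_t read_at(int fd, off_t offset, void *buf, size_t bufsize)
{
    if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
        return -1;      /* e.g. fd is a pipe, which is not seekable */
    return read(fd, buf, bufsize);
}
```

On a regular file this behaves like the offset-taking version of read. On a pipe or terminal the lseek step fails, which is the reason the offset was moved out of read in the first place.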

How to tell whether your modularity is good or bad

Simplicity - easy to learn and to use
Lack of Assumptions/Flexible/Portable/Neutral - easy to substitute components
Robustness - tolerate harsh conditions
Performance - * Modularity usually hurts performance (a bit, not too much) *

Policy:

Mechanism
- Method 0
  - No modularity; lots of global variables; code at random
  - Pro: fast
  - Con: hard to understand and debug
- Method 1
  - Library C functions (What are their addresses? In BIOS, In library area?). Usually library functions are stored in an agreed upon space in memory.

RAM:


			 _
			| | <- program text (read only)
			 _
			| | <- I/O buffers
			 _
			| | <- heap
			 _
			|X| <- FORBIDDEN AREA
			 _
			| | <- Stack
			 _
			| | <- library O.S. kernel (read only)

Caller + Callee Code (lazy ver: one function that calls itself)

In C


			int fact(int n)
			{
			    if(n)
				return n*fact(n-1);
			    else
				return 1;
			}

Sidenote: gcc will typically optimize this into the iterative version if you specify -O2. To get the recursive version, specify the -O0 flag, which asks it not to perform any optimizations.
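
For comparison, an iterative version written by hand might look like the following. This is only a sketch of the idea (fact_iter is a name chosen for these notes); the code an optimizer actually emits depends on the compiler and its version.

```c
/* Iterative factorial: same results as the recursive fact for n >= 0,
   but uses a loop and constant stack space instead of one frame per call. */
int fact_iter(int n)
{
    int result = 1;

    while (n > 0) {
        result *= n;
        n--;
    }
    return result;
}
```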

In Assembly


			fact:
			    pushl %ebp          
			    movl $1, %eax        ;// caller beware! callee can modify %eax
			    movl %esp, %ebp      ;// we'll preserve %ebp, (the frame pointer)
			    subl $8, %esp
			    movl %ebx, -4(%ebp)    ;// e means extended (more below)
			    movl 8(%ebp), %ebx
			    testl %ebx, %ebx
			    jne .L5
			.L1:
			    movl -4(%ebp), %ebx
			    movl %ebp, %esp
			    popl %ebp
			    ret             ;// answer in %eax
			.L5:
			    leal -1(%ebx), %eax ;// eax = ebx-1
			    movl %eax, (%esp)
			    call fact
			    imull %ebx, %eax
			    jmp .L1

Originally, the x86 architecture used 16-bit registers, whose names did not begin with e (ax, bp, sp, etc). When the architecture moved to 32 bits, Intel extended the registers to 32 bits. To maintain compatibility, you can refer to the lower 16 bits of a register via its older name, or to all 32 bits by prefixing an e. In the example above, the callee can modify %eax, so the caller cannot depend on %eax being preserved across a function call.

This sort of caller-callee contract is called the calling convention

Q: What can go wrong with function call modularity?
A: Robustness
- Caller could mess up (e.g. it could forget to push n; it could jmp fact instead of call fact)
  - when we return, we'll execute random instructions
- Callee could mess up (e.g. callee can set registers it's not supposed to; callee can return to wherever it wants to; callee can loop forever)
- all these bugs happen -> we want the O.S. to keep running regardless of buggy modules

Soft Modularity

Voluntary conventions which rely on the participation of the caller and callee

Function calls give you SOFT MODULARITY - We want HARD MODULARITY

Hard Modularity

Modularity which does not rely on the participation of the caller/callee. Works even with buggy modules.

Two Common ways:
1. Client-Server: Code is put on a separate, centralized server. Since all communication is done over a network connection, the client cannot write to the server's memory, and it can't make the server execute arbitrary instructions over a truly secure network.
- Pro: Very robust
- Con: Very Expensive
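
As a rough illustration of the client-server idea, the sketch below runs fact in a separate process and talks to it only through messages. It assumes a POSIX system and uses two local processes connected by pipes as a stand-in for two machines and a network; serve and remote_fact are names invented for these notes.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int fact(int n) { return n ? n * fact(n - 1) : 1; }

/* "Server": runs in its own process with its own memory. It sees only the
   request bytes and can answer only through the reply pipe. */
static void serve(int request_fd, int reply_fd)
{
    int n;

    while (read(request_fd, &n, sizeof n) == (ssize_t)sizeof n) {
        int r = fact(n);
        if (write(reply_fd, &r, sizeof r) != (ssize_t)sizeof r)
            break;
    }
    _exit(0);
}

/* "Client": sends n as a message and waits for the reply.
   Returns 0 on success, -1 on any failure. */
static int remote_fact(int n, int *result)
{
    int req[2], rep[2], status, ok;
    pid_t pid;

    if (pipe(req) != 0)
        return -1;
    if (pipe(rep) != 0) {
        close(req[0]);
        close(req[1]);
        return -1;
    }
    pid = fork();
    if (pid == 0) {             /* child becomes the server */
        close(req[1]);
        close(rep[0]);
        serve(req[0], rep[1]);
    }
    close(req[0]);
    close(rep[1]);
    ok = pid > 0
        && write(req[1], &n, sizeof n) == (ssize_t)sizeof n
        && read(rep[0], result, sizeof *result) == (ssize_t)sizeof *result;
    close(req[1]);              /* server sees end of input and exits */
    close(rep[0]);
    if (pid > 0)
        waitpid(pid, &status, 0);
    return ok ? 0 : -1;
}
```

The extra copying and process setup for a single multiplication chain hints at why this approach is expensive compared with a plain function call.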

2. Virtualization
- Pro: Performance
- Cons: Many virtual machines (VMs) run on a single computer. If that computer has a hardware failure, it will affect more than one virtual machine (there is a single point of failure). In addition, if one faulty VM allows an attacker to gain access to the machine, the security of all VMs on that hardware is compromised.