How to organize an OS
- One approach: an object-oriented system
- Pick a language such as Java, C++, Objective-C, etc.
- But this approach is too slow and does not provide hard modularity. How do we deal with this?
- Answer: Use C. This is the approach the Linux kernel takes. Object-oriented languages such as Java and C++ can still be used to develop apps.
Objective: Hard Modularity
- Client Server (to be covered in Lab 4)
- Virtualization (to be covered in Labs 2 and 3)
- The advantage of virtualization is that applications do not run directly on the actual hardware. Instead, they run on virtual machines, which act like a "sandbox." An app cannot affect anything outside the boundaries of its virtual machine, so it cannot trash anything else.
- The simplest approach? Use an interpreter!
- The interpreter implements and simulates all of the necessary functionality of a given machine (x86, for example) needed to run the application
- Pros:
- Apps can be written for some other hardware
- Apps can't escape the sandbox. The interpreter checks every operation, and if an app tries to go outside its specified domain, the interpreter halts it.
- Multiple virtual machines can run on a single physical machine
- Cons:
- Performance - it's slower (If written naively, it is about 10x slower; if written well, ~1.2x slower)
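Here is a minimal sketch of the sandboxing idea. It assumes a made-up four-instruction machine: `OP_LOAD`, `OP_STORE`, `OP_ADD`, `OP_HALT` and `interpret` are all hypothetical. Before every memory access, the interpreter checks the address against the guest's memory size. That per-instruction check is what keeps the app in the sandbox, and it is also where the slowdown comes from.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy instruction set; not any real ISA. */
enum op { OP_LOAD, OP_STORE, OP_ADD, OP_HALT };

struct insn {
    enum op op;
    int arg;   /* memory address for LOAD/STORE, immediate for ADD */
};

enum status { RAN_OK, HALTED_BY_INTERPRETER };

#define MEM_SIZE 16   /* the guest's entire "physical" memory */

/* Interpret a guest program with accumulator *acc. The guest can only touch
   mem[0..MEM_SIZE-1]; an out-of-bounds access stops the guest instead of
   touching the host's memory. */
enum status interpret(const struct insn *prog, size_t len,
                      int mem[MEM_SIZE], int *acc)
{
    assert(prog != NULL && mem != NULL && acc != NULL);
    for (size_t pc = 0; pc < len; pc++) {
        const struct insn *i = &prog[pc];
        switch (i->op) {
        case OP_LOAD:
        case OP_STORE:
            if (i->arg < 0 || i->arg >= MEM_SIZE)
                return HALTED_BY_INTERPRETER;   /* the sandbox check */
            if (i->op == OP_LOAD)
                *acc = mem[i->arg];
            else
                mem[i->arg] = *acc;
            break;
        case OP_ADD:
            *acc += i->arg;
            break;
        case OP_HALT:
            return RAN_OK;
        }
    }
    return RAN_OK;
}
```

For example, a guest program `{OP_STORE, 1000}` is stopped with `HALTED_BY_INTERPRETER`. It never gets to write past the end of `mem`.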
Virtualizable Processor
The goal of the virtualizable processor is to protect the transfer of control. Applications are not allowed to execute arbitrary privileged code. Instead, they may enter privileged code only at specific spots.
Instruction categories:
Category | Examples |
Ordinary instructions | add, sub, call, ret, etc. |
Privileged instructions | I/O operations, halting the processor, etc. |
Features of a virtualizable processor
- Extra register (Am I privileged?)
- Whether the processor is currently privileged is part of the machine state.
- This extra register (really a single bit) indicates whether the processor is in privileged mode, i.e., whether privileged instructions may execute.
- Instructions set and clear this bit, but it defaults to 1 (privileged), so the machine starts out running the kernel.
- Instruction Handling
- If instruction is nonprivileged: executes as normal
- If instruction is privileged:
- If the privileged bit is set: executes as normal
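- If the privileged bit is NOT set: the instruction is not executed, and the processor traps
- Allows us to do virtualization without the interpreter's performance loss, because ordinary instructions run directly on the hardware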
Structuring virtual memory
Simply partitioning memory into a section for apps and a section for the OS does not work. An app could just jump to a location in the OS and bypass any authorization checks. Instead, we need a protected transfer of control.
- This lets unprivileged apps enter OS code only at certain locations. The OS region is restricted so that the instruction pointer can be set to only a few specific entry points in it.
When an app running without privilege tries to execute a privileged instruction, the instruction is not actually executed.
- Instead, the hardware traps.
- By convention, there is a "trap vector" built into the hardware. It contains (again, by convention) 256 entries (256 kinds of traps). Each entry points to a location in the OS code to execute when that specific trap occurs.
- There is also a bit that indicates what to set the privileged bit to when the trap occurs (usually this is 1).
- The trap vector is not visible to applications. An application trying to view the trap vector triggers a trap.
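The trap vector can be sketched as a 256-entry table of handler pointers, each paired with the value to load into the privileged bit. All names here are hypothetical: `trap_entry`, `hardware_trap`, and `install_handlers`. On real x86 the table is the interrupt descriptor table, and its entries have a richer format. The key point is that a trapping app chooses only the trap number, never the target address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NTRAPS 256

struct cpu {
    bool privileged;
};

typedef void (*trap_handler)(struct cpu *c, int trapno);

/* One entry per kind of trap: where the OS code is, and what to set the
   privileged bit to on entry. */
struct trap_entry {
    trap_handler handler;
    bool set_privileged;
};

/* Lives in kernel memory; applications cannot read or write it. */
static struct trap_entry trap_vector[NTRAPS];

static int last_trap = -1;   /* records which trap was handled (demo only) */

static void generic_handler(struct cpu *c, int trapno)
{
    (void)c;
    last_trap = trapno;
}

/* What the hardware does on a trap: control can only land at an entry
   point the OS installed. */
void hardware_trap(struct cpu *c, int trapno)
{
    assert(trapno >= 0 && trapno < NTRAPS);
    struct trap_entry *e = &trap_vector[trapno];
    c->privileged = e->set_privileged;
    if (e->handler != NULL)
        e->handler(c, trapno);
}

/* The OS fills in every entry at boot, while it is privileged. */
void install_handlers(void)
{
    for (int i = 0; i < NTRAPS; i++)
        trap_vector[i] = (struct trap_entry){ generic_handler, true };
}
```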
The Linux convention for getting the kernel's attention is to execute an instruction that deliberately traps. The interrupt instruction (INT) takes a one-byte operand that selects which trap vector entry to use. The value 128 (0x80) tells the kernel that the trap was triggered intentionally, as a system call. The trap vector then passes control to the kernel, which performs the service the app requested. The number of the requested system call is stored in the %eax register when INT is executed.
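From user code on Linux, glibc's `syscall()` function makes the "request by number" idea visible. It takes a system call number (here `SYS_getpid` from `<sys/syscall.h>`) and puts it in the number register: %eax on 32-bit x86, %rax on x86-64. It then executes the trapping instruction. On x86-64 that instruction is the dedicated SYSCALL rather than INT $0x80. This sketch is Linux-specific, and the helper `raw_getpid` is hypothetical.

```c
#define _GNU_SOURCE          /* for syscall() in glibc */
#include <assert.h>
#include <sys/syscall.h>     /* SYS_getpid */
#include <unistd.h>          /* syscall(), getpid() */

/* Ask the kernel for our process ID by system call number,
   bypassing the getpid() library wrapper. */
long raw_getpid(void)
{
    long pid = syscall(SYS_getpid);
    assert(pid > 0);
    return pid;
}
```

The result should match what the ordinary `getpid()` wrapper returns, because both end up in the same kernel code.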
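When you trap, the x86 pushes (in this order, so error ends up on top of the stack):
Register | Meaning |
ss | stack segment (user) |
esp | stack pointer (user) |
eflags | flags register |
cs | code segment (its low 2 bits hold the current privilege level) |
eip | instruction pointer |
error | error code |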
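Kernels commonly describe these pushed words with a struct laid over the kernel stack. This sketch assumes 32-bit slots and uses the hypothetical names `trap_frame` and `from_user_mode`. Because ss is pushed first, it lands at the highest address, so the fields are listed in reverse push order. On real x86, ss and esp are pushed only when the trap changes privilege level. The error code is pushed only for certain exceptions, so kernel entry code typically pushes a dummy value for the rest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The words pushed on a 32-bit x86 trap, viewed from the final stack
   pointer upward (lowest address first). */
struct trap_frame {
    uint32_t error;   /* error code: pushed last, lowest address */
    uint32_t eip;     /* where to resume the interrupted code */
    uint32_t cs;      /* code segment selector, in a 32-bit slot */
    uint32_t eflags;  /* flags register */
    uint32_t esp;     /* user stack pointer */
    uint32_t ss;      /* user stack segment: pushed first, highest address */
};

/* Did the trap interrupt user code (privilege level 3)? */
static inline bool from_user_mode(const struct trap_frame *f)
{
    return (f->cs & 3) == 3;
}
```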
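When issuing a system call (e.g., read()):
- Certain registers are set to hold the system call number and its arguments. The kernel returns the result in a register (%eax on x86), and execution resumes at the saved return location (eip).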
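On the kernel side, the trap handler can dispatch on the saved number register. This sketch follows the 32-bit Linux register roles: number in %eax, first arguments in %ebx, %ecx, %edx, result back in %eax. The system calls themselves (`sys_add`, `sys_neg`) are hypothetical. So are their numbering and the `-1` error value; real Linux returns a negative errno code instead.

```c
#include <assert.h>
#include <stddef.h>

/* Saved user registers at the moment of the trap (hypothetical subset). */
struct regs {
    long eax;             /* in: system call number; out: return value */
    long ebx, ecx, edx;   /* in: first three arguments */
};

typedef long (*syscall_fn)(long a1, long a2, long a3);

/* Hypothetical system calls, for illustration only. */
static long sys_add(long a, long b, long c) { (void)c; return a + b; }
static long sys_neg(long a, long b, long c) { (void)b; (void)c; return -a; }

static const syscall_fn syscall_table[] = { sys_add, sys_neg };
#define NSYSCALLS (sizeof syscall_table / sizeof syscall_table[0])

/* Validate the number the app put in eax, call the handler with the
   argument registers, and store the result back into eax. */
void dispatch_syscall(struct regs *r)
{
    assert(r != NULL);
    if (r->eax < 0 || (size_t)r->eax >= NSYSCALLS) {
        r->eax = -1;   /* unknown system call (hypothetical error value) */
        return;
    }
    r->eax = syscall_table[r->eax](r->ebx, r->ecx, r->edx);
}
```

The bounds check matters. Without it, an app could choose a number that indexes past the table and make the kernel jump to an arbitrary address.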
A note on return instructions:
RET returns from the function, resetting the instruction pointer: ip=*sp++.
RETI is the converse of INT: it pops off everything that INT pushed and resumes the prior state. (On x86 the actual mnemonic is IRET.) It is a privileged instruction because it restores, and so can change, the privileged bit.
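A small simulation shows the difference between RET and RETI. RET pops only the instruction pointer. RETI undoes everything the trap pushed, in reverse order, including the privileged bit. The machine model is hypothetical: `struct machine`, `trap`, `reti`, and a single pushed word that stands in for the saved eflags/cs state.

```c
#include <assert.h>
#include <stdbool.h>

/* A hypothetical machine with a tiny stack that grows downward. */
struct machine {
    unsigned long stack[16];
    unsigned long *sp;   /* points at the top word */
    unsigned long ip;
    bool privileged;
};

static void push(struct machine *m, unsigned long v) { *--m->sp = v; }
static unsigned long pop(struct machine *m) { return *m->sp++; }

/* RET: ip = *sp++ */
void ret(struct machine *m) { m->ip = pop(m); }

/* Trap (INT): save the interrupted state, then enter the kernel. */
void trap(struct machine *m, unsigned long handler)
{
    push(m, m->privileged);   /* stands in for the saved eflags/cs */
    push(m, m->ip);
    m->privileged = true;
    m->ip = handler;
}

/* RETI: pop what the trap pushed, in reverse order, restoring the prior
   state, including the old value of the privileged bit. */
void reti(struct machine *m)
{
    assert(m->privileged);    /* a real CPU would trap here instead */
    m->ip = pop(m);
    m->privileged = pop(m);
}
```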
Operating systems can be partitioned to make sure that applications use instructions properly: the kernel has exclusive access to privileged instructions, while ordinary instructions can be used by everyone. x86 actually has two privilege bits, giving four privilege levels. These suggest a layered (recursively structured) kernel with four levels:
- k0: Core notions like memory management
- k1: Processes
- k2: Device drivers
- k3: Apps
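The layering rule can be stated compactly: a lower number means more privilege, and a layer may touch a more privileged layer's resources only by trapping into it. The policy function `may_access_directly` and the enum names are hypothetical illustrations. On x86 the current level is the 2-bit current privilege level (CPL).

```c
#include <assert.h>
#include <stdbool.h>

/* Four layers; a lower number means more privilege. */
enum ring { K0_MEMORY = 0, K1_PROCESSES = 1, K2_DRIVERS = 2, K3_APPS = 3 };

/* Code running at ring `cur` may directly use resources owned by ring
   `owner` only if it is at least as privileged; otherwise it must trap
   into the more privileged layer. Hypothetical policy for illustration. */
bool may_access_directly(enum ring cur, enum ring owner)
{
    return cur <= owner;
}
```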
Combining a Virtualizable Processor with an OS
- Lets us support the concept of a Process:
- A program in execution in an isolated domain (think of it as a standalone program running on a virtual machine)
- What can go wrong?
- Problem: A program issues only non-privileged instructions and never makes a system call, so the kernel never takes charge (e.g., an infinite loop)
- Solution: Change the hardware so that a trap (a timer interrupt) occurs every so often no matter what (every 10 ms, perhaps)
- Problem: A program may attempt to access secrets inside other programs, e.g., by reading their memory with ordinary load instructions, or even by writing to it
- Solution: One possibility is to hide the other programs on disk while a program is running (however, this is slow). A faster approach is hardware-supported virtual memory, which gives each program its own address space, as if it were running on its own virtual machine.
- Problem: A program can access I/O devices directly
- Solution: Make I/O instructions, such as inb, insb, and outb, privileged.
- This means that I/O is more expensive, because every device access pays the cost of a system call
NOTE: Applications are at the kernel's mercy
Our process will have a number of (virtual) registers %eax, %esp, etc...
These are the same as real registers when running. However, when not running, these need to be saved. Where?
So we have a process table, where each row is a process
0
1
2 (UID, PID, other info about the process, virtual %eax, virtual %ebx, ...) <- process table entry (PTE)
3
4
5
While a process is running, its entry in the table is junk; its true register values are in the real registers.
Kernel decides when to switch process (context switch)
Processes are restricted and hence can't modify the process table, which lives in kernel memory. The table (called the process descriptor table, or PDT, here) holds the saved register values and other information for every process that is not currently running. If process 3 is running, the values in its row are garbage: for performance reasons the kernel doesn't keep them up to date, so we can't rely on them. When the kernel performs a context switch to another process, it saves process 3's current register values into its entry, and the entry of the newly running process becomes junk.
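Here is a simplified sketch of a process table and the register save/restore at the heart of a context switch. The struct layouts and the names `ptable`, `cpu`, and `context_switch` are hypothetical. A real kernel saves many more registers and does the switch in assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily simplified saved register set. */
struct regs { uint32_t eax, ebx, esp, eip; };

struct proc {
    int pid;
    int uid;
    struct regs saved;   /* valid only while the process is NOT running */
};

#define NPROC 6
static struct proc ptable[NPROC];

/* The "real" registers of our single CPU. */
static struct regs cpu;

/* Switch from process `from` to process `to`: save the live registers into
   from's entry, then load to's saved registers. Afterward, to's entry is
   stale ("junk") until the kernel switches away from it again. */
void context_switch(int from, int to)
{
    assert(from >= 0 && from < NPROC && to >= 0 && to < NPROC);
    ptable[from].saved = cpu;
    cpu = ptable[to].saved;
}
```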
Unix or Posix syscalls to manipulate the process table
- _Noreturn void exit(int);
- _Noreturn means the call never returns; the process is done after this
- The argument (the exit status) should be between 0 and 255
- pid_t fork(void)
- You call fork once, but it returns twice. The child starts as a copy of the parent, including its stack frames, so both processes resume at the point where fork returns
- Returns the pid of the child in the parent process, and 0 in the child process
- If it fails, it returns -1 (no child is created) and sets errno
- Example of usage with differing behavior for parent and child processes
    pid_t p = fork();
    switch (p) {
    case -1: error(1, errno, "fork"); break;   /* fork failed */
    case 0: /* child behavior */ break;
    default: /* parent behavior; p is the child's pid */ break;
    }
Note: The execution order of the child and parent processes is not predictable or guaranteed. For example, suppose the parent prints "hello" and the child prints "goodbye," and each writes one character at a time. You may then get the result "hgeololdobye."
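The sketch below is a complete version of the fork pattern, plus one way to impose an order on parent and child. The parent calls `waitpid`, which blocks until the child has exited. The helper name `run_child` is hypothetical. The child calls `_exit` rather than `exit`, so it does not flush the stdio buffers it inherited as copies from the parent.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `status`; the parent waits for it and
   returns the exit status it observed, or -1 on error. */
int run_child(int status)
{
    assert(status >= 0 && status <= 255);   /* valid exit status range */
    pid_t p = fork();
    switch (p) {
    case -1:
        perror("fork");
        return -1;
    case 0:
        /* Child: fork returned 0. */
        _exit(status);
    default: {
        /* Parent: fork returned the child's pid. waitpid blocks until
           the child exits, so the parent's later code runs last. */
        int wstatus;
        if (waitpid(p, &wstatus, 0) != p || !WIFEXITED(wstatus))
            return -1;
        return WEXITSTATUS(wstatus);
    }
    }
}
```

Only the low 8 bits of the exit status reach the parent. That is why the argument to exit should stay between 0 and 255: exit(256) would look like exit(0) to the parent.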