This lecture begins by covering a way of reducing the complexity of an operating system: modularity. There is soft modularity and hard modularity.
We could enforce modularity after booting by the way of function calls. (This is soft modularity and could be a terrible idea.)
Suppose you have the following C code
char buf[2000];
read(3, buf, 1000);
Lets also look at a recursive factorial function
int fact(int ) {
if (!n)
return 1;
return n * fact(n - 1);
}
Turning off gcc optimization, this code generates the following assembly instructions:
fact: pushq %rbp //
movq %rsp, %rbp //all part of pushing onto the stack
subq $16, %rsp //
movl %edi, -4(%rbp)
cmpl %0, -4(%rbp)
jne L2
movl $1, %eax
jmp .L3
.L2 movl -4(%rbp), %eax
subl $1, %eax
movl %eax, %edi
call fact
imull -4(%rbp), %eax
.L3 leave //rbp=*rsp++ pop both part of popping off the stack
ret //rip=*rsp++ pop
Here are things that can go wrong with this code.
fact(-1); // n will never reach the base case
fact(INT_MAX); // stack overflows
fact(INT_MIN); // n goes to INT_MAX, overflow occurs
fact(50); // causes positive overflow
Normally, we can treat a call instruction as a single machine instruction, but now we have to worry that it could crash our program!
Suppose we have another function, fact2. What can fact2 do to mess up fact?
- It could change fact's return address.
- it could loop forever
- Change a variable in fact.
- It could cause an unwanted jump
What can fact do to mess up fact2?
- Put a random value into %rdi, an argument register.
Thus, modularity y function calls works only if the caller and callee are well-behaved. This is an example of "soft modularity". It doesn't scale to large applications.
What we want for large applications is "hard modularity", where one component canot modify other components.
Here are two ways to achieve this:
1. Virtualization <====== This is what we will focus on today
2. Client-server messages
Client-server gives hard modularity in both directions, but it is expensive.
Virtualization is cheaper and faster. But it is unidirectional modularity: An application is not protected against the Kernel; i.e. A buggy kernel == disaster.
The simplest way for virtualization is to write a simulator for
the machine the app runs on.
The simulator carefully checks for instructions halt.
If there's
a halt, return. If we are out of bounds of the range of addresses,
return.
To check for an infinite loop, we can have an instruction counter;
if the count reaches a MAXIMUM, return.
The problem is that this is too slow for production, usually.
To make this faster, we needa virtualizable processor. This works
if our OS uses the same instructions as the simulated machine.
We divide instructions into safe (unprivileged) and unsafe (privileged). And we have a protected transfer of control: the kernel is in charge of privileged instructions, and the app does not execute them. We maintain a privileged bit. 0 is for the app. 1 is strictly for the kernel. Privileged istructions in the application should be rare. When an app uns into a privileged instruction, it traps, and rip goes to the kernel to continue execution.
classic way to enter the kernel: execute a privileged instruction with a
standard convention. Interrupts.
int 128 128 means execute a system call.
By the way...
analogy:
rti:int :: ret:call (rti is return from interrupt)
rti:int - expensive, but hard modularity
ret:call - cheap
Nowadays, there is a faster way to make system calls.
(This is for both x86 and x86-64, but we'll look at the latter specifically)
syscall - is a single instruction
%rax contains the syscall number (for open, close, etc.).
The arguments are in the following registers:
rdi, rsi, rdx, r10, r8, r9
It trashes rcx and r11
Result is in rax.
Notice that the arguments follow a different calling convention
than for ordinary call.
The numbers between -4095 and -1 mean failure errno.
ssize_t read(int fd, void* buf, size_t s) {
...asm...
...deal with errno...
}
This mechanism works. Now what does the user (app developer) see? The model: program running in a virtual machine atop an OS.
creation
pid_t fork(void);
clones the current process.
returns 0 as child.
returns childpid in parent.
-1 on failure.
We have a failure possibly because we don't have enough resources.
Watch out for fork bombs.
while(fork())
continue;
or even deadlier:
while(1)
fork();
destruction
void _Noreturn _exit(int);
ordinary exit(); cleans up.
_Noreturn jumps instead of calls. No saved rip.
pid_t waitpid(pid_t, int* status, int flags);