CS 111 Scribe notes • January 13, 2016

OS Organization

Written by Tianyan Wu, Paul Sun, Ryan Li and Elan Markowitz for a lecture by Professor Paul Eggert

Table of Contents

  1. Introduction
  2. UEFI: Unified Approach of Booting
  3. Modularity
  4. Virtualization
  5. Process

Introduction

There are a few significant issues revolving around operating system organization that are discussed in today’s lecture. These include the need for a better system of booting, ways to enforce modularity (soft vs hard), and the benefit of a layered architecture. Two central themes of this lecture will be maximizing modularity and making systems more robust.

UEFI: Unified Approach of Booting

One significant issue we need to deal with in the operating system is booting. The booting mechanism discussed in earlier lectures is a strange low level boot from machine code. It uses machine code in the first sector of the disk and then jumps to code in another sector. This booting mechanism is far from modular. It requires both firmware and disk to be set up exactly right on each machine. It would be difficult to move a disk from one machine to another, because different machines use different methods of booting. To fix this problem, we need a standardized booting system. We call this UEFI (Unified Extensible Firmware Interface).

The approach basically asks to define booting in a disciplined way so that we can set up good interfaces between the parts of the system. UEFI is a standard firmware interface for PCs. It was designed to replace BIOS(Basic Input/Output System). It organizes the booting process by splitting it up between:

 
____________|____________ | |____Guid____| | | | | Firmware | Disk or | | | Flash | |____________|____________| |

 

GUID(Globally Unique ID) for Disk Partition

GUID is a 128-bit quantity that is used as identifier for disk partitions. The GUIDs are in the disk and flash. Since application programs are stored in disks, we need to find a easy way to load a particular program without going through everything. This is why we need disk partitions. The GUID tells us how to find a particular partition in the disk.

When we partition a hard disk, we need to let each partition has a standard layout. The file system formats are the following

However, there is a slight difficulty with these approaches. The OS cannot be in read-only memory or it could not be changed at boot. Yet, we do not want to allow the OS to be tampered with once it is running. These challenges will be addressed below.

Modularity

How do we enforce modularity after booting?

We could pick a terrible way to enforce modularity, and see what would happen. A typical way to interface to Operating System is function call. For example:

    char buf[2000];
    read(3,buf,1000);

Example: fact(n)

We could look at another function fact(n), which calculates the factorial of a given integer n.

int fact(int n) { //factorial
    if (!n)
        return 1;
    return n*fact(n-1);
}

How to make this code better? A simple way is to do return fact_table[n]. face_table is a precomputation of the factorials for integers. The problem, is that since factorials are generally big numbers, we cannot store all the factorials in integers because there could be integer overflows. So in fact, there are only 20 or so entries in the fact table.

If recursive implementation of fact is feeded into gcc, gcc will turn the code into a loop. When we turn off the optimization, we could get the following assembly code:

fact: pushq    %rbp
       movq     %rsp, %rbp
       subq     $16, %rsp
       movl     %edi, -4(%rbp)
       cmpl     $0, -4(%rbp)
       jne     .L2
       movl     $1, %rax
       jmp     .L3
    
 .L2:  movl    -4(%rbp), %eax
       subl    $1, %eax
       movl    %eax, %edi
       call    fact
    
 .L3:  imull   -4(%rbp), %eax
       leave

What can go wrong with fact(n)?

How should we address the issue of stack overflow? There are two methods:

  1. Every time we grow the stack, we check whether we have exceeded the limit. So before we do pushq, subq and call, we call check_stack_overflow. As a result, this function adds 3 conditional branches to our code.
    • But the problem is that it will make the code very slow.
  2. Use virtual memory. We could use a guard page to protect the stack. This guard page can't be read or written. So if the stack grows and for some reason hits this page, the entire program will trap.

New Problem

Earlier we thought of call fact as another instruction. Once we have stack overflow, it is not just an instruction that changes registers, it’s an instruction that can cause my entire machine to crash.

Example: fact2

Now suppose fact calls another function called fact2, and fact2 is not well-behaved as fact, then how can fact2 mess up fact? We could think of this problem in the perspective of modulization. Now we are trying to build a wall between the fact module and the fact2 module. Then how could fact2 breaks down the modulization?

On the other hand, how could fact mess up fact2?

fact could pass a garbage value to fact2 instead of the actaul argument, and fact2 will then compute the wrong number. Another problem is that before fact calls fact2, it sets the rsp register to 0. As a result, the stack pointer becomes a null pointer. If we execute the code of fact2 we will jump into the 0 forbidden zone in the memory and fact2 will trap.

As we see in this example, we have modularities for fact and fact2, but this modularity does not work very well. Modularity could only work when both sides are well-behaved. There are two types of modularities.

The example fact and fact2 is related to soft modularity. Soft modularity is a voluntary modularity. In CS 31 cs CS 32 we used soft modularity. The problem is that it requires all parts of the program to trust each other, and it doesn't scale to large applications. What we really want is hard modularity. In hard modularity, components cannot reach to other components and mess with their brains. Thus failure in one component won't affect other components.

Two techniques to enforce hard modularity

  • Virtualization
  • Client-Server
  • Client Server is a great form of modularity. It has the property that both client and server are insulated from each other. Therefore, client-server is modularity works in both directions. The problem with client-server is that it's very expensive. If we work on small applications, the cost of client-server could be greater than the cost of the code! With virtualization, we have faster, cheaper modularity, but we only have one-directional modularity. The application is not protected from the kernel. This is how the OS for most computers is set up.

    Virtualization

    How does virtualization work?

    The idea of virtualization is that the Kernel of the operating system is the master control program. It takes charge of the entire system. We want to run an application without letting the application messing up the kernel or other processes.

    One simple method for virtualization involves writing a simulator for the machine the app runs on. This simulator clones the machine, including the registers. The simulator can run the instructions of the machine by looking at them and decoding them into native instructions.

    The OS carefully checks every instructions before it is executed. It will not execute instructions that go beyond the memory bound of the machine. It won't do the operation of dividing by zero either. If the simulator sees a halt instruction, it turns that into the return from simulator to the Operating System. If the simulator sees a subscript error (access memory out of range), it would return to OS, instead of executing the invalid instruction. Meanwhile it would tell the operating system what goes wrong. The operating system can decide what to do, whether we should continue or not. If the application has an infinite loop, we could add a counting instruction for the loop. When the iterations of the loop reach the maximum, the simulator returns to the OS.

    This is very safe but there is a huge problem. It is way too slow. All this checking is slow. Executing non-native instructions takes way more instructions in a simulator. We need a solution, something that allows virtualization but with little impact on performance.

    Luckily, today’s hardware is equipped with the solution. Built into the hardware are instructions that allow for simulation of native software. While this cannot simulate any machine, it means that there are no instructions that need to be decoded and translated. We do still need to check for problems as we go. Valgrind is one of the slowest acceptable forms of virtualization. It checks instructions as it goes to make sure they are safe. However, this is still too slow for most applications.

    Again, we turn to the hardware for a solution. The solution this time is to divide instructions into safe (unprivileged) and dangerous(privileged) instructions. The dangerous instructions can only be executed from inside the kernel. There is a privilege bit inside the hardware that marks whether the program is in kernel mode or not. To enter the kernel, the hardware uses protected transfer of control. If a privileged instruction is used while the privilege bit is 0, the program traps and gives control to the kernel.

     

    modularity

     

    This only works efficiently if the privileged instructions are very rare. This means privilege has to depend on more than just instruction name. For example, movl can be a privileged instruction if it is trying to write data to somewhere it cannot access (kernel, other processes, etc.).

    This structure is known as layered architecture.

     

    layered architecture

     

    The kernel will execute system call. At hardware level, the int 128 instruction will do:

    push  ss //stack_segment
     push   esp //stack pointer
     push   eflags
     push   cs //code segment
     push   epi//instruction pointer
     ...    //error code
    
    1. int uses a trap vector: looks at index 128 because that is what the call specified
      • Changes instruction pointer into kernel
      • Kernel has a function that figures out what to do about the system call
    2. When the kernel is done: companion instruction:
      • rti: return from interrupt
      • Pops all this stuff off the stack and then resume the program: kernel could choose not to return to program if it feels you are doing something bad
      • Can resume later as well

    There is a mechanism by which you can get into kernel mode in a way an ordinary program cannot:

    Rti is the opposite of int just like return is the opposite of call. Whereas, return and call are cheap, rti and int are expensive. This is the equivalent of 5 to 6 words vs. 1 word. However, the extra expense gets extra power and gets you hard modularity.

    However, this approach is now considered a hack. We are misusing the interrupt instruction as a system call. Additionally, we should not have to do as much work for a system call as for an interrupt.

    Nowadays there is a faster way of doing system calls: (x86 and especially x86-64)

    The hardware guys saw some programs constantly do system calls. Excessive system calls were starting to eat into performance. Now we have a special system call machine instruction: syscall.

    Syscall causes the program to enter kernel. At that point, the kernel does whatever it wants to do, then returns. Hopefully, it does what the programmer wanted. The rax register holds the syscall number. This allows the kernel to figure out what the program wants to do (open, close, etc.). Arguments are put into rdi, rsi, rdx, r10 r8 and r9 registers. This is not the same order as ordinary function calls. This is just the way it is defined. If you need more than 6 arguments, you should probably redesign your API. One good solution is to use structs to store arguments. Syscall destroys rcx and r11 registers and puts the return value into rax. A return value in the range -4095 to -1 means system call failure. Again, this is not the same convention as functions.

    Ssize_t read(int fd, void *buf, size_t s){
        //Extra asm stuff here and dealing with errno
        //done at the machine level
        }
    

    Machine code is going to have copy arguments into corresponding registers for C programmer, you now have a nice function that behaves well

    Inside the kernel there is a Sysexit instruction that is the inverse of syscall. Pops all the stuff off stack, reverts back to privileged to unprivileged mode.

    We now have a mechanism for doing virtualization. Applications normally run at full speed, but when they want to do something that is “dangerous” they have to go through kernel. If for example you want to do outb instruction (privileged instruction). Issue a write system call, kernel will have to do it. This process will slows down I/O, but we get the flexibility that we need

    Now we have the mechanism, we have to figure out what to do about it. What does it look like to users?

    By users, it mean what do the application developers see? We want developers to write multiple programs that work and hook together. The fundamental model for application is process.

    Process

    Process is an application that runs atop an OS, allows us to have many applications running atop the same OS. From the point of view of process, the OS is vanished. To some extent, OS needs to support the notion of processes.

    How to create and destroy a process?

    We could create a process by doint pid_fork(void). This is a C function, and it

    We can get three kinds of return values from calling fork: 0, -1 and child pid.

    Example: Fork bombs:

    If you all login to the lnxsrv07 at the same program and you all run while(fork()) continue; Put a quota on the number of processes you can run: basically you can’t run any programs While(1) fork(); scales exponentially: parents and childs both create more child.

    How to destroy a process:

    void_noreturn _exit(int);

    If you call ordinary exit: C library looks to see what is open, it cleans it, closes files and flushes I/O processes. However, we want process to die right away without cleaning up. The extra underscore _ is to bypass cleanup. _noreturn tells you that this function never returns. It doesn’t have to use the call instruction: doesn’t have to store return address, uses the jump instruction, thus makes the code run faster. Causes the process to immediately stop running but it doesn’t destroy the process. Think of process as an object. In this object is everything you need to know about the process. E.g. Fd table, pid, exit status, etc. Exit does not destroy that object. Destroy the process with pid_t waitpid (pid t, int *status, int flags). Flags can be null: or WNOHANG flag

    ↑ Top