CS 111 Lecture 14: Virtual Memory
    March 4, 2013
    by Sidney Chan
    
    Our Next Problem: Unreliable processes with bad memory references
    
      -  subscript errors
 
      - dereferencing bad pointers (null, uninitialized, freed, etc.)
 
    
         We've had these problems since the 1950s. 
     Some possible solutions include: 
    
      - Fire the programmers! (This doesn't scale.) Nobody should be
        writing code like this so don't do it. Unfortunately, these
        problems are hard to solve.
 
      - Add runtime checks to your code such as runtime subscript
        checking, etc. (This is fairly popular; the JVM does subscript
        checking by throwing an exception so you can't reference outside
        your bounds.)
 
    
    Why can't we just do runtime checks?
    
      - Performance issues: We may want to play with pointers for
        performance reasons, and also runtime checking is slow.
 
      - Trust issues: We have to know if a program is written in
        machine code, or if runtime checking was turned off by
        attackers.
 
      - A workaround: If the compiler is part of the OS, we can run
        only programs that the system itself compiled (as in the
        Burroughs B5500 and its successors).
 
    
    We need help from the hardware to do this checking: the hardware
    will check each memory reference.
    Simple approach: 2 extra registers, base and bounds
    
      - base points at the part of
        memory you're allowed to use right now
 
      - bounds points to the byte just after the last one you can
        use
 
    
    
                  
    
    We have a hardware check for each memory access. If you access
    outside your bounds you trap and the kernel can decide what to do.
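The hardware check can be sketched in C (hypothetical names; real hardware performs this in the address path on every access and traps instead of returning a value):

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical sketch of the base-bounds check the hardware performs. */
static uintptr_t base_reg;    /* start of the region this process may use */
static uintptr_t bounds_reg;  /* byte just after the last usable byte */

/* Returns true if the access is allowed; real hardware traps on failure. */
bool check_access(uintptr_t addr)
{
    return addr >= base_reg && addr < bounds_reg;
}
```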
    
    Problems with this simple approach:
    
    
      - You must have just 1 region. (Possible solution: extend our
        machine to have multiple base-bounds pairs.)
 
      - Programs must fit into RAM; we can't have virtual memory.
 
    
    Suppose you're running a sort program and you want to run two
    instances at the same time:
                           
    
    
      - ip1 is at address 0x10000, ip2 at 0x12000
 
      - Programs cannot contain absolute addresses: ip2 is a copy of
        the sort code, so it will still jump to 0x10000, which is in
        ip1's region. Relative addresses will work.
 
    
    To fix the absolute-address pointer, you can:
    
      - Change the way address arithmetic works (hardware) to always
        add the base to every address before you use it. This will work.
 
      - Mutate the program as you load it into RAM: the copy on disk
        has addresses computed as if the program were loaded at 0,
        and the loader adjusts them to true absolute addresses as it
        reads the code in (a relocating loader).
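A relocating loader can be sketched as follows. This is a minimal sketch with a hypothetical format: we assume the on-disk image comes with a relocation table listing the offsets of the words that hold addresses.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of a relocating loader: the on-disk copy's addresses assume the
 * program is loaded at 0; as we load, we add the actual load address to
 * each word the (hypothetical) relocation table points at. */
void relocate(uint32_t *image, const size_t *reloc_offsets, size_t nrelocs,
              uint32_t load_base)
{
    for (size_t i = 0; i < nrelocs; i++)
        image[reloc_offsets[i]] += load_base;  /* absolute = relative + base */
}
```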
 
    
    
    Suppose the program changes the base or bounds.
    
      - Solution: Have these registers set as privileged either by
        setting special privileged instructions or by making accesses to
        these registers be privileged.
 
    
    Suppose you want processes to share memory.
    If you have only 2 processes you can pull this off, but this won't
    scale to larger numbers.
    
      - Solution: have multiple base-bounds pairs.
 
      - add a system call to create a "domain"
 
    
           
                 
          map_domain([size], [permissions])

       where the permissions might be:

          x   (execute)
          rx  (read-execute)
          w   (we can have this, but it's unusual)
          _   (for kernel use only)
          rwx
          ...
    
    
      - These permissions need not be supported by hardware.
 
    
    Traditional x86 does r, w, but not x.
    
      - One former problem: your stack was readable and writable, and
        since there was no separate execute permission, it was also
        executable, making buffer overflow attacks easy to do. Modern
        x86 can give the stack r and w but not x, so executing
        injected stack code will trap.
 
    
     
    
    Problem: If you have a fixed number of base-bounds pairs, say
    4, this will not be enough domains. There are many possible uses
    for a domain, including: the text of a program (r), the data of a
    program (rw), the heap (rw), stacks (1 per thread), I/O buffers,
    shared memory. We have many more uses than base-bounds pairs.
    Segmentation (segmented memory)
    
    
    We choose a base-bounds pair from a segment table. Privilege is
    needed to change the segment table. The segment number is used to
    identify the segment, along with an offset within the segment. Each
    entry in the table includes information such as its base, bounds,
    permissions, and so on.
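Segment translation can be sketched in C (hypothetical structure and helper names; real hardware does this lookup on every access and traps on violations):

```c
#include <stdint.h>

/* Hypothetical segment-table entry. */
struct segment {
    uint32_t base;    /* physical address where the segment starts */
    uint32_t bounds;  /* segment length in bytes */
    int perms;        /* permission bits, e.g. r/w/x */
};

#define NSEGS 8
struct segment seg_table[NSEGS];

/* Translate (segment number, offset) to a physical address.
 * Returns (uint32_t)-1 where real hardware would trap. */
uint32_t seg_translate(unsigned segno, uint32_t offset)
{
    if (segno >= NSEGS || offset >= seg_table[segno].bounds)
        return (uint32_t)-1;   /* out of range: trap */
    return seg_table[segno].base + offset;
}
```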
    
    
    Problems
    
      - What if you run out of segments?
 
      - Segmentation increases hardware complexity.
 
      - Growing a segment is expensive. This is the killer problem for
        segmented memory.
 
    
    
 
    
      - The kernel should arrange that segments never overlap.

      - More problems: suppose segment A is full and the process then
        allocates more memory with new X() or malloc(N). A must grow,
        so we have to push the neighboring segment C's data somewhere
        else. This is a pain.
 
    
    Page-based Memory Management: the solution for segmented memory
    
      -  Now, we fix all the pages to the same size, say 8 KiB. This
        way, data for a process does not all have to sit next to each
        other in memory, making growing and moving easier. This also
        solves the problem of fragmentation.
 
      - We have a magic box that stores information about the mapping
        between physical and virtual memory.
 
      - Each process has its own page table, with a (privileged) page
        register that points to a process's page table. It is up to the
        OS to determine how to implement a page table.
 
    
    
    
    
    Linear page tables 
    
 
    
      - (in x86, pages are 4 KiB, with 20-bit page numbers + 12-bit
        offsets)
 
    
    With linear page tables, we have a page table that holds multiple
    page table entries (PTE). In our example, the table holds 2^20
    entries of 4 bytes each, meaning that each process has a page table
    of 4 MiB. 
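With 4 KiB pages, a 32-bit virtual address splits into a 20-bit page number and a 12-bit offset; a small sketch of the split and of rebuilding the physical address from a PTE's physical page number (helper names are ours):

```c
#include <stdint.h>

/* Split a 32-bit virtual address into page number and offset (4 KiB pages). */
uint32_t vpn(uint32_t va)    { return va >> 12; }     /* top 20 bits */
uint32_t offset(uint32_t va) { return va & 0xfff; }   /* low 12 bits */

/* Given the physical page number from the PTE, rebuild the physical address. */
uint32_t phys_addr(uint32_t ppn, uint32_t va)
{
    return (ppn << 12) | offset(va);
}
```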
    
    
      - We must write a function pmap that takes as its argument a
        virtual page number and returns the page table entry for that
        page number. This example function is very simple:
 
    
    
           
          int pmap(int v)
          {
              return PAGE_TABLE[v]; // the table base is stored in the
                                    // privileged register %cr3
          }
    
    
      - With linear page tables, for small processes, there will be
        many unused PTES in the page table. Thus, we waste a lot of RAM.
        To solve this, we use 2-level page tables.
 
    
    
    2-level page tables
    
    
    Here, instead of using the full 20 bits for page numbers, we use the
    first 10 bits for the first level page table and the next 10 bits
    for the next level. With two levels, we now have a more complicated
    pmap function.
    
           
          int pmap(int v)
          {
              int hi = v >> 10;              // top 10 bits: 1st-level index
              int lo = v & ((1 << 10) - 1);  // low 10 bits: 2nd-level index
              pt *p = PAGETABLE[hi];
              if (!p)
                  return 0;
              return p[lo] >> 8;
          }
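Continuing the example, the 20-bit virtual page number splits into two 10-bit indices: the top 10 bits index the first-level table and the low 10 bits index the second level (helper names are ours):

```c
#include <stdint.h>

/* Split a 20-bit virtual page number for a 2-level page table:
 * top 10 bits pick the first-level entry, low 10 bits the second. */
unsigned hi_index(unsigned vpn) { return vpn >> 10; }
unsigned lo_index(unsigned vpn) { return vpn & ((1u << 10) - 1); }
```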
    
    
      - There are now more memory accesses, so translation is slower,
        but a small process needs only 3 * 4 * 2^10 bytes = 12 KiB of
        page tables, which is much smaller. However, this won't help
        us for large processes. To deal with those, we may want to
        "lie" to the hardware: put into the page table entries the
        image of a process that is smaller than the process actually
        is.
 
    
    
    
      - It is actually fairly common for a process to think it has
        more memory than it really does (e.g. it sees 3.5 GiB when it
        actually has 100 MiB of RAM). In this scheme, only a portion
        of a process's pages are stored in main memory, with the rest
        on disk (swap space). When the process touches a page that's
        not in RAM, the hardware traps (since the access looks
        invalid), and the kernel retrieves the page from disk and
        fixes up the page table.
 
    
                          
    
 
    
      - Downsides: Very slow! If we page fault a lot it will be as if
        we had no RAM at all. Also, when we place the new page, what
        happens to the useful memory we overwrite? We have to save
        the old data onto disk first, so a fault costs a write + a
        read.
 
    
    Mechanism for page faulting
    When we page fault, we trap and enter the kernel. The kernel
    inspects the faulting address.
    
      void pfault(int v, int access, ...)
      {
          if (swapmap(v, current_process) == FAIL)
              // bad address, e.g. dereferencing a null pointer or a
              // subscript error
              kill(current_process);
          else {  // valid address
              (p, va) = removal_policy();      // choose a victim page
              pa = p->pmap(va);                // pa: victim's physical address
              p->pmap(va) = FAULT;             // the victim's page is gone
              write(pa, swapmap(p, va));       // save the victim to disk
              read(pa, swapmap(current_process, v)); // read the wanted page in
              current_process->pmap(v) = pa;
          }
      }
    
      - In the above code, p is the process holding the victim page
        and va is the virtual address of the victim. removal_policy()
        picks a victim physical page so it can be evicted to disk;
        the victim's owner will run very slowly if it touches that
        page again.
       
    
    What can we do wrong?
    
      - Suppose the code for pfault is held in a page that becomes the
        victim. Memory manager pages themselves must be "wired down":
        never selected as victims.
 
      - Can we make secondary tables victims? Sure, if the hardware
        will allow us to write a 0 in the entry for that table in the
        top level table. You can't make the top level table a victim,
        however.
 
    
    Virtual Memory Tuning
    
      - Demand paging (quite popular): when you start a program, load
        just its first page. The advantage of this is that you start up
        quickly, but if you only load pages just as they're needed you
        can cause the disk arm to move more.
 
      - Dirty bit: in your page table, you keep track (1 bit/page,
        every store insn can change it) of whether the page has been
        stored into since it has been read in. Now, if we need to get
        rid of a page, we know that we don't have to write it back to
        disk if it has not been modified.
 
    
    How can we optimize the dirty bit so we don't have to store extra
    state in RAM?
    
      - We can avoid the necessity of computing the dirty bit on pages
        where we don't have write access.
 
      - We can store this bit in the page table entry since we have
        some extra room.
 
      - Even if the hardware doesn't give you a dirty bit, we can
        tell the hardware that a page has fewer write permissions
        than it really does by clearing the w bit in the PTE. The
        first store traps; the OS sets its own dirty bit, restores
        the w bit, and returns, and the process continues. The OS
        keeps the dirty bits in its own area.
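The write-protect trick above can be sketched in C (hypothetical structures and names; the real bookkeeping lives in the kernel's per-process data):

```c
#include <stdbool.h>

/* Sketch of emulating a dirty bit in software. The OS clears the
 * hardware w bit even for writable pages; the first store traps, and
 * the handler records the page as dirty and restores w. */
#define NPAGES 1024

struct pte { bool writable; };   /* hardware-visible entry (simplified) */
struct pte page_table[NPAGES];
bool os_dirty[NPAGES];           /* the OS's private dirty bits */
bool really_writable[NPAGES];    /* the true permission, kept by the OS */

/* Called on a write-protection fault for virtual page vpn.
 * Returns true if the fault was our emulation, false if it was a
 * genuine protection violation. */
bool wp_fault(unsigned vpn)
{
    if (!really_writable[vpn])
        return false;                 /* genuine violation: e.g. kill */
    os_dirty[vpn] = true;             /* the page is about to be modified */
    page_table[vpn].writable = true;  /* let the store proceed on retry */
    return true;
}
```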