CS111 Scribe Notes for 11/20/13:

Virtual Memory

by Sandra da Silva

Brief History

In the 1960's, IBM sold a machine with the idea that it had 128KiB of RAM vs 16MiB of virtual memory.

Problems with this included:

- Performance was not quite as good as if a machine had this much real memory

- Virtual memory relies on LOCALITY OF REFERENCE (the tendency of memory references to cluster)

- THRASHING, which occurs when paging I/O dominates. This means operations can take much longer than if we were in main memory (ex: 10ms vs 1ns). With thrashing, there is locality of reference, but not enough to fit in RAM.

Graph of Additional Utility Gained by Using Virtual Memory

DrawObject

Common Problem with Buggy Software: Buggy software often has lots of unreliable processes with bad memory references.

Examples of bad memory references are:

a[i], where i is out of range
*p, where p is NULL or p is freed
p is (char*) i, where i is a random int or pointer
{char a[n];} (where n is a huge number, such as 10000000000)

- This is implemented via subl $10000000000, %esp

Solutions to this problem:

0) Fire the programmer who wrote the buggy code. (This is not efficient in practice.)

1) Every memory access is a syscall. The OS knows this because it has a per-process table.

2) The compiler can insert code around every access that checks if the pointer is valid.

2a) How does the compiler know the index range of all arrays, etc?

- Better languages, like Java, store the size of arrays. Java has no pointers, which catches 60-80% of errors. However, its runtime checks hurt performance.

2b) gcc -fsanitize = address

(malloc/free/realloc)

This compiler option uses shadow memory.

Relationship between shadow memory and real memory:

There is a single byte in shadow memory for each 8-byte chunk in real memory. The shadow memory is set to 1 if the corresponding real memory is being used, and 0 if the corresponding real memory if free. In order to check if a pointer is pointing to a valid memory location, its shadow memory location is computed and it is checked whether corresponding shadow memory location is 0 or 1. In order to access the shadow memory location that will tell whether or not a pointer p has a valid real memory location, (shadow + (p >>3)) is used.

3) A per-process base-bounds pair can be used, which limits damage to other processes. Every memory access “a” is limited such that L <= a <U. This is enforced by hardware. Privileged registers rL and rU are used. Each process has its own values of L and U. This way, processes can't step on each other's work or the kernel.

Relevant Base-Bounds Pair Diagrams:

In order for this to work, we assume that we have position-independent code (PIC). PIC doesn't care where it is in RAM. Compile PIC with gcc -fPIC.

Using per-process base-bounds pairs has several problems. There is external fragmentation, the whole process has to fit in RAM at once, and size has to be predicted at the start. In addition, running processes can't be relocated unless all references are relative to L. Buffer overrun can still occur when using per-process base-bounds pairs.

4) Segmented memory can also be used to solve the problem of bad memory references in code. This attacks the problems of needing to know size in advance and of possible buffer overruns that can occur when using per-process base-bounds pairs. When using segmented memory, a segmentation table is associated with each process. Each segment is associated with a base-bounds pair.

Segmented Memory Diagram

There are several problems associated with using segmented memory. Segments can be exhausted (e.g., this doesn't work if we have 100's of libraries), and there is still external fragmentation.

Virtual Memory

5) The best solution is to use virtual memory. Virtual memory can be thought of as a simpler form of segmentation in which all segments are the same size. In x86, size is 4096 bytes/page. A con of using virtual memory is that there is internal fragmentation of up to 4095 bytes. However, there is no external fragmentation.

In a simple, one-level page table, virtual addresses are 32 bits wide. The first 20 bits hold the virtual page number, and the last 12 contain the offset in the physical location of the page. Each process has a page table that contains the physical address of the pages (aligned on the page boundary) and extra information about each page, such as permissions. The page table is 32 bits wide, with the first 20 bits holding the physical address of the page and the last 12 containing the extra information. The virtual page number stored in the virtual address corresponds to page table index. The 0^th index of the page table corresponds to %cr3, a privileged register. Modifying the page table is a privileged instruction. There are 2²² bytes per page table, or 4MiB per process.

Using a two-level page table is similar to using indirect blocks in a file system. Entries in the level one page table point to level two page tables. In a two-level page table, virtual addresses have 10 “hi” bits, 10 “med” bits and 12 “lo” bits. The “hi” part of the virtual address refers to the index in the level one page table and the “med” part of the virtual address refers to the index in the corresponding level two page table. The “lo” part of the virtual address refers to offset within the physical location of the page.

C Code to Implement Two-Level Page Table (built into hardware)

unsigned pmap (unsigned va){

unsigned hi = va >> 22;

unsigned med = (va >> 12) & ((1 << 10) – 1)

unsigned *l1pt = asm(“%cr3”);

unsigned *l2pt = l1pt[hi] &~ ((1 << 12) – 1);

if (l2pt == FAULT)

return FAULT;

return l2pt[med] &~ ((1<<12) – 1);

}

Page Faulting

Hardware traps when given a “bad” address. If an address is bad, it has no physical memory in the page table. When the kernel takes over, it can terminate the process (i.e., dump core), raise the SEGV signal or create a page table entry with real memory (returning from interrupt). Most commonly, the kernel creates a page table entry.

Physical memory is a cache of swap space. Swap space is disk space that is used for virtual memory.

Swap Space Diagram

If an entry is 0 in the page table, it is not in physical memory and must be retrieved from the swap space.

C Function for Retrieving a Page from Disk

unsigned long long swapmap (unsigned va) {

return swapbase + (va >> 12); //swapbase is per process

}

When a large array is malloc'd, page table entries are allocated. Some of the page table entries can work, and some can be empty. On a system with over-allocated memory, more virtual memory can be allocated than swap space. If this extra virtual memory is used, the kernel kills you off.

The expression “Free memory is wasted memory” refers to the idea that if you have free physical memory that is not in use, you screwed up.