CS111 Lecture Note: Virtual Memory
Lecture date: Nov. 20, 2013. Note taken by Wentao Shang.
History of virtual memory
- Introduced in 1960s by IBM to provide larger memory space to programs.
- A machine with 128 KiB of RAM can have 16 MiB of virtual memory!
- Performance relies on the LOCALITY OF REFERENCE:
For example, a program may access virtual memory locations mostly within two blocks A and B.
It is then sufficient to keep only these two blocks in physical memory.
However, there is still a problem of THRASHING.
- Not enough physical RAM to hold the working set of pages the processes actively reference
- Pages are frequently swapped between physical memory and disk
- Performance suffers when paging I/O dominates the actual computation
Another problem to solve: memory modularity
Bad memory accesses
Lots of unreliable processes make bad memory references. Typical errors include:
- Array index out of range
E.g., char a[100]; a[200] = 'a';
- Null pointer dereference
E.g., int *p = NULL; *p = 5;
- Access of free/uninitialized/random memory location
E.g., free (p); *p = 5;
- Stack allocation overflow/underflow
E.g., char a[1000000000000000]; // implemented as subl $1000000000000000, %esp
Possible solutions:
0. Don't do that! And fire the programmer who does that.
1. Turn every memory access into a system call.
- Syscall is expensive
- Need per-process table to track valid memory space
2. Let compiler insert code around every access, checking that pointer is valid.
- You can change to a "better" language, like Java, which enforces run-time checking against invalid memory access.
- Or still use C, but compile with "gcc -fsanitize=address".
How it works:
- Divide memory space into real memory and shadow memory
- Real memory: an array of 8-byte cells
- Shadow memory: an array of 1-byte cells
- Shadow memory describes real memory: 0 means free, 1 means allocated
- As a result, 1/9 of the memory space is reserved for the shadow memory that implements the check
- For every access to a memory location p, the compiler inserts code that checks whether
shadow_mem[p >> 3] == 1 (a sketch of this check follows the list)
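A minimal sketch of that check in C, using the simplified 0/1 shadow encoding above (real AddressSanitizer uses compiler-generated instrumentation and a richer shadow encoding; shadow_mem and check_access are hypothetical names used only for illustration):

#include <stdint.h>
#include <stdlib.h>

extern uint8_t *shadow_mem;          // hypothetical: one shadow byte per 8-byte real cell

// Conceptually inserted by the compiler before every load/store of address p.
static void check_access (void *p)
{
  uintptr_t a = (uintptr_t) p;
  if (shadow_mem[a >> 3] != 1)       // 0 = free, 1 = allocated (simplified model)
    abort ();                        // report the bad access and stop the program
}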
3. Use per-process base-bounds pair.
- Two more columns in process table: base (L) + bounds (U)
- For every memory access a, ensure that L ≤ a < U (see the sketch after this list)
- Enforced by hardware with two privileged registers: rL + rU
- Split RAM into contiguous regions, one region per process:
- Assumption: position-independent code (PIC). E.g., compile code with "gcc -fPIC"
- Problems:
- External fragmentation
- Whole program must fit in RAM at once
- Running processes can't be relocated (unless all references are relative to L)
- Need to predict size at start
- Buffer overflow inside program memory region is still possible
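A minimal sketch of the bounds check the hardware performs on every access; base and bounds here are stand-ins for the privileged rL and rU registers:

#include <stdint.h>

static uintptr_t base, bounds;       // stand-ins for the privileged rL / rU registers

// Conceptually performed by the hardware on every memory access to address a.
static int access_ok (uintptr_t a)
{
  return base <= a && a < bounds;    // out-of-range accesses trap into the kernel
}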
4. Use segmented memory.
- Interpreting a 32-bit pointer:
- High 8 bits: indicate purpose of pointer, e.g. code / stack / big IO buffer / ...; actually an index into segment table
- Low 24 bits: indicate data offset in the corresponding segment
- Associated with each process is a segment table (recording each segment's location + size): several KB in total, can be cached by the CPU (see the lookup sketch after this list)
- Process memory space can be separated into segments
- But external fragmentation still exists!
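A minimal sketch of that lookup, assuming the 8-bit index / 24-bit offset split above; the segment table layout (seg_table, phys_base, size) is hypothetical:

#include <stdint.h>

struct segment { uint32_t phys_base; uint32_t size; };   // hypothetical segment table entry
extern struct segment seg_table[256];                    // indexed by the high 8 bits

// Translate a 32-bit segmented virtual address to a physical address (0 = fault).
static uint32_t seg_translate (uint32_t va)
{
  uint32_t seg = va >> 24;               // high 8 bits: segment index
  uint32_t off = va & 0xFFFFFF;          // low 24 bits: offset within the segment
  if (off >= seg_table[seg].size)
    return 0;                            // offset past the segment's size: fault
  return seg_table[seg].phys_base + off;
}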
Solution: fix segment size and align all segments.
Implementation of segmented virtual memory with fixed-size segments (paging) on x86
1. Single-level page table
- Fixed-size segment (also called page): 4096 bytes
- Interpreting a 32-bit virtual address: 20 bits + 12 bits
- High 20 bits: virtual page index
- Low 12 bits: data offset in page
- Per-process page table:
- Each 32-bit entry stores the physical address of a page (aligned on a page boundary):
- High 20 bits: page address
- Low 12 bits: extra page info
Note: the lower 12 bits of the page address are always 0 since pages are aligned on page boundaries, so they can be reused to store extra information.
- "%cr3" privileged register holds the location of the page table
- Page table memory is invisible to the process: a process can't modify its own page table
- Internal fragmentation (up to 4095 bytes) but no external fragmentation (because all pages are the same size and aligned)
- Page table is BIG: 2^20 entries × 4 bytes = 2^22 bytes per table, i.e., 4 MiB per process (a lookup sketch follows this list)
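A minimal sketch of a single-level lookup under this layout; page_table stands in for whatever %cr3 points at, and a zero entry is treated as "no mapping":

#include <stdint.h>

extern uint32_t *page_table;             // stand-in for the 2^20-entry table that %cr3 points at

// Translate a 32-bit virtual address to a physical address (0 = fault).
static uint32_t translate (uint32_t va)
{
  uint32_t vpn = va >> 12;               // high 20 bits: virtual page number
  uint32_t off = va & 0xFFF;             // low 12 bits: offset within the page
  uint32_t pte = page_table[vpn];
  if (pte == 0)
    return 0;                            // no valid mapping: page fault
  return (pte & ~0xFFFu) | off;          // page address + offset
}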
2. Two-level page table
- Interpreting a 32-bit virtual address: 10 bits + 10 bits + 12 bits
- High 10 bits: 1st level page index
- Mid 10 bits: 2nd level page index
- Low 12 bits: data offset
- 1st level page table points to locations of 2nd level page table
- 2nd level page table points to actual page location
- Relatively small page tables: L2 table allocated as needed
Pseudo C code for two-level memory lookup
unsigned pmap (unsigned va)  // va: virtual address; returns the page's physical address, or FAULT
{
  unsigned hi = va >> 22;                        // high-order 10 bits: level-1 index
  unsigned med = (va >> 12) & ((1 << 10) - 1);   // middle 10 bits: level-2 index
  unsigned *l1pt = get_cr3 ();                   // stand-in for reading %cr3 (location of the l1 page table)
  unsigned *l2pt = (unsigned *) (l1pt[hi] & ~((1 << 12) - 1));  // strip the low 12 info bits
  if (l2pt == FAULT)
    return FAULT;                                // FAULT marks the zero (unmapped) address
  return l2pt[med] & ~((1 << 12) - 1);           // full physical address = this | (va & 0xFFF)
}
Page faulting
- Hardware traps when given a "bad" address (one with no valid physical page in the page table)
- Then the kernel takes over; it may:
- terminate the process (dump core)
- OR send the process a SIGSEGV signal
- OR (more commonly) install a valid page table entry backed by real memory, then RETI
"Free memory is wasted memory." -- by Linus Torvalds
Swap space on disk
- Physical memory is a "cache" of swap space
- Swap space gets updated when physical memory page is swapped to disk
- Upon page fault, try to look up the page in swap space and put it back to physical memory before RETI (may need to swap other pages to disk)
- Need a swapmap function to handle mapping from virtual address to swap space location
A naive implementation of swapmap could be (assuming linear allocation in swap space):
unsigned long long swapmap (unsigned va)
{
  return swapbase + (va >> 12);   // "swapbase" is per-process; assuming swap locations are
                                  // measured in page-sized units, this yields a page index in swap space
}
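Putting the pieces together, a hedged sketch of what the kernel's page fault handler might do with swapmap; choose_victim, victim_va, disk_read, disk_write, and pmap_set are hypothetical helpers, and the single-process, write-the-victim-first policy shown is the simplest demand-paging case:

// Hypothetical helpers; a real kernel differs in names and details.
extern unsigned choose_victim (void);               // pick a physical page frame to evict
extern unsigned victim_va (unsigned pa);            // which virtual page that frame currently holds
extern unsigned long long swapmap (unsigned va);    // virtual address -> swap space location
extern void disk_write (unsigned long long loc, unsigned pa);
extern void disk_read (unsigned long long loc, unsigned pa);
extern void pmap_set (unsigned va, unsigned pa);    // install (or clear, pa = 0) a page table entry

void page_fault (unsigned va)                       // entered from the hardware trap
{
  unsigned pa = choose_victim ();                   // may have to evict a resident page
  disk_write (swapmap (victim_va (pa)), pa);        // save the victim to its swap slot
  pmap_set (victim_va (pa), 0);                     // mark the victim's entry as FAULT
  disk_read (swapmap (va), pa);                     // bring the faulting page in from swap
  pmap_set (va, pa);                                // map it, then RETI back to the process
}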
malloc() revisited
- When allocating a large memory block, malloc only needs page table entries to be created: some can map real memory, others can be left zero (unmapped) until first use
- On a system that overcommits memory, you can therefore still "allocate" more virtual memory than there is swap space to back it
- But once you actually try to use it, the kernel may kill the process
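A small user-level illustration of this, assuming an overcommitting system; the exact size at which malloc or the touch fails is system-dependent, so treat this as a sketch rather than a guaranteed outcome:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main (void)
{
  size_t huge = (size_t) 1 << 40;        // 1 TiB: likely more than RAM + swap
  char *p = malloc (huge);
  if (!p)
    {
      printf ("malloc failed up front\n");
      return 1;
    }
  printf ("malloc succeeded; touching the pages...\n");
  memset (p, 1, huge);                   // faulting in real pages; the kernel may kill the process here
  printf ("survived\n");                 // rarely reached on an overcommitting system
  free (p);
  return 0;
}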
Last modified: Fri Nov 22 13:33:49 PST 2013