CS111 Lecture Note: Virtual Memory
Lecture date: Nov. 20, 2013. Note taken by Wentao Shang.
History of virtual memory
- Introduced in 1960s by IBM to provide larger memory space to programs.
- A machine with 128 KiB of RAM can have 16 MiB of virtual memory!
- Performance relies on the LOCALITY OF REFERENCE:
For example, a program may access virtual memory locations mostly within two blocks A and B.
It is then sufficient to keep only these two blocks in physical memory.
However, there is still a problem of THRASHING.
- Not enough physical RAM to hold the working set of pages the processes actively reference
- Pages are frequently swapped between physical memory and disk
- Performance suffers when paging I/O dominates the actual computation
Another problem to solve: memory modularity
Bad memory accesses
Lots of unreliable processes make bad memory references. Typical errors include:
- Array index out of range
E.g., char a[100]; a[200] = 'a';
- Null pointer dereference
E.g., int *p = NULL; *p = 5;
- Access of free/uninitialized/random memory location
E.g., free (p); *p = 5;
- Stack allocation overflow/underflow
E.g., char a[1000000000000000]; // implemented as subl $1000000000000000, %esp
Possible solutions:
0. Don't do that! And fire the programmer who does that.
1. Turn every memory access into a system call.
- Syscall is expensive
- Need per-process table to track valid memory space
2. Let compiler insert code around every access, checking that pointer is valid.
- You can change to a "better" language, like Java, which enforces run-time checking against invalid memory access.
- Or still use C, but compile with "gcc -fsanitize=address".
How it works:
- Divide memory space into real memory and shadow memory
- Real memory: an array of 8-byte cells
- Shadow memory: an array of 1-byte cells
- Shadow memory describes real memory: 0 means free, 1 means allocated
- As a result, 1/9 of the memory space is reserved for the shadow memory that implements the check
- For every access to a memory location p, the compiler inserts code that checks whether
shadow_mem[p >> 3] == 1 (a sketch of this check follows the list)
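A minimal sketch of that check in C, using the simplified 0/1 shadow encoding above (real AddressSanitizer uses compiler-generated instrumentation and a richer shadow encoding; shadow_mem and check_access are hypothetical names used only for illustration):

#include <stdint.h>
#include <stdlib.h>

extern uint8_t *shadow_mem;          // hypothetical: one shadow byte per 8-byte real cell

// Conceptually inserted by the compiler before every load/store of address p.
static void check_access (void *p)
{
  uintptr_t a = (uintptr_t) p;
  if (shadow_mem[a >> 3] != 1)       // 0 = free, 1 = allocated (simplified model)
    abort ();                        // report the bad access and stop the program
}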
3. Use per-process base-bounds pair.
- Two more columns in process table: base (L) + bounds (U)
- For every memory access a, ensure that L ≤ a < U (see the sketch after this list)
- Enforced by hardware with two privileged registers: rL + rU
- Split RAM into contiguous regions, one region per process:
- Assumption: position-independent code (PIC). E.g., compile code with "gcc -fPIC"
- Problems:
- External fragmentation
- Whole program must fit in RAM at once
- Running processes can't be relocated (unless all references are relative to L)
- Need to predict size at start
- Buffer overflow inside program memory region is still possible
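A minimal sketch of the bounds check the hardware performs on every access; base and bounds here are stand-ins for the privileged rL and rU registers:

#include <stdint.h>

static uintptr_t base, bounds;       // stand-ins for the privileged rL / rU registers

// Conceptually performed by the hardware on every memory access to address a.
static int access_ok (uintptr_t a)
{
  return base <= a && a < bounds;    // out-of-range accesses trap into the kernel
}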
4. Use segmented memory.
- Interpreting a 32-bit pointer:
- High 8 bits: indicate purpose of pointer, e.g. code / stack / big IO buffer / ...; actually an index into segment table
- Low 24 bits: indicate data offset in the corresponding segment
- Associated with each process is a segment table (recording each segment's location + size): several KB in total, can be cached by the CPU (see the lookup sketch after this list)
- Process memory space can be separated into segments
- But external fragmentation still exists!
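A minimal sketch of that lookup, assuming the 8-bit index / 24-bit offset split above; the segment table layout (seg_table, phys_base, size) is hypothetical:

#include <stdint.h>

struct segment { uint32_t phys_base; uint32_t size; };   // hypothetical segment table entry
extern struct segment seg_table[256];                    // indexed by the high 8 bits

// Translate a 32-bit segmented virtual address to a physical address (0 = fault).
static uint32_t seg_translate (uint32_t va)
{
  uint32_t seg = va >> 24;               // high 8 bits: segment index
  uint32_t off = va & 0xFFFFFF;          // low 24 bits: offset within the segment
  if (off >= seg_table[seg].size)
    return 0;                            // offset past the segment's size: fault
  return seg_table[seg].phys_base + off;
}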
Solution: fix segment size and align all segments.
Implementation of segmented virtual memory with fixed-size segments (paging) on x86
1. Single-level page table
- Fixed-size segment (also called page): 4096 bytes
- Interpreting a 32-bit virtual address: 20 bits + 12 bits
- High 20 bits: virtual page index
- Low 12 bits: data offset in page
- Per-process page table:
- Each 32-bit entry stores the physical address of a page (aligned on a page boundary):
- High 20 bits: page address
- Low 12 bits: extra page info
Note: the lower 12 bits of the page address are always 0 since pages are aligned on page boundaries, so they can be reused to store extra information.
- "%cr3" privileged register holds the location of the page table
- Page table memory is invisible to the process: a process can't modify its own page table
- Internal fragmentation (up to 4095 bytes) but no external fragmentation (because all pages are the same size and aligned)
- Page table is BIG: 2^20 entries × 4 bytes = 2^22 bytes per table, i.e., 4 MiB per process (a lookup sketch follows this list)
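A minimal sketch of a single-level lookup under this layout; page_table stands in for whatever %cr3 points at, and a zero entry is treated as "no mapping":

#include <stdint.h>

extern uint32_t *page_table;             // stand-in for the 2^20-entry table that %cr3 points at

// Translate a 32-bit virtual address to a physical address (0 = fault).
static uint32_t translate (uint32_t va)
{
  uint32_t vpn = va >> 12;               // high 20 bits: virtual page number
  uint32_t off = va & 0xFFF;             // low 12 bits: offset within the page
  uint32_t pte = page_table[vpn];
  if (pte == 0)
    return 0;                            // no valid mapping: page fault
  return (pte & ~0xFFFu) | off;          // page address + offset
}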
2. Two-level page table
- Interpreting a 32-bit virtual address: 10 bits + 10 bits + 12 bits
- High 10 bits: 1st level page index
- Mid 10 bits: 2nd level page index
- Low 12 bits: data offset
- 1st level page table points to locations of 2nd level page table
- 2nd level page table points to actual page location
- Relatively small page tables: L2 table allocated as needed
Pseudo C code for two-level memory lookup
unsigned pmap (unsigned va)  // va: virtual address; returns the page's physical address, or FAULT
{
  unsigned hi = va >> 22;                        // high-order 10 bits: level-1 index
  unsigned med = (va >> 12) & ((1 << 10) - 1);   // middle 10 bits: level-2 index
  unsigned *l1pt = get_cr3 ();                   // stand-in for reading %cr3 (location of the l1 page table)
  unsigned *l2pt = (unsigned *) (l1pt[hi] & ~((1 << 12) - 1));  // strip the low 12 info bits
  if (l2pt == FAULT)
    return FAULT;                                // FAULT marks the zero (unmapped) address
  return l2pt[med] & ~((1 << 12) - 1);           // full physical address = this | (va & 0xFFF)
}
Page faulting
- Hardware traps when given a "bad" address (one with no valid physical page in the page table)
- Then the kernel takes over; it may:
- terminate the process (dump core)
- OR send the process a SIGSEGV signal
- OR (more commonly) install a valid page table entry backed by real memory, then RETI
"Free memory is wasted memory." -- by Linus Torvalds
Swap space on disk
- Physical memory is a "cache" of swap space
- Swap space gets updated when physical memory page is swapped to disk
- Upon page fault, try to look up the page in swap space and put it back to physical memory before RETI (may need to swap other pages to disk)
- Need a swapmap function to handle mapping from virtual address to swap space location
A naive implementation of swapmap could be (assuming linear allocation in swap space):
unsigned long long swapmap (unsigned va)
{
  return swapbase + (va >> 12);   // "swapbase" is per-process; assuming swap locations are
                                  // measured in page-sized units, this yields a page index in swap space
}
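Putting the pieces together, a hedged sketch of what the kernel's page fault handler might do with swapmap; choose_victim, victim_va, disk_read, disk_write, and pmap_set are hypothetical helpers, and the single-process, write-the-victim-first policy shown is the simplest demand-paging case:

// Hypothetical helpers; a real kernel differs in names and details.
extern unsigned choose_victim (void);               // pick a physical page frame to evict
extern unsigned victim_va (unsigned pa);            // which virtual page that frame currently holds
extern unsigned long long swapmap (unsigned va);    // virtual address -> swap space location
extern void disk_write (unsigned long long loc, unsigned pa);
extern void disk_read (unsigned long long loc, unsigned pa);
extern void pmap_set (unsigned va, unsigned pa);    // install (or clear, pa = 0) a page table entry

void page_fault (unsigned va)                       // entered from the hardware trap
{
  unsigned pa = choose_victim ();                   // may have to evict a resident page
  disk_write (swapmap (victim_va (pa)), pa);        // save the victim to its swap slot
  pmap_set (victim_va (pa), 0);                     // mark the victim's entry as FAULT
  disk_read (swapmap (va), pa);                     // bring the faulting page in from swap
  pmap_set (va, pa);                                // map it, then RETI back to the process
}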
malloc() revisited
- When allocating a large memory block, malloc only needs page table entries to be created: some can map real memory, others can be left zero (unmapped) until first use
- On a system that overcommits memory, you can therefore still "allocate" more virtual memory than there is swap space to back it
- But once you actually try to use it, the kernel may kill the process
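A small user-level illustration of this, assuming an overcommitting system; the exact size at which malloc or the touch fails is system-dependent, so treat this as a sketch rather than a guaranteed outcome:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main (void)
{
  size_t huge = (size_t) 1 << 40;        // 1 TiB: likely more than RAM + swap
  char *p = malloc (huge);
  if (!p)
    {
      printf ("malloc failed up front\n");
      return 1;
    }
  printf ("malloc succeeded; touching the pages...\n");
  memset (p, 1, huge);                   // faulting in real pages; the kernel may kill the process here
  printf ("survived\n");                 // rarely reached on an overcommitting system
  free (p);
  return 0;
}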
Last modified: Fri Nov 22 13:33:49 PST 2013