Lecture 15. 5/24/2012
Nicholas Solichin
Virtual Memory and Processes
Multi-level Page Tables
The process virtual address space consists of a read-only text segment and a data segment, with the heap and the stack growing toward each other from opposite ends of the data segment and leaving a wide unused gap in between. A single flat page table would therefore contain many wasted, unused entries.
Solution. Split the page table into multiple levels, the way sparse files work in the UNIX filesystem. Take the following code:
open("foo", O_RDWR | O_CREAT | O_TRUNC, 0600);
lseek(fd, 1000000000, SEEK_CUR);
write(fd, "x", 1);
Recall that only the last data block, the one containing "x", is actually allocated; the surrounding entries in the indirect blocks are left as 0s. Similarly, the first-level page table keeps 0s in its entries and does not allocate the corresponding second-level tables until they are needed.
The x86 architecture supports 2-level page tables; a 32-bit virtual address is divided like so:
|  10 bits  |  10 bits  |  12 bits  |
|     virtual page no.  |  offset   |
The high 10 bits of the virtual page number (vpn) index the first-level table to select a second-level table, and the low 10 bits of the vpn index that second-level table to find the physical page; the remaining 12 bits are the offset within the page. The following code demonstrates roughly what the hardware does:
size_t pmap(int vpn) {
    int hi = vpn >> 10;                 // index into the first-level table
    int lo = vpn & ((1 << 10) - 1);     // index into the second-level table
    size_t *lopage = PAGETABLE[hi];
    if (!lopage)
        return FAULT;                   // second-level table not allocated
    return lopage[lo];                  // physical page number
}
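For completeness, here is a sketch of the corresponding write path, showing how a second-level table can be allocated lazily, mirroring the sparse-file behavior above. The names pmap_set, PAGETABLE, and LEVEL_SIZE are illustrative assumptions, not from the lecture.
#include <stdlib.h>

#define LEVEL_SIZE (1 << 10)            /* 2^10 entries per table */

size_t *PAGETABLE[LEVEL_SIZE];          /* first level: pointers to second-level tables */

void pmap_set(int vpn, size_t physaddr) {
    int hi = vpn >> 10;                 /* first-level index */
    int lo = vpn & (LEVEL_SIZE - 1);    /* second-level index */
    if (!PAGETABLE[hi])                 /* allocate the second level on demand, */
        PAGETABLE[hi] =                 /* like a sparse file's indirect block  */
            calloc(LEVEL_SIZE, sizeof (size_t));
    PAGETABLE[hi][lo] = physaddr;
}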
Page Faults
A page fault occurs when the hardware cannot translate a virtual address to a physical page. The hardware traps and gives control to the kernel, which can respond in the following ways:
- Terminate the process
- Send signal (SIGSEGV) to the process
- Load page from disk (only if the fault is due to an unloaded page)
Signal Handling Case Study 1
int i; // global variable
...
signal(SIGSEGV, sighandler);
a[i]; // invalid access causing a page fault
...
void sighandler(int sig) {
i = 27; // assume 27 is a valid subscript
}
This code works only if the compiler does not cache i in a register when referencing a[i].
Solution. Make i volatile to prevent such caching.
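In code, the fix is a one-word change to the declaration (a minimal sketch; the surrounding code is unchanged from the case study above):
volatile int i;     // volatile forces the compiler to re-read i from memory
                    // when the faulting access a[i] is retried after the
                    // handler returns, rather than reusing a cached register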
Signal Handling Case Study 2
Most people don't write code like the above, but the following example comes from the Bourne shell.
int *stack;
int stacksize;

stacksize = 8196;
stack = malloc_special(stacksize);     // Bourne's version of malloc
signal(SIGSEGV, fixstack);             // grow the stack when a write faults
...
stack[stacksize++] = x;                // no bounds check; can segfault
...
void fixstack(int sig) {
    realloc_using_mmap(stack, stacksize *= 2);
}
For convenience and performance, Bourne uses stack[stacksize++] without bounds checking, relying on the signal handler fixstack to automatically resize the stack when a segmentation fault occurs. Because handling SIGSEGV this way can mask invalid-pointer bugs, the method demands perfect programming and is not for the faint of heart.
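One plausible way to realize this pattern on a modern POSIX system is sketched below: reserve a large region up front with PROT_NONE and make more of it writable from the handler. This is an illustration under assumptions, not Bourne's actual implementation; RESERVE and init_stack are made-up names, error checking is omitted, and mprotect is not formally async-signal-safe.
#include <signal.h>
#include <sys/mman.h>

enum { RESERVE = 1 << 24 };            /* reserve 16 MiB of address space */

static char *stack;
static size_t stacksize = 8192;        /* currently writable bytes */

static void fixstack(int sig) {
    stacksize *= 2;                    /* double the writable portion; the */
    mprotect(stack, stacksize,         /* faulting store is then retried   */
             PROT_READ | PROT_WRITE);
}

void init_stack(void) {
    stack = mmap(NULL, RESERVE, PROT_NONE,          /* reserve, no access yet */
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    mprotect(stack, stacksize, PROT_READ | PROT_WRITE);
    signal(SIGSEGV, fixstack);
}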
Aside.
void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
mmap() is a function that alters the page table in a controlled way by mapping files or devices (usually /dev/zero) into memory. The block of memory from addr to addr+length will correspond to the contents of the file specified by fd. prot specifies the rwx protection.
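As a usage sketch (the file name "foo" is illustrative and error checking is omitted), a whole file can be mapped read-only instead of being read() into a buffer:
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

void map_example(void) {
    int fd = open("foo", O_RDONLY);
    off_t len = lseek(fd, 0, SEEK_END);                  /* file size */
    char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p != MAP_FAILED) {
        /* p[0] .. p[len-1] now mirror the file; pages are faulted in
           on first access rather than copied eagerly. */
        munmap(p, len);
    }
    close(fd);
}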
Kernel Fault Handling
The kernel keeps track of the disk address and memory size for each process. On a page fault, the kernel may evict a page to make room for the new page; the removal policy (elaborated later) determines which page gets evicted. The following code shows roughly what the kernel does:
void pfault(int va /* virtual addr */,
            int access_type /* rwx */,
            proc cp /* current process */) {
    if (swapmap(cp, va) == FAIL)
        kill(cp);                      // cp has no access to this page
    else {                             // page not in RAM; choose a victim to evict
        (vp, vva) = removal_policy();  // victim process and its virtual addr
        physaddr = vp->pmap(vva);
        write page at physaddr to disk at swapmap(vp, vva)
        vp->pmap(vva) = FAULT;
        read page at physaddr from disk at swapmap(cp, va)
        cp->pmap(va) = physaddr;
    }
}
Caveat: the kernel code that handles virtual memory must never be paged out. If pfault itself were paged out, we could no longer swap pages.
Removal Policy
Because physical memory is limited, pages must eventually be evicted to make room for new ones; which page gets evicted is determined by the removal policy. If page accesses were random, any page would be an equally good victim, but we can typically exploit locality of reference to minimize the number of faults and improve performance. Here, we explore several page replacement policies on the reference string "012301401234". Assume we have 3 physical pages (A-C) and 5 virtual pages (0-4).
First-In First-Out (FIFO)
FIFO is a simple and intuitive algorithm in which we evict the oldest page first.
With 3 physical pages (A-C), FIFO produces 9 faults:
ref    | 0 | 1 | 2 | 3 | 0 | 1 | 4 | 0 | 1 | 2 | 3 | 4 |
A      | 0 | 0 | 0 | 3 | 3 | 3 | 4 | 4 | 4 | 4 | 4 | 4 |
B      | ? | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 2 |
C      | ? | ? | 2 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 3 | 3 |
faults | X | X | X | X | X | X | X |   |   | X | X |   |
With 4 physical pages (A-D), FIFO produces 10 faults:
ref    | 0 | 1 | 2 | 3 | 0 | 1 | 4 | 0 | 1 | 2 | 3 | 4 |
A      | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 4 | 4 | 4 | 3 | 3 |
B      | ? | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 4 |
C      | ? | ? | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 |
D      | ? | ? | ? | 3 | 3 | 3 | 3 | 3 | 3 | 2 | 2 | 2 |
faults | X | X | X | X |   |   | X | X | X | X | X | X |
As a side note, FIFO with 4 physical pages actually performs worse here (10 faults) than with 3 (9 faults), showing that adding memory doesn't always improve performance. This is known as Bélády's anomaly.
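The fault counts above can be checked mechanically. Below is a small sketch (not from the lecture) that simulates FIFO on the reference string for a given number of frames; it reports 9 faults with 3 frames and 10 with 4, reproducing Bélády's anomaly.
#include <stdio.h>

static int fifo_faults(const int *refs, int n, int nframes) {
    int frames[8];                       /* assumes nframes <= 8 */
    int next = 0, used = 0, faults = 0;
    for (int i = 0; i < n; i++) {
        int hit = 0;
        for (int j = 0; j < used; j++)
            if (frames[j] == refs[i]) { hit = 1; break; }
        if (!hit) {                      /* page fault: evict the oldest page */
            frames[next] = refs[i];
            next = (next + 1) % nframes;
            if (used < nframes) used++;
            faults++;
        }
    }
    return faults;
}

int main(void) {
    int refs[] = {0, 1, 2, 3, 0, 1, 4, 0, 1, 2, 3, 4};
    printf("3 frames: %d faults\n", fifo_faults(refs, 12, 3));   /* prints 9  */
    printf("4 frames: %d faults\n", fifo_faults(refs, 12, 4));   /* prints 10 */
    return 0;
}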
Least Recently Used (LRU)
LRU is the policy typically implemented in real systems. We evict the page that has gone unused for the longest time, assuming it is the least likely to be needed again soon.
With 3 physical pages, LRU produces 10 faults:
ref    | 0 | 1 | 2 | 3 | 0 | 1 | 4 | 0 | 1 | 2 | 3 | 4 |
A      | 0 | 0 | 0 | 3 | 3 | 3 | 4 | 4 | 4 | 2 | 2 | 2 |
B      | ? | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 3 |
C      | ? | ? | 2 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 4 |
faults | X | X | X | X | X | X | X |   |   | X | X | X |
While LRU usually performs better than FIFO, this particular reference string (10 faults versus FIFO's 9) shows that it isn't always the case.
Least Needed First (LNF)
LNF evicts the page needed farthest into the future. This algorithm is provably the most efficient page replacement policy but requires an oracle to predict which page is least needed; however, we can use statistics from previous runs to approximate this algorithm.
With 3 physical pages, LNF produces only 7 faults:
ref    | 0 | 1 | 2 | 3 | 0 | 1 | 4 | 0 | 1 | 2 | 3 | 4 |
A      | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 2 |
B      | ? | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 3 | 3 |
C      | ? | ? | 2 | 3 | 3 | 3 | 4 | 4 | 4 | 4 | 4 | 4 |
faults | X | X | X | X |   |   | X |   |   | X | X |   |
Shrinking Wait Time with VM
To run a program, we usually:
- copy the program file into RAM (which takes a long time), and
- set the ip (instruction pointer) to the start of the file.
With virtual memory, we can instead:
- mmap the file into virtual memory (initializing the page table without copying any pages), and
- set the ip to the start of the file (pages are then loaded on demand via page faults).
Optimization: Dirty Bit
The dirty bit records whether a page has been modified. Unmodified pages do not have to be written back to disk when evicted, saving I/O time. Hardware usually implements the dirty bit for you, but we can implement it in software like so:
- Load the page in read-only mode.
- An attempted write causes a trap, yielding to the kernel.
- The kernel sets the dirty bit, changes the page's permissions to allow writing, and restarts the write.
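A user-level analogue of this scheme can be sketched with mmap, mprotect, and a SIGSEGV handler; the kernel does the real thing with page-table permission bits. The names region, NPAGES, and dirty are illustrative, error checking is omitted, and calling mprotect from a signal handler is not formally async-signal-safe.
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define NPAGES 16

static char *region;                       /* the tracked memory region */
static long pagesz;
static volatile char dirty[NPAGES];        /* one software dirty bit per page */

static void mark_dirty(int sig, siginfo_t *si, void *ctx) {
    int page = ((char *)si->si_addr - region) / pagesz;
    dirty[page] = 1;                       /* set the dirty bit ...            */
    mprotect(region + page * pagesz,       /* ... and allow the write; the     */
             pagesz, PROT_READ | PROT_WRITE);  /* faulting store is retried    */
}

void dirty_init(void) {
    pagesz = sysconf(_SC_PAGESIZE);
    region = mmap(NULL, NPAGES * pagesz, PROT_READ,       /* read-only at first */
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = mark_dirty;          /* first write to a page traps here */
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);
}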