By Seena Burns
Previously, we saw that processes can have bad memory references. While there are other solutions like base and bounds pairs, these solutions lead to fragmentation, a desire for the ability to share memory amongst processes and a need for no absolute memory references. Virtual addresses are designed to respond to these issues.
A virtual address is split into two components: virtual page number (VPN) and offset (within the physical page). Page tables store an array of addresses that point to the real, physical location of a page. The VPN is used to find the index within the page table. The value at this index in the page table points to the physical page, and the offset is used to move within the physical page.
Page tables can be incredibly excessive. Each process has a separate page table, with many of the entries nulled out (for areas like the forbidden zone between heap and stack or areas outside of the process's access). The memory needed to store the page tables for all processes adds up and can quickily become a non-negligible fraction of available RAM.
Two level page tables address this issue like the approach of indirect blocks in file systems. The virtual address contains a hi and a lo virtual page number. The hi address indexes the first page table, which points to various second level page tables. The lo address indexes the respective second level page table, which points to the actual physical memory.
For very small processes, with most entries in the page table pointing to 0, the second level page tables do not have to be allocated (or only a few number), thus reducing the amount of memory needed for page tables.
A virtual address is now split into three components: hi vpn, lo vpn and offset. Hi indexes the first page table, lo indexes the second. Note: it makes most sense for the second-level page table to be the size of a physical page (4 KiB). This means, a 32 bit wide, 4KiB size page table can hold 1024 (2^10) entries, and thus needs 10 bits to index.
A C function to model what the hardware does with two level page tables looks like this:
// Map vpn (first 20 bits of virtual address, hi and lo indexes) to page // Done by hardware in reality! int pmap (int vpn) { int hi = vpn >> 10; int lo = vpn & ((1 << 10) - 1); int *page2 = PAGE_TABLE[hi]; // %cr3 if (page2 == FAULT) return FAULT return page2[lo]; // Will return FAULT if page2 faults }
Virtual memory lives on disk. For example, the swap partition used by Linux is the storage place for memory used by processes. while a process is running, its memory is paged in and cached in RAM. So if a system has 1 GB of RAM and 10 GB of swap space, there is a total of 11 GB of virtual memory available for use by processes. (Note: if changes are made, memory in RAM overrides disk)
Page faults occur when accessing a page address fails for whatever reason (e.g. page not in RAM). When a page fault occurs, the hardware traps, giving control to the kernel. The kernel can do whatever it wants: terminate, send a SIGSEGV and so forth. It can also consult its own list of pages and decide if it should swap in a page from disk.
A C function for swapping in a page from disk to RAM might look like the following. Since the RAM is almost always full, the kernel must choose a victim page to remove from RAM before swapping in the desired page from disk.
// swapmap is a function supplied by the kernel // For a process, return the disk address for a vpn (or FAIL) long long swapmap(proc, vpn) { return diskaddress or FAIL; } // pfault is executed on behalf of the process when a page fault occurs void pfault(proc, vpn) { // Get desired page's disk address long long da = swapmap(proc, vpn); if(da == FAIL) kill(proc); else { // Select a victim page using kernel's policy (vicproc, vicpn) = page_removal_policy(); // Get page address in RAM pa = vicproc->pmap(vicpn); // Write RAM data to appropriate disk address for vicproc write from pa to swapmap(vicproc, vpn) for 4KiB vicproc->pmap(vicpn) = FAULT; // Page in from disk address to RAM address read into pa from da for 4KiB proc->pmap(vpn) = pa } }
After executing pfault, the kernel uses RETI to return to exit the trap and re-runs the page accessing command, now with the desired page swapped in.
What can go wrong with this model? Thrashing: when a high fraction of instructions are swapping in pages. The system will become unusably slow. To avoid this, you want locality of reference* and small processes.
Virtual memory is marketed as being able to have 40GB of memory with only 4GB of RAM, but this is not the case. It allows a system to continue functioning after exceeding RAM by gracefully slowing to a halt, but 40GB is a lie because of thrashing.
Another issue is if the kernel itself becomes too large. The kernel can page swap itself, it just has to be careful to not swap memory management code out.
Don't write out unmodified pages.
PFAULT uses two I/O operations, so by removing the write step, we're removing essentially 50% of the overhead. This is achieved using a 'dirty bit'. Hardware keeps could maintain, but kernel uses a trick to regain control when a page is modified. The hardware keeps track of read/write permissions on the page, so the kernel defaults all pages to 'r -' (read, no write). When a process writes to the page, a trap occurs and the kernel can gain control, update its own dirty bits and then grant write permissions to the process for that page.
Demand Paging
Rather than load all pages into RAM for a new program, just read the first page and page fault in necessary pages. This approach trades start up time for run time performance (amortizes loading) and can win if only a few pages are needed. However, it can increase overall cost by removing batched page reads.
A page replacement policy can also improve performance by taking advantage of locality of reference to reduce the number of page faults in the future. Some examples are shown below, compared with 5 virtual pages (0, 1, 2, 3, 4).
FIFO: First In, First Out
The first page is chosen as a victim.
FIFO with 3 physical pages (A, B, C) has 9 page faults.
FIFO with 4 physical pages (A, B, C, D) has 10 page faults. This is known as Belady's anomaly: adding memory sometimes makes slower. This is very rare in practice, but it highlights the importance of a good page replacement policy.
Last Recently Used (LRU)
An ideal replacement policy would remove the page next used farthest in the future. If this were known with the reference string used for FIFO, only 7 page faults would needed. However, knowing the future is rare so there needs to be a heuristic to approximate the future.
LRU approximates the future by using the page used farthest in the past. This can be implemented like tracking modifications. Now the read permission bit is off by default and the kernel can gain control and timestamp whenever a page is accessed. In practice LRU is often better than FIFO, though with the reference string, FIFO works better.
Apache makes use of forks to handle incoming network requests. However, in just spawning the process for fork to run, Apache can block because it has to copy all the memory. Instead, virtual memory can be used to share physical pages between processes. Now instead of copying the memory, Apache uses the same physical memory for spawned processes.