Virtual memory was originally proposed as a way to run programs requiring large amounts of memory on a computer with a smaller amount of primary memory. However, virtual memory performance suffers quickly as the amount of memory used increases past the amount of primary memory, for the throughput and response time of virtual memory rapidly approaches those of the secondary memory device. The difference in speed between primary and secondary storage is commonly between 1,000 and 1,000,000, a large enough difference to make even simple, fast programs quickly become unusable.
Although better than the alternative of processes suddenly dying when they reach a physical memory limit, virtual memory does not solve this problem, which brings up the question of why virtual memory is so successful. This is because the benefit of virtual memory for normal applications is memory protection and abstraction using modularity. Virtual memory gives a sophisticated method to enforce modularity of memory, by ensuring that different processes cannot destroy the memory of other processes.
First, why is any memory protection necessary? Aside from security concerns of malicious code begin able to examine or manipulate memory of other programs, the main problem of allowing unguarded memory access is buggy programs. In C, there are many ways for a program to have a bad memory access, such as the following:
a[i] = ...;Out-of-range array indexing
*p = ...;If p is an uninitialized pointer, a null pointer, a freed pointer, or was cast to an improper type.
char x[10000000000];An extremely large array.
{char x[10]; p = x;} *p = ...;Following a pointer to memory that is (implicitly) deallocated.
There are several ways to protect against these kind of programming mistakes. The simplest approach is to not make mistakes, but this is not possible in practice.
One way to deal with this is to have the OS check every pointer dereference that a program makes. As the OS keeps track of memory allocated for each program, the OS can make these kind of checks, and this method would be able to capture every access to bad memory. However, the fact that every dereference involves a system call makes this method too slow for most applications.
Another possibility is to have compilers insert checks around every dereference. There are some technical difficulties of determining at compile time how to check whether a pointer is valid. For arrays whose declarations are known, we can easily check that an index is in bounds, but arbitrary pointer checks are more difficult. Therefore, this method would likely require using a programming language other than C, such as Java. In Java, there are no arbitrary pointers, and all arrays have a member storing their size, so all accesses check if an index is in bounds. However, this does come at a performance cost.
A limited version of this is available in C using the gcc -fsanitize=address option. This works by allocating extra memory, known as shadow memory, that serves as a bitmap for the memory used by your process. Each time memory is allocated or freed, the change of status of these bits of memory are recorded in shadow memory. The shadow memory is a continuous array of memory, with one byte representing the status of 8 bytes of real memory. Each time an address is dereferenced, its corresponding entry in shadow memory is checked. This comes at only a slight performance cost, but does come at a large memory cost. This will also not catch all memory errors, as it only tracks memory allocated by the process, and only in chunks of 8 bytes.
None of these successfully detects all memory errors without major performance or memory penalties. One approach that can have good performance is to associate with each process an upper and lower bound for memory it can access. To make checking a pointer efficient, this check is built into the hardware, meaning that we add two new privileged registers to store the bounds of our program's memory.
This scheme has several problems. First, it will only detect bad pointers to memory ranges outside the process's memory, and is thus free to destroy its own memory. Second, this means that the program memory must be stored contiguously in memory, complete with executable and data. This makes managing locations of process memory difficult, as we can have external fragmentation due to programs beginning and then exiting. Finally, this can make growing or moving a program's memory problematic.
To improve on this idea, and make it more flexible, we change how pointers are interpreted. We make the upper bits of a pointer signify what the general purpose of the pointer is, such as whether this points to code, the stack, or buffer I/O. The lower bits give the offset from the region of memory indicated by the upper bits. In this scheme, the region of memory for a particular type of pointer is known as a segment. Associated with each process will be a table that, for each possible segment, gives the base address and the size of the segment. Again, hardware will consult this table (stored on a cache on chip) for each pointer access to determine if the pointer is valid. An additional privileged register is required to know the location of the process table for the running process.
This idea works better, as there is not as much external fragmentation due to the sizes of blocks of memory being smaller, but there are still some downsides. The major one is there is a maximum number of segments, determined by hardware. If the upper 8 bits of the pointer are used for defining the segment, then we can only have 256 segments per program.
This leads us to virtual memory. This is similar to the segment memory management method, except that we fix the size of segments. These fixed sized segments are called pages. Room can be saved in the page table, as the size of the page does not need to be stored. Additionally, if pages locations are memory aligned to a power of 2, the lower bytes of the address must always be zero, meaning that it is unnecessary to actually store these bits. Usually, the extra bits saved are used as flags for permissions and other information about the page.
For example, x86 hardware supports a page size of 4096 bytes. To keep page tables small and to allow large number of pages, pointers are split into three pieces: the 10 upper bits define the offset in the first level page table, which contains the locations of the second level page tables. The second 10 bits are the offset in the second level table, and the final 12 bits are the offset in the page. As many processes do not need all 4 gigabytes of virtual memory possible with a 32 bit pointer, many entries in the page table are blank, and do not point to anywhere. This is similar to the setup of blocks and inodes in a filesystem.
An extra feature of virtual memory is that not all of the memory needs to be stored in primary memory. In this case, the page table entry for a missing page indicates that it is not stored in memory. When trying to access this memory, a page fault occurs, causing a hardware trap. The OS can then handle this fault, generally by fetching the page from secondary memory into primary memory, and waking up the process when this is done. Secondary storage for pages is known as the swap space. Thus, virtual memory allows primary memory to serve as a cache for the swap space.
Many operating systems let you allocate more memory than there is room in swap space. In this case, a page table entry has been created, but the actual pages have not yet been allocated. If a process tries to use this memory, and there is no more space left, then the OS kills off the process trying to use the memory.