The problems with segmentation were caused by the dynamic size of each segment, and the fix is to remove the dynamic size and make the size of every segment a constant number. This is the core of page-based memory management. In page-based memory management, each process has its own page table, which adds an extra level of indirection. The figure below shows the first type of page-based memory management, the linear page table:
The function pmap is simple: it takes a virtual page number as its argument and returns the page table entry for that page number.
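A minimal sketch of a linear page table in Python, assuming 1 KiB pages (10 offset bits) and a 22-bit virtual page number; the names `page_table` and `translate` are illustrative, not from any real kernel:

```python
# Linear page table: one entry per virtual page, indexed directly by VPN.
PAGE_OFFSET_BITS = 10                    # assume 1 KiB pages
VPN_BITS = 22                            # 2**22 page table entries

page_table = [None] * (1 << VPN_BITS)    # None = unmapped page

def pmap(vpn):
    """Return the page table entry for a virtual page number."""
    return page_table[vpn]

def translate(va):
    """Split a virtual address into VPN + offset and rebuild the physical address."""
    vpn = va >> PAGE_OFFSET_BITS
    offset = va & ((1 << PAGE_OFFSET_BITS) - 1)
    entry = pmap(vpn)
    if entry is None:
        raise MemoryError("page fault: unmapped page")
    return (entry << PAGE_OFFSET_BITS) | offset

page_table[3] = 7                        # map virtual page 3 to physical page 7
assert translate((3 << 10) + 5) == (7 << 10) + 5
```

The key property is that the lookup is a single array index, which is what makes the linear table fast but memory-hungry.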
If each page entry is 4 bytes and there are 2^22 entries, then each process's page table alone occupies 2^22 × 4 B = 16 MiB. For processes that do not need to map that much memory, this is a waste of space.
When there are many small processes, a 2-level page table is used, since a linear page table would leave many unused page table entries that waste a lot of RAM. The 2-level page table adds an extra level of indirection: the virtual page number is split into "hi" bits that index a top-level table and "lo" bits that index a secondary table, which allows memory to be used more efficiently. The figure below shows the 2-level page table:
With 2 levels, the pmap function becomes more complicated.
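A sketch of the 2-level lookup, assuming the 22-bit VPN from above is split into an 11-bit "hi" half and an 11-bit "lo" half; using a dict for the top level is a simplification so that secondary tables are only allocated when a page in their range is actually mapped:

```python
LO_BITS = 11
hi_table = {}                    # top level: hi bits -> secondary table

def map_page(vpn, ppn):
    hi, lo = vpn >> LO_BITS, vpn & ((1 << LO_BITS) - 1)
    # Allocate the secondary table lazily; sparse address spaces stay cheap.
    hi_table.setdefault(hi, [None] * (1 << LO_BITS))[lo] = ppn

def pmap(vpn):
    """Two lookups: hi bits pick the secondary table, lo bits pick the entry."""
    hi, lo = vpn >> LO_BITS, vpn & ((1 << LO_BITS) - 1)
    lo_table = hi_table.get(hi)
    return None if lo_table is None else lo_table[lo]

map_page(0x12345, 42)
assert pmap(0x12345) == 42
assert pmap(0x54321) is None     # whole hi-range unmapped: no table allocated
```

Compared to the linear table, a process that maps only a few pages pays for a handful of secondary tables instead of the full 16 MiB.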
The OS kernel may want to trick the hardware by telling it, in the hardware page table, that the current process is smaller than it actually is. Accessing a page outside the claimed size then triggers a page fault. Page faulting is very slow, far too slow to take on every virtual memory access; it is reserved for the case where data actually has to be pulled from disk into physical memory. When a user program page faults, it traps into the kernel, and the kernel inspects the faulting address. If the kernel deems the page being accessed valid, it picks a victim page to "swap" with the requested page on disk. The disk region used for this is referred to as swap space, and it can be seen as an extension of physical memory. The kernel copies the victim page out to disk and copies the originally requested page from disk into main memory.
vicproc - process holding victim page
va - virtual address of victim
pa - physical address of victim
swapmap() takes a process (proc) and a virtual page number (VPN) and returns a pointer to the region on the hard disk that should contain the data for the requested page.
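The swap step above can be sketched as follows. The names swapmap, vicproc, va, and pa follow the notes; the in-memory dicts standing in for RAM, disk, and the resident-page map, and the random victim-selection policy, are hypothetical simplifications:

```python
import random

PAGE = 1024
memory = {}        # pa -> page contents (stand-in for physical RAM)
disk = {}          # swap-space region -> page contents
resident = {}      # (proc, vpn) -> pa, for pages currently in memory

def swapmap(proc, vpn):
    """Return the swap-space region holding this process's page."""
    return (proc, vpn)          # here disk is keyed directly by (proc, vpn)

def page_fault(proc, vpn):
    # Pick a victim page to evict (random policy, for simplicity).
    (vicproc, vic_vpn), pa = random.choice(list(resident.items()))
    # Write the victim out to its place in swap space...
    disk[swapmap(vicproc, vic_vpn)] = memory[pa]
    del resident[(vicproc, vic_vpn)]
    # ...then read the requested page into the freed frame.
    memory[pa] = disk.get(swapmap(proc, vpn), bytes(PAGE))
    resident[(proc, vpn)] = pa
    return pa

memory[0] = b"victim data"
resident[("p1", 5)] = 0
disk[("p2", 7)] = b"wanted data"
page_fault("p2", 7)             # p1's page goes out, p2's page comes in
```

Note the two disk operations per fault, one write and one read; this is why page faulting is so slow, and why the dirty-bit optimization below matters.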
The process of page faulting is slow, but there are ways to speed it up:
1. Dirty bit - In each page table entry, a bit is designated to be 0 for unmodified and 1 for modified. Every now and then, all the bits are reset to 0. Whenever a page is modified, its bit is set to 1, which tells the kernel whether or not it actually needs to write the page back to swap space. If the bit is 0 when the page is victimized, the kernel only has to replace the page without doing a store for it, saving a disk write.
2. Demand paging - When a program starts up, only its first page is loaded; the rest are brought in by page faults as the program needs them. The advantage is a quick response time at startup, but afterwards execution is slow, since the disk arm must constantly move around as it also serves other processes.
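The dirty-bit optimization can be sketched like this (all names here are hypothetical; in real hardware the bit lives in the page table entry and is set by the MMU on a store):

```python
class Page:
    def __init__(self):
        self.dirty = False          # 0 = unmodified, 1 = modified

    def store(self, value):
        self.value = value
        self.dirty = True           # hardware sets the bit on every write

writes_to_disk = 0

def evict(page):
    """Reclaim a frame, writing the page back only if it was modified."""
    global writes_to_disk
    if page.dirty:                  # clean pages match their swap-space copy
        writes_to_disk += 1
    # in either case the frame can now be reused

clean, modified = Page(), Page()
modified.store(42)
evict(clean)                        # no write-back needed
evict(modified)                     # must be stored to swap space
assert writes_to_disk == 1
```

Evicting a clean victim halves the disk work of a fault: only the read of the requested page remains.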
In the discussion of virtual memory, there is a general caching problem, and it occurs at several levels of the memory hierarchy. The memory hierarchy, ordered from fastest to slowest, is shown below:
1. Suppose the user has random accesses to virtual memory. Since the accesses are random, the choice of replacement policy does not matter, but this is a rare case; users typically see locality of reference.
2. Suppose the user sees locality of reference. Then the choice of policy matters, and two policies are described below.
The first, FIFO (First In, First Out), is the simple one: the victim is always the page that was loaded earliest.
Suppose we have 5 virtual pages and 3 physical pages, with the access reference string of virtual pages 0 1 2 3 0 1 4 0 1 2 3 4:
Reference string: | | 0 | 1 | 2 | 3 | 0 | 1 | 4 | 0 | 1 | 2 | 3 | 4
---|---|---|---|---|---|---|---|---|---|---|---|---|---
A | ? | 0* | 0 | 0 | 3* | 3 | 3 | 4* | 4 | 4 | 4 | 4 | 4
B | ? | ? | 1* | 1 | 1 | 0* | 0 | 0 | 0 | 0 | 2* | 2 | 2
C | ? | ? | ? | 2* | 2 | 2 | 1* | 1 | 1 | 1 | 1 | 3* | 3
This gives a total of 9 page faults, so 75% of our accesses were page faults; clearly this policy is not optimal.
One option is to buy more memory. Two additional physical pages would bring the number of page faults down to 5, one compulsory fault for each page of virtual memory. However, if only 1 additional page is added:
Reference string: | | 0 | 1 | 2 | 3 | 0 | 1 | 4 | 0 | 1 | 2 | 3 | 4
---|---|---|---|---|---|---|---|---|---|---|---|---|---
A | ? | 0* | 0 | 0 | 0 | 0 | 4* | 4 | 4 | 4 | 4 | 3* | 3 |
B | ? | ? | 1* | 1 | 1 | 1 | 1 | 1 | 0* | 0 | 0 | 0 | 4* |
C | ? | ? | ? | 2* | 2 | 2 | 2 | 2 | 2 | 1* | 1 | 1 | 1 |
D | ? | ? | ? | ? | 3* | 3 | 3 | 3 | 3 | 3 | 2* | 2 | 2 |
there will be 10 page faults, which is more than when running with less physical memory. This is rare, but it can happen occasionally. It is called Belady's Anomaly: more memory actually makes things worse.
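The two FIFO traces above can be checked with a short simulation (a sketch; `fifo_faults` is an illustrative name, not a real API):

```python
from collections import deque

def fifo_faults(refs, nframes):
    """Count page faults for a reference string under FIFO replacement."""
    frames, queue, faults = set(), deque(), 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) == nframes:
                frames.discard(queue.popleft())   # evict the oldest page
            frames.add(page)
            queue.append(page)
    return faults

refs = [0, 1, 2, 3, 0, 1, 4, 0, 1, 2, 3, 4]
assert fifo_faults(refs, 3) == 9    # the 3-frame trace
assert fifo_faults(refs, 4) == 10   # Belady's Anomaly: more frames, more faults
```

Note that hits do not touch the queue: FIFO evicts by load time, not by use, which is exactly what makes the anomaly possible.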
So buying memory is not only expensive and limited, it can occasionally make things worse. Well, what if a computer knew the reference string ahead of time and always evicted the page that is not needed for the longest?
Reference string: | | 0 | 1 | 2 | 3 | 0 | 1 | 4 | 0 | 1 | 2 | 3 | 4
---|---|---|---|---|---|---|---|---|---|---|---|---|---
A | ? | 0* | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2* | 2 | 2
B | ? | ? | 1* | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 3* | 3
C | ? | ? | ? | 2* | 3* | 3 | 3 | 4* | 4 | 4 | 4 | 4 | 4
Now, there are 7 page faults, which is better. However, this policy assumes the future reference string is known ahead of time, a static view only available offline (for instance to a compiler). Hence it is not directly applicable to operating systems, but there is a way to approximate it.
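This optimal policy can also be simulated when the whole reference string is available; a sketch, with `opt_faults` as an illustrative name:

```python
def opt_faults(refs, nframes):
    """Count faults when evicting the page whose next use is farthest away."""
    frames, faults = set(), 0
    for i, page in enumerate(refs):
        if page not in frames:
            faults += 1
            if len(frames) == nframes:
                def next_use(p):
                    # Pages never referenced again count as infinitely far.
                    try:
                        return refs.index(p, i + 1)
                    except ValueError:
                        return float("inf")
                frames.discard(max(frames, key=next_use))
            frames.add(page)
    return faults

refs = [0, 1, 2, 3, 0, 1, 4, 0, 1, 2, 3, 4]
assert opt_faults(refs, 3) == 7     # matches the trace above
```

The lookahead in `next_use` is precisely what an OS cannot do at runtime, which is why this serves only as a lower bound to compare real policies against.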
Reference string: | | 0 | 1 | 2 | 3 | 0 | 1 | 4 | 0 | 1 | 2 | 3 | 4
---|---|---|---|---|---|---|---|---|---|---|---|---|---
A | ? | 0* | 0 | 0 | 3* | 3 | 3 | 4* | 4 | 4 | 2* | 2 | 2 |
B | ? | ? | 1* | 1 | 1 | 0* | 0 | 0 | 0 | 0 | 0 | 3* | 3 |
C | ? | ? | ? | 2* | 2 | 2 | 1* | 1 | 1 | 1 | 1 | 1 | 4* |
In LRU (Least Recently Used), the computer keeps track of when pages were used in the past and evicts the page that has gone unused the longest. The trace above shows 10 page faults, worse than FIFO here, but this is a rare exception; LRU is usually better than FIFO.
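The LRU trace above can be reproduced with a short simulation (a sketch; `lru_faults` is an illustrative name, and the recency-ordered list stands in for the hardware support discussed next):

```python
def lru_faults(refs, nframes):
    """Count faults when evicting the least recently used page."""
    frames, faults = [], 0          # ordered least- to most-recently used
    for page in refs:
        if page in frames:
            frames.remove(page)     # a hit refreshes the page's recency
        else:
            faults += 1
            if len(frames) == nframes:
                frames.pop(0)       # evict the least recently used page
        frames.append(page)
    return faults

refs = [0, 1, 2, 3, 0, 1, 4, 0, 1, 2, 3, 4]
assert lru_faults(refs, 3) == 10    # worse than FIFO's 9 on this string
```

The difference from FIFO is the `frames.remove`/`append` on a hit: LRU reorders on every use, which is cheap in a simulation but expensive in hardware.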
When using LRU, help from the hardware is necessary. Incrementing a counter every time a page is used would slow down accesses, so the clock value is used instead: every time a page is used, a timestamp is put into a column of the page table. In our page table:
Page Number | Clock Value
---|---