Lecture 17
Policy | Mechanism |
abstract | nuts & bolts |
algorithm | how to implement |
priorities | machine specific, e.g. %cr3 |
strategy | changes over time |
generally stays the same |
Page Faulting Mechanism
Swap Space
Kernal’s swap space Your Process
| | | | | | | | | | |
Your Process
Pages RAM(assume no free memory)
| | | | | | | | | | | | | | | |
^ choose a victim v to evict from RAM
Page Fault Code
Arguments and variables:
va = virtual address
v = victim
vs = victim swap address
atype = access type
p = currently running process
void pfault(int va, int atype, process_t p)
{
if(swapmap(va, p) == 0)
{
kill_process(p);
}
else //lots of hand waving in this section
{
op, ova = page_replacement_policy(...); // v = op->pmap(ova),
// vs = swapmap(ova, op)
write v to vs;
op->pmap(ova) = 0;
read v from swapmap(va, p);
p->pmap(pvn(va)) = v;
}
}
Problems with this implementation
1) What if your kernel gets big?
Can VM be used inside the kernel?
if so, we have to be careful to never choose paging code for replacement
“wired down pages” -> never get paged out
2) Can VM make your programs start up faster?
Paging Optimizations:
Demand Paging
executable file _______ swap space
| 0 | | |
^text(read only) virtual memory^
Pros and Cons:
+ program starts faster
- more page faults(during initial phase of program)
Cost:
N pages total in executable( no dynamic memory allocation)
U pages actually used
R = cost of reading a page
C = cost of page fault it self( flash: small, disk: large)
Without demand paging:
total cost = RN
With demand paging:
total cost = U(R + C)
So to determine which is better, the fraction of N that U is and the size of C must both be taken into account
Latency:
Without demand paging:
latency = RN
With demand paging:
latency = R
Dirty Bit
Extra bit in page table associated with each page in memory. It is dirty if the page has been changed and clean if it has not.
How to implement the dirty bit:
1) compare cached page to version on disk
- saves very little time
2) save checksum of page (e.g. 4 byte csum of 4 KiB page)
- checksums are not infallible
3) use a bit in the page table entry
How to set reliably?
1) Ask for help from hardware
- slows doen implementation of VM
2) Kernal marks all pages as read-only, enables W bit after page fault and sets dirty
bit
Efficiency of Virtual-memory using Programs
assume a large program(e.g. Apache)
wants to run some other program (standard way to let programs use Apache to provide a
service
int main()
{
while(read_request())
{
p = fork(); ------------------------------>copy of Apache’s memory
if(p > 0) -cost of fork proportional to size of memory
{ assignment to Apache
execlp(“helper”, 0);
}
}
}
Solution:
Ways to speed up virtual memory using programs:
1) vfork
2) multithreaded (each thread can fork independently- generally don’t use vfork() in this instance
because it can undesirably freeze other threads)
2a) pool of child processes
3) event driven with asynchronous I/O
-aio_read(...), aio_write(...) -> return right away even if the data is not there yet
Virtual Memory Policy
Page Replacement Policy -> must be one that matches the application
If random access to memory:
Instrument Applications by getting traces
simple trace: list of virtual page numbers, omitting adjacent duplicates
Use this trace:
0 1 2 3 0 1 4 0 1 2 3 4
assume 5 virtual pages, 3 physical pages
Policy: First in First Out
cost = 9 page faults
t --->
0 | 1 | 2 | 3 | 0 | 1 | 4 | 0 | 1 | 2 | 3 | 4 | |
A | ^0 | 0 | 0 | ^3 | 3 | 3 | ^4 | 4 | 4 | 4 | 4 | 4 |
B | ^1 | 1 | 1 | ^0 | 0 | 0 | 0 | 0 | ^2 | 2 | 2 | |
C | ^2 | 2 | 2 | ^1 | 1 | 1 | 1 | 1 | ^3 | 3 |
Belady’s Anomaly (very rare)
-use 4 pages
cost = 10 page faults → adding more memory made it slower!
t--->
0 | 1 | 2 | 3 | 0 | 1 | 4 | 0 | 1 | 2 | 3 | 4 | |
A | ^0 | 0 | 0 | 0 | 0 | 0 | ^4 | 4 | 4 | 4 | ^3 | 3 |
B | ^1 | 1 | 1 | 1 | 1 | 1 | ^0 | 0 | 0 | 0 | ^4 | |
C | ^2 | 2 | 2 | 2 | 2 | 2 | ^1 | 1 | 1 | 1 | ||
D | ^3 | 3 | 3 | 3 | 3 | 3 | ^2 | 2 | 2 |
Least Recently Used
cost = 10 page faults
t --->
0 | 1 | 2 | 3 | 0 | 1 | 4 | 0 | 1 | 2 | 3 | 4 | |
A | ^0 | 0 | 0 | ^3 | 3 | 3 | ^4 | 4 | 4 | ^2 | 2 | 2 |
B | ^1 | 1 | 1 | ^0 | 0 | 0 | 0 | 0 | 0 | ^3 | 3 | |
C | ^2 | 2 | 2 | ^1 | 1 | 1 | 1 | 1 | 1 | ^4 |
Oracle of Delphi( farthest in future)
cost = 7 page faults
t --->
0 | 1 | 2 | 3 | 0 | 1 | 4 | 0 | 1 | 2 | 3 | 4 | |
A | ^0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ^2 | 2 | 2 |
B | ^1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | ^3 | 3 | |
C | ^2 | ^3 | 3 | 3 | ^4 | 4 | 4 | 4 | 4 | 4 |
Distributed Systems & Remote Procedure Calls
How procedure calls differ from ordinary calls:
Need agreement as to how to represent ints, floats, etc.