Untitled document

Lecture 17

Policy	Mechanism
abstract	nuts & bolts
algorithm	how to implement
priorities	machine specific, e.g. %cr3
strategy	changes over time
generally stays the same

Page Faulting Mechanism

Swap Space

Kernal’s swap space Your Process

| | | | | | | | | |

Your Process

Pages RAM(assume no free memory)

| | | | | | | | | | | | | | |

^ choose a victim v to evict from RAM

Page Fault Code

Arguments and variables:

va = virtual address

v = victim

vs = victim swap address

atype = access type

p = currently running process

void pfault(int va, int atype, process_t p)

{

if(swapmap(va, p) == 0)

{

kill_process(p);

}

else //lots of hand waving in this section

{

op, ova = page_replacement_policy(...); // v = op->pmap(ova),

// vs = swapmap(ova, op)

write v to vs;

op->pmap(ova) = 0;

read v from swapmap(va, p);

p->pmap(pvn(va)) = v;

}

Problems with this implementation

1) What if your kernel gets big?

Can VM be used inside the kernel?

if so, we have to be careful to never choose paging code for replacement

“wired down pages” -> never get paged out

2) Can VM make your programs start up faster?

Paging Optimizations:

demand paging
dirty bit

Demand Paging

executable file _______ swap space

| 0 | |

^text(read only) virtual memory^

Page from executable when accessing read-only storage, e.g. functions, const
When a program starts, page in only the 1st page

cache pg. 0 in RAM, then jump into it
can start running after one page load

Pros and Cons:

+ program starts faster

- more page faults(during initial phase of program)

Cost:

N pages total in executable( no dynamic memory allocation)

U pages actually used

R = cost of reading a page

C = cost of page fault it self( flash: small, disk: large)

Without demand paging:

total cost = RN

With demand paging:

total cost = U(R + C)

So to determine which is better, the fraction of N that U is and the size of C must both be taken into account

Latency:

Without demand paging:

latency = RN

With demand paging:

latency = R

Dirty Bit

Extra bit in page table associated with each page in memory. It is dirty if the page has been changed and clean if it has not.

How to implement the dirty bit:

1) compare cached page to version on disk

- saves very little time

2) save checksum of page (e.g. 4 byte csum of 4 KiB page)

- checksums are not infallible

3) use a bit in the page table entry

How to set reliably?

kernal can set this, for its own access, but programs can change pages too.

1) Ask for help from hardware

- slows doen implementation of VM

2) Kernal marks all pages as read-only, enables W bit after page fault and sets dirty

bit

Efficiency of Virtual-memory using Programs

assume a large program(e.g. Apache)

wants to run some other program (standard way to let programs use Apache to provide a

service

int main()

{

while(read_request())

{

p = fork(); ------------------------------>copy of Apache’s memory

if(p > 0) -cost of fork proportional to size of memory

{ assignment to Apache

execlp(“helper”, 0);

}

Solution:

Share pages between parent & child, BUT they’re marked read-only to the hardware
when child/parent modifies such a page, kernal clones page and makes both copies writeable (assuming hardware support for dirty bit)
This speeds up fork() quite a bit, but you still must copy the page tables

still O(N)

IDEA: share page tables too!

works, but is tricky

Easier Way?

vfork(): semantics differ from fork()

parent is frozen until child either exits or exexlp
if child changes RAM, this might affect parent’s RAM

Goal: O(1)

parent and child have same page table until exec

Ways to speed up virtual memory using programs:

1) vfork

2) multithreaded (each thread can fork independently- generally don’t use vfork() in this instance

because it can undesirably freeze other threads)

2a) pool of child processes

3) event driven with asynchronous I/O

-aio_read(...), aio_write(...) -> return right away even if the data is not there yet

Virtual Memory Policy

Page Replacement Policy -> must be one that matches the application

If random access to memory:

victim can be chosen at random

Instrument Applications by getting traces

simple trace: list of virtual page numbers, omitting adjacent duplicates

Use this trace:

0 1 2 3 0 1 4 0 1 2 3 4

assume 5 virtual pages, 3 physical pages

Policy: First in First Out

cost = 9 page faults

t --->

	0	1	2	3	0	1	4	0	1	2	3	4
A	^0	0	0	^3	3	3	^4	4	4	4	4	4
B		^1	1	1	^0	0	0	0	0	^2	2	2
C			^2	2	2	^1	1	1	1	1	^3	3

Belady’s Anomaly (very rare)

-use 4 pages

cost = 10 page faults → adding more memory made it slower!

t--->

	0	1	2	3	0	1	4	0	1	2	3	4
A	^0	0	0	0	0	0	^4	4	4	4	^3	3
B		^1	1	1	1	1	1	^0	0	0	0	^4
C			^2	2	2	2	2	2	^1	1	1	1
D				^3	3	3	3	3	3	^2	2	2

Least Recently Used

cost = 10 page faults

t --->

	0	1	2	3	0	1	4	0	1	2	3	4
A	^0	0	0	^3	3	3	^4	4	4	^2	2	2
B		^1	1	1	^0	0	0	0	0	0	^3	3
C			^2	2	2	^1	1	1	1	1	1	^4

Oracle of Delphi( farthest in future)

cost = 7 page faults

t --->

	0	1	2	3	0	1	4	0	1	2	3	4
A	^0	0	0	0	0	0	0	0	0	^2	2	2
B		^1	1	1	1	1	1	1	1	1	^3	3
C			^2	^3	3	3	^4	4	4	4	4	4

Distributed Systems & Remote Procedure Calls

How procedure calls differ from ordinary calls:

caller & callee don’t share address space

+ hard modularity
- large objects expensive to communicate

no call by reference, only call by value
caller & callee may be disagree about architecture

e.g. caller: x86, callee: x86-64

Need agreement as to how to represent ints, floats, etc.

	0	1	2	3	0	1	4	0	1	2	3	4
A	^0	0	0	0	0	0	^4	4	4	4	^3	3
B		^1	1	1	1	1	1	^0	0	0	0	^4
C			^2	2	2	2	2	2	^1	1	1	1
D				^3	3	3	3	3	3	^2	2	2

	0	1	2	3	0	1	4	0	1	2	3	4
A	^0	0	0	0	0	0	^4	4	4	4	^3	3
B		^1	1	1	1	1	1	^0	0	0	0	^4
C			^2	2	2	2	2	2	^1	1	1	1
D				^3	3	3	3	3	3	^2	2	2

	0	1	2	3	0	1	4	0	1	2	3	4
A	^0	0	0	0	0	0	^4	4	4	4	^3	3
B		^1	1	1	1	1	1	^0	0	0	0	^4
C			^2	2	2	2	2	2	^1	1	1	1
D				^3	3	3	3	3	3	^2	2	2