CS 111 Lecture 15 Scribe Notes (Winter 2012)

By Billy Cao and Emily Cheung for a lecture given by Professor Paul Eggert on March 7, 2012

Table of Contents

  1. Page Faulting
  2. Policy
    1. Random
    2. First in First Out (FIFO)
    3. Oracle Policy
    4. Least Recently Used
  3. Improving Performance
    1. Demand Paging
    2. Dirty Bit
    3. vfork()
  4. Networking

Page Faulting

A page fault occurs when a program accesses memory that is not currently in RAM. We continue with pmaps (from last lecture), which map virtual addresses to physical addresses. Each process gets its own page table, which is necessary for isolation; the threads within a process share a single page table, since threads share memory.

p->pmap(va)	// va is a virtual address; returns a physical address or FAULT
p->swapmap(va) {
	// Maps a virtual address to the corresponding disk address
	return (va > p->valim ? FAULT : p->diskorigin + va);
}

pfault(va, p, ...) {
	if (p->swapmap(va) == FAULT)
		kill(p);	// invalid address: terminate the process
	else {
		(op, ova) = removal_policy();	// pick a victim page to evict to disk
		pa = op->pmap(ova);
		write pa's contents to disk at op->swapmap(ova)
		op->pmap(ova) = FAULT;	// invalidate the victim's mapping
		read into pa from disk at p->swapmap(va)
		p->pmap(va) = pa;
	}
}

Booting can be faster if you record which pages previous boots loaded first and prefetch them on the next boot; this is known as speculative prefetching.

Policy

Question: Which page to choose as a victim?
  1. Random

    Pick a victim at random. This would work well if memory accesses were truly random, but in practice programs exhibit locality of reference.
  2. FIFO (First in First Out)

    In this method, the page that has been loaded in RAM the longest is replaced: the pages first loaded into RAM are the first evicted. To evaluate a policy like this, we use traces from real systems; a trace can be a sequence of memory accesses, or a sequence of accessed virtual page numbers (VPNs).

    To test this policy, assume a small machine with 3 physical pages of RAM, named A, B, and C. We run our example reference string through the FIFO policy, where asterisks mark page faults:

    refs:   0  1  2  3  0  1  4  0  1  2  3  4
    A:      0  0  0  3  3  3  4  4  4  4  4  4
    B:         1  1  1  0  0  0  0  0  2  2  2
    C:            2  2  2  1  1  1  1  1  3  3
    fault:  *  *  *  *  *  *  *        *  *

    The cost of this policy is 9 page faults. With 5 physical pages, the same FIFO policy costs only 5 page faults, since all 5 distinct pages (0, 1, 2, 3, 4) fit in RAM at once. With 4 physical pages, however, FIFO costs 10 page faults. This is counterintuitive: one would expect that adding RAM always improves performance, yet here adding RAM makes it worse. This rare case is known as Belady's Anomaly, where adding more cache can slow you down.

  3. Oracle Policy

    The oracle policy replaces the page that won't be needed for the longest time: it looks into the future to decide which pages we will need and which can be replaced. This is the optimal page-replacement policy. We run our example reference string through it, where asterisks mark page faults:

    refs:   0  1  2  3  0  1  4  0  1  2  3  4
    A:      0  0  0  0  0  0  0  0  0  2  2  2
    B:         1  1  1  1  1  1  1  1  1  3  3
    C:            2  3  3  4  4  4  4  4  4  4
    fault:  *  *  *  *        *        *  *

    The cost of this policy is 7 page faults; for this reference string, that is the lowest cost achievable with 3 physical pages. The problem, of course, is that most systems aren't equipped with an oracle that can see the future. How can we implement one? Hardware support is one (unlikely) option. Another is to run the program once, collect its reference string, and use that string as the oracle when we run the program again. This adds some overhead, and an oracle gathered this way is only sometimes available, but when the same workload repeats it can be worth it.

  4. Least Recently Used

    The least recently used (LRU) policy evicts the page that has gone unused for the longest time. In practice this policy behaves the "nicest." We run our example reference string through it, where asterisks mark page faults:

    refs:   0  1  2  3  0  1  4  0  1  2  3  4
    A:      0  0  0  3  3  3  4  4  4  2  2  2
    B:         1  1  1  0  0  0  0  0  0  3  3
    C:            2  2  2  1  1  1  1  1  1  4
    fault:  *  *  *  *  *  *  *        *  *  *

    The cost of this policy is 10 page faults, which is even worse than FIFO. However, this reference string was chosen specifically to exhibit Belady's Anomaly; on typical workloads LRU does better than FIFO.

    How can we implement this? We can use a one-level page table, indexed by virtual page number (VPN), where each entry holds a physical page number. We add a timestamp (or page-fault counter) to each entry, recording when the page was last used. What happens when the counter overflows? We renormalize all the entries, but that slows us down. Most systems therefore implement an approximation: the counters are updated only on page faults, and pages can periodically be invalidated (e.g., on clock interrupts) so that subsequent accesses fault and refresh the counters.

Improving Performance

Demand Paging

With ordinary paging, a program is read into RAM in its entirety before it starts executing. Demand paging, by contrast, doesn't load a page until the program first accesses it.

Advantages

Demand paging gives a faster initial response: the program starts running before it is fully loaded, so the user can work in parallel with the program rather than waiting for loading to finish. It also avoids loading pages the program never touches.

Disadvantages

The cost shows up at run time as extra page faults, so total overhead is higher for a program that eventually touches most of its pages.

Dirty Bit

Don't write a victim page to disk if it has never been written to: if a victim page has not been modified while resident in a physical page, we do not have to write it back when replacing it. How do we know whether a page has been modified? Each page has a marker called the dirty bit: the bit is 0 if the page is clean, and 1 if the page has been modified.

How can we implement this? The dirty bit is usually implemented in hardware, for speed. Without hardware support, we can instead mark all pages read-only at first; the first write to a page then causes a page fault, and the fault handler records the dirty bit and upgrades the page's permissions to read-write (RW).

vfork()

Let's make fork() fast. fork() copies all of the process's virtual memory. How do we speed this up?
  1. We don't need to copy read only areas.
  2. We don't need to copy writable pages either. Copy-on-write is to fork() as demand paging is to exec().
vfork() behaves like fork() except that:
  1. The parent and child share the same memory and page tables rather than creating separate memory and page tables for each child.
  2. The parent is frozen until the child exits or calls exec().
These differences allow vfork() to run without cloning memory, a performance improvement gained at the expense of a frozen parent. In code like the following example, vfork() is faster than fork(): as long as the child doesn't write to shared memory before calling exec(), vfork() is both safe and faster.
p = vfork();
if (!p) {
	// fiddle a bit -- but don't modify the parent's data
	exec(...);
	_exit(1);	// reached only if exec fails; must be _exit, not exit
}

Networking

Distributed systems pass messages as their means of communication: computer A sends a request to computer B in order to get work done on its behalf, and B sends back a response, forming a request-response pair. One of the most common ways of abstracting requests and responses is the Remote Procedure Call (RPC).

How RPCs Differ From Ordinary Function Calls and Syscalls

Advantages

Disadvantages

Both Advantageous and Disadvantageous