CS111 Operating Systems (Winter 2012)

Lecture 15: Virtual Memory and Distributed Systems

Prepared by Kevin Calderone

Table of Contents



Virtual Memory



Page Faults


A page fault occurs when a program tries to access a virtual memory address that is currently not in RAM. This means the entry of this address in the page table currently isn't mapped to RAM.

RAM can be thought of as a cache for a much larger disk where the memory of the program actually resides. A page fault means that the data may be on disk, but it is not currently cached in RAM. The mapping from virtual addresses to actual memory is stored in the page table of the process.

The page table maps virtual addresses to physical memory addresses.

pmap(virtual address) -> memory address

Each process has its own page table in the kernel. Having a separate page table for each process is necessary for process isolation as they should not be allowed to stomp on each others memory.

Since each process has a different page table, there is not one pmap that will work for every process. Every process will have its own pmap function, which will allow each process to use a different page table and possibly even page tables of different formats.

Question: Do threads have their own page table? No, all threads of a process share the same page table. The point of having threads is to be able to have multiple tasks operating on the same memory. Threads are supposed to be fast, so there would be a lot of overhead if the entire page table had to be copied whenever a thread was created. We also want context switching to be fast, so we want to avoid having to switch between page tables when a different thread starts to execute.

If there is not enough room in RAM the virtual pages will be stored on disk. To be able to handle a page fault we need to be able to tell where on disk the page is currently being stored.

pswapmap(virtual address) -> disk address

A simple implementation of the pswapmap function may look like:

pswapmap (va) {
	return (va > p->va_limit) ? FAULT : p->disk_origin + va;
}       

The kernel would then be able to handle a page fault using these functions. If there is RAM not currently being used the kernel can simply load the requested page into the free area of RAM. Normally when a page fault happens however RAM is already full. To be able to load in the new page the kernel will have to find a victim page to send back to disk.

pfault (va, p) {
	// address out of bounds
	if (p->swapmap(va) == FAULT)
		kill(p);					
		
	// find a victim page to evict
	[op, ova] = removal_policy();		

	// find the memory address of the victim page
	pa = op->pmap(ova);					

	// write the victim page to disk
	write from pa to disk @ op->pswapmap(ova);

	// record that the victim page is no longer in RAM
	op->swapmap(ova) = FAULT;

	// read the new page into RAM from disk
	read into pa from disk @ p->swapmap(va);

	// record that the new page is now in RAM
	p->pmap(va) = pa;
}		

The victim page it chooses should be one that is not needed at the moment. The details of how this page is chosen is determined by the removal policy.


Removal Policy


Some different policies for choosing the victim page:

Pick a page at random

This would be a good solution if memory access was truly random. In general this is not the case. Due to locality of reference, memory is often accessed in the same area that it was previously accessed.

First in, first out

In this policy the page that has been in RAM the longest will be removed. This will be a better solution as it will take advantage of locality of reference.

Oracle

This policy would remove the page that won't be needed for the longest amount of time. This policy would be very efficient, but it is difficult to know what pages will be needed in the future. If you are able to predict the future, this would be a good removal policy.

Least recently used

The page that was least recently used would be chosen as the victim page. This is the nicest solution to use in practice, but it requires the system to store a timestamp for each page to keep track of the last time they were each accessed.

One solution would be to store this information in the page table, where each entry would now also contain a timestamp field. This would not be a very efficient solution as it would require the page table to be a lot larger. It would also slow down memory accesses as it would require you to write a timestamp into the page table every time you access memory.

Due to the issues with storing the timestamps, we need an approximation of this process. We could implement an approximation by taking advantage of clock interrupts. The clock interrupts pass the control back to the kernel after a certain time interval. What we could to is invalidate the entire page table on each interrupt. This would allow the kernel to see which pages are being used since they would now all cause a fault. The kernel would restore the page table after it is done using this method to sample memory accesses.


Comparison of Removal Policies


To test which removal policy will perform the best, you need to test them on an actual machine. This is done by taking traces from real systems, which is a log of all memory accesses a program performs over the sampling interval measured. This would be stored as a sequence of page indexes that these memory accesses touched. Using this log you can simulate the behavior of the page table using many different removal policies. The best policy will be the one that minimizes page swaps.

Consider the following sample trace. This is a sequence of virtual page numbers accessed by the system.

0 1 2 3 0 1 4 0 1 2 3 4

Suppose we are on a system that only has enough RAM to store 3 pages. Lets call these three physical pages A, B, and C. We will compare the behavior of the different removal policies on this sample trace.

In the following examples, the bold cells represent pages that were modified due to a page fault. The policies will be compared by counting the total number of page faults that occurred in the simulation of this trace.

First in, first out

  0 1 2 3 0 1 4 0 1 2 3 4
A 0 0 0 3 3 3 4 4 4 4 4 4
B - 1 1 1 0 0 0 0 0 2 2 2
C - - 2 2 2 1 1 1 1 1 3 3

Total page faults: 9

We would like to reduce the number of page faults. Lets try to increase the amount of RAM. We will now simulate this same trace using the same removal policy, but using four physical pages of RAM.

First in, first out with more RAM

  0 1 2 3 0 1 4 0 1 2 3 4
A 0 0 0 0 0 0 4 4 4 4 3 3
B - 1 1 1 1 1 1 0 0 0 0 4
C - - 2 2 2 2 2 2 1 1 1 1
D - - - 3 3 3 3 3 3 2 2 2

Total page faults: 10

In this example, adding more RAM to the system made the performance worse! This phenomena is known as Bélády's anomaly. Luckily this rarely occurs in practice so it is not something you usually worry about.

Lets try some of our other removal policies.

Oracle

  0 1 2 3 0 1 4 0 1 2 3 4
A 0 0 0 0 0 0 0 0 0 2 3 3
B - 1 1 1 1 1 1 1 1 1 1 1
C - - 2 3 3 3 4 4 4 4 4 4

Total page faults: 7

Least recently used

  0 1 2 3 0 1 4 0 1 2 3 4
A 0 0 0 3 3 3 4 4 4 2 2 2
B - 1 1 1 0 0 0 0 0 0 3 3
C - - 2 2 2 1 1 1 1 1 1 4

Total page faults: 10


Improving Performance of Virtual Memory


Lets consider some different ways to improve the performance of virtual memory.

Demand Paging

When a process starts, only read in the page that contains the entry point of the program. The rest of the program's memory will be paged in as they are needed.

The program will start faster since it only needs to read in one page before it can start executing. The program will be able to put up a splash screen and load in the rest of the memory it needs. This will make the program feel more responsive to the user.

However, this change would make more page faults occur while the program is executing. It would also cause more disk arm movement if there are multiple processes running on the machine using demand paging. This would make all of the page faults a lot slower.

Throw away the victim page if it was not modified

If the victim page that you are paging out hasn't been changed since it was read in from disk, there is no need to spend the time writing it back to disk. To implement this the page table would need to keep track of which pages have been modified. These dirty bits are often supported directly by the hardware. If they are not, the kernel will have to keep track of them itself.

If the kernel has to keep track of dirty bits, it can do so by trapping on the first write to a page. It can mark new pages invalid for write when they are initially loaded so that they will trap if the user tries to write to them. When the program faults by trying to write to the page, the kernel will set the dirty bit for that page and will then set the write permission to the page back on.


Virtual Memory Security


Memory pages can also be marked as if they are executable or not. This is a relatively recent addition to x86 which is meant to help protect against buffer overflow attacks. This will prevent a hacker from injecting their own executable code into the program since all of the data pages will be marked as not executable. On older hardware, any memory that was readable was also executable, so they would not be protected from buffer overflow attacks.

Having executable flags on pages still doesn't prevent all buffer overflow attacks. Systems with executable flags on virtual memory are still vulnerable to buffer overflow attacks that utilize return oriented programming. In this method, the hacker looks at all of the operations right before RET instructions in the code. By using a buffer overflow to push a sequence of return addresses onto the stack, the hacker can make the program execute little pieces of code at a time by making the program jump to positions in the code right before these RET instructions. If the program is large enough, there is often enough RET instructions in the program to build a turing complete set of operations from these fragments, so the hacker will still be able to execute arbitrary code.


Improving Performance of Fork


When you fork a process it copies all of the virtual memory of the process into the new one. This would be a very slow operation especially if there is a lot of memory used by the process.

When forking it doesn't need to copy the read only areas of memory. Both of the processes can share the same read only memory since it will not change.

The system can also avoid copying writable pages right away. By implementing a copy-on-write system, it can have both of the processes share the memory until one of them writes to a page. When on of the processes has to write to a page, the kernel will have to actually create a copy of that page for the new process. This can be implemented by the kernel by turning off the write permission to the pages after a fork. This will cause any memory access to fault, which will signal the kernel to actually now copy the page for the new process.

These optimizations can delay the performance problems of fork when it is called, but it will still end up creating a copy of the page table. Since fork is often used to create a process in which you will call exec, there is actually no point in copying the page table. When exec is called, it wipes the page table and loads in the memory for the new program, so the cost of duplicating the page table from the fork is wasted.

To solve this problem there is a special system call called vfork that creates a new process that shares the memory and page table of the parent process. Since processes are not supposed to share memory, it will pause the parent process until the child thread calls exit or exec. This solves the problem with the unnecessary copying of the parents page table, but is not considered a clean solution since it creates a process that is in a strange state where it temporarily borrows the memory of the parent.


Distributed Systems



Remote Procedure Calls


A remove procedure call allows you to execute a command on a remote machine. This type of call differs from an ordinary function call or system call.

This type of call is a lot slower since it requires communication over a network. The information can only move at the speed of light, so if you are sending the command to a server that is far away there will be a lot of latency.

Having the caller and callee be different machines promotes hard modularity since they do not share address space. For example, if there is a buffer overflow bug that crashes the server it will not crash the client.

One downside to the two ends not sharing memory is that there is no call by reference. All the calls have to be pass by value so it must always send a copy of the data.

The two machines can also be completely different architectures(eg. SPARK vs INTEL). This gives the system more flexibility into what hardware can be used, but it can also cause some compatibility problems with things like endianness.


Marshalling


In order for the remote procedure call to be passed through a network the data must first be serialized. The process of converting the data from an internal format into the serialized form is called marshaling.

The client marshals its data from its internal form into a serialized form, which it then sends over the network to the server. The server will receive the data and unmarshal it into its own internal format. The server sends its response back to the client with the same process.