Lecture 15: VM and processes; distributed systems

Scribes: Jonathan Lee and Raymond Moy

Take a 32-bit address space and split it up.

From virtual memory to physical memory

On a 32-bit machine it is now common to have more physical memory than virtual address space: for example, 8 GB of RAM versus a 4 GB address space. With a 64-bit machine it's possible in theory to have a 52-bit page number with the same 12-bit offset, but no one has that much RAM right now.

The main benefit of virtual memory is that consecutive pages in virtual memory do not need to be consecutive in physical memory. This has led developers to assume they have one big chunk of contiguous memory, and therefore that RAM is truly randomly accessible. This assumption holds only with Uniform Memory Access (UMA): it doesn't matter which location in physical memory you're reading from, since they're all equally accessible. If this assumption were false, virtual memory couldn't work this way. This leads to the idea that perhaps software developers rely too much on the hardware.

When will we have computers with a full 64 bits' worth of RAM?

Time vs. Bits graph

A 64-bit address space will be the most the world will ever need (notice how the graph levels off). 128 bits is unnecessary (most likely).

Virtual Memory Implementation

The simplest way to implement the mapping from virtual address to physical address is with a linear page table. To do this, part of memory must be reserved to hold the virtual-to-physical mapping.

Linear Map Table
int pmap (int vpn) {
	return PAGETABLE[vpn];	// one entry per virtual page number
}

On x86 processors there is a hardware register, %cr3, devoted to holding the PAGETABLE pointer. Applications should not mess with %cr3, because doing so would allow access to memory they should not be touching; %cr3 is therefore a privileged register. Each process has its own page-table base: although there is only one physical %cr3, the kernel keeps track of each process's own saved value.

Does each thread need its own %cr3? No: threads in a process all share the same RAM, the same privileges, and the same memory the process can access, and thus they all use the same %cr3. Switching threads does not change %cr3.

If an x86 process took up the entire 4 GB address space, how big would the page table be?

With 4 KB pages, 4 GB / 4 KB = 2^20 entries, at 4 bytes each: a 4 MB page table. It would be a waste of space if the 4 GB of address space and its 4 MB page table were not used completely. Furthermore, since the page table is itself stored in RAM, there can be bootstrapping problems. These were fixed through special kernel-only instructions that allow direct access to physical memory, along with the use of 2-level page tables.

2 Level Page Tables

2 Level Page Table
Virtual Address Layout
int pmap2 (int vpn) {
	int hi = vpn >> (10 + 12);
	int lo = (vpn >> 12) & ((1 << 10) - 1);
	int off = vpn & ((1 << 12) - 1);
	return (PAGETABLE[hi][lo] << 12) + off;
}

The previous code was busted: it was meant to go from a virtual page number to a physical page number, but it instead goes from a virtual address to a physical address (note how it shifts out a 12-bit offset that a page number doesn't have).

int pmap2 (int vpn) {
	int hi = vpn >> 10;		// top 10 bits index the outer table
	int lo = vpn & ((1 << 10) - 1);	// low 10 bits index the inner table
	return PAGETABLE[hi][lo];
}
Virtual Address Layout

Bourne Shell Improvement

The Bourne shell was written by Steve Bourne at Bell Labs. He had too much time on his hands, so he decided to make the shell run faster. What was bugging him was that every heap allocation had to check the current memory boundary to see where the heap ended. If the object being allocated got too close to the boundary, he'd have to call the operating system to get more memory.

void *malloc (size_t n) {
	char *p = np;
	np += n;
	if (np > lim) {		// crossed the boundary: ask the OS for more
		sbreak(1000);
		lim += 1000;
	}
	return p;
}

Steve Bourne thought this code was slow, and malloc gets called a lot in the shell. So why not drop the boundary check and make malloc inline? This reduced it to 2 instructions and made the shell run a lot faster. The drawback was that when the shell was almost out of memory, the write past the boundary would raise a segmentation violation. He fixed this by checking the instructions near the faulting address to see whether they were doing a malloc, and if so, growing the heap and continuing. He was essentially moving the runtime check into the hardware. This drove people nuts, because when you traced the shell it would look like it had a bad pointer, yet it would keep running; this made it a real pain to debug. Would something like this work in a modern-day language like Java or JavaScript? It might, depending on the application.

Page faulting

If we try to access a virtual page that does not currently map to a physical page in RAM, the hardware traps into the kernel, and code like the following should be executed:

int swapmap (proc, va) {
	return disk address or FAIL;
}

void pfault (int vpn) {
	if (swapmap(currentProcess, vpn) == FAIL)
		kill(currentProcess, SIGSEGV);	// no backing store: invalid access
	else {	// find a victim page to swap out
		(p, vpn') = removal_policy();
		pa = p->pmap(vpn');
		write pa to disk at swapmap(p, vpn');
		p->pmap(vpn') = FAULT;		// victim's page is no longer resident
		read pa from disk at swapmap(currentProcess, vpn);
		currentProcess->pmap(vpn) = pa;
	}
}

Thrashing

Too many large processes chase too little RAM, killing performance.

How do we determine which pages to page out?

A few considerations:

Now say we need to access pages 0 through 4 in the order: 0 1 2 3 0 1 4 0 1 2 3 4. However, we only have three physical page frames: A, B, and C. What removal policy should we use?

Common optimizations

How do you do fork()?

Since cloning all of the parent's pages would make fork() slow, we instead clone lazily (copy-on-write): the child initially shares the parent's pages, which are marked read-only, and a page is copied only when one of the two processes writes to it.