Lecture 15

By Arjun Patel and Ming Liu

Hardware traps

They are similar to int instructions such as 0x80 except when the kernel returns you want to execute the next instruction, connecting to the kernel. When you execute an instruction and the instruction fails,it should still return to the function where it failed to execute the current instruction. When the instruction fails, this is called a fault.

An example of a fault would be attempting to execute a store instruction (movl %eax, 0x59328-> kernel and fails, returning to the function which was supposed to execute the instruction, creating a fault.

The kernel is responsible for dealing with page faults. Faults are not necessarily errors and instead could be dealing with system wide instruction calls. The kernel is designed to identify when page fault is an actual bug in the program or a signal to execute a system wide instruction call.

A common example of a bug in a program would touching forbidden zone in virtual address space or dereferencing NULL.

An example of a system wide instruction call would be when you have a function that either reads a file or writes to a file.

The kernel keeps track of all processes and determining which processes have access to specific pages.

An example would be one application that attempts to read data from another application to which it lacks permissions, creating an error.

Assume the kernel has a function called swapmap which takes in a process id and virtual address and returns either a disk address or a fail signified by the value of NULL.

        off_t swapmap(pid_t pid, off_t va) 
        {
                returns either NULL or the physical location on disk of the virtual page //pretend there is set of virtual addresses for each process. First it checks if the virtual address
                exists or not, returning NULL if it doesn't. If it does exist, then the virtual address is used as a key value in a map like structure connecting a virtual address to a physical
                address and that value is returned. 
        }
        
        Each process contains its own virtual addresses. The amount of virtual addresses needed for each process varies based on the processes along with the set of virtual addresses. For a 4kb size, we only need the virtual page number and process id to get the disk address. Getting the disk address should be fast, but not extremely fast. It's okay if we use an array of valid regions and walk through it with a Big O(n) because eventually we would get a number.

Pfault is a function invoked by trap telling where to read, write or execute a page.

void pfault(proc, vpn, accesstype)

        off_t da=swapmap(proc,vpn);
        if(da==FAIL) 
                kill(proc,SIGSEGV); //ends process
        else 
        {       
                ppn=removal_policy();
                write page @ ppnv to disk @ snapmap(Procv, vpn);
                readpage@ppn from disk@da;
                update the process' page table so that the virtual page number of the new page reflects the physical page number of the victim page;
                set the virtual address of the victim page to NULL;
        }

}

Choosing A Page

Free memory is wasted memory because it is not being used for anything useful. We can get statistics with a function to tell what memory is being used by the kernel. If 10% of the memory is free, that physical memory is not doing anything. We will look at all pages in physical memory being used by other processes and pick a victim. The victim will give up its physical and virtual address which will be used by a new page. The victim is selected by finding a page that hasn't been used very much.

The best way to choose a victim is by policy. Right now, we only care about mechanism. The first thing we do is copy the victim's data into secondary storage. Then we can copy the data from the page that faulted into the physical memory of the victim while the victim loses the virtual memory address and its physical location. Suppose someone writes to a page. The user might override page anyway in RAM. If we know that the action the user will take will access the whole page, we don't need to read in the whole page from disk. In practice we store bytes and the program overwrites the page, but kernel is kept unaware of this. We want to be able to avoid that.

What we told the hardware is only about the pages in RAM and we put more info in the page table entry, but if we try to use that, it will create a fault. The kernel gets to use the swapmap table and update the page table. Low level read and write system calls write to physical pages. We can then execute the 30 ml instead. We want page faults to be rare. There are two common optimizations to avoid page faults.

2)Another approach is to use a dirty bit to keep track of when the page in RAM is not the page on the disk. If it is not dirty, we should not write to disk.

Page Faults: Replacement Policies

- If we want to read a page into physical memory, and if the physical memory is filled, we must evict a page currently in physical memory to make room for the new page. For this purpose there are some existing replacement policies.

- Replacement policies work by trying to predict what pages currently within physical memory are unlikly to be accessed again within the near future and selecting those particular pages as candidates for eviction

- As such, if an application accesses pages entirely randomly, then no replacement policy is better than any other since the application's page usage cannot be predicted.

- However, applications sometimes exhibit non-random behavior that exhibit spacial and/or temporal locality, and we use the replacement policies to exploit these patterns in order to increase performace.

- In some cases, the hardware may not have/support replacement policies. This can be circumvented by turning off all r, w, and x bits for all the pages and instead store all the information within the kernal. Since the bits are turned off, every page access will fault and the kernal could then take appropriate action.

One Possible Policy: F(irst)I(n)F(irst)O(ut)

- The first page read into physical memory is the first one to be evicted.

- We now run an iteration of FIFO. Assuming physical memory could hold 3 pages and the following VPNs were requested: 0,1,2,3,0,1,4,0,1,2,3,4.

Reference String	start	--0--	--1--	--2--	--3--	--0--	--1--	--4--	--0--	--1--	--2--	--3--	--4--
Physical Page 1	\0	(0)	0	0	(3)	3	3	(4)	4	4	4	4	4
Physical Page 2	\0	-	(1)	1	1	(0)	0	0	0	0	(2)	2	2
Physical Page 3	\0	-	-	(2)	2	2	(1)	1	1	1	1	(3)	3

-(x) indicates a page has been read into memory. FIFO incurs 9 page faults in this instance.

FIFO: Better Hardware

- Typically, there are two main ways to improve performance, via software or via hardware. Now we run FIFO with the same page requests as before, but instead of three physical page slots, we now use four.

Reference String	start	--0--	--1--	--2--	--3--	--0--	--1--	--4--	--0--	--1--	--2--	--3--	--4--
Physical Page 1	\0	(0)	0	0	0	0	0	(4)	4	4	4	(3)	3
Physical Page 2	\0	-	(1)	1	1	1	1	1	(0)	0	0	0	(4)
Physical Page 3	\0	-	-	(2)	2	2	2	2	2	(1)	1	1	1
Physical Page 4	\0	-	-	-	(3)	3	3	3	3	3	(2)	2	2

-More hardware managed to inccur more page faults than before. Typically this is not the case, but for particular inputs this may sometimes happen. This phenomenon is called Belady's anomaly: where better hardware sometimes results in reduced performance.

L(east)R(ecently)U(sed): a different software approach

- Instead of victimizing based on the time the pages were read in, LRU victimizes based on the time that the page was last used.

- We run an iteration with the same assumptions as before and three physical pages.

Reference String	start	--0--	--1--	--2--	--3--	--0--	--1--	--4--	--0--	--1--	--2--	--3--	--4--
Physical Page 1	\0	(0)	0	0	(3)	3	3	(4)	4	4	(2)	2	2
Physical Page 2	\0	-	(1)	1	1	(0)	0	0	0	0	0	(3)	3
Physical Page 3	\0	-	-	(2)	2	2	(1)	1	1	1	1	1	(4)

- Ten page faults this time. So the takeaway here is that effectiveness of a replacement policy is largly correlated with the order of page accesses.

The 'Oracle' Policy

- In some cases, page accesses could be very predictable. Suppose this was such an instance. How many page faults would we incur if we could 'predict' the future page access perfectly and thereby take optimal action at every turn?

Reference String	start	--0--	--1--	--2--	--3--	--0--	--1--	--4--	--0--	--1--	--2--	--3--	--4--
Physical Page 1	\0	(0)	0	0	0	0	0	0	0	0	(2)	2	2
Physical Page 2	\0	-	(1)	1	1	1	1	1	1	1	1	(3)	3
Physical Page 3	\0	-	-	(2)	(3)	3	3	(4)	4	4	4	4	4

- Only seven page faults. Much better than before, but a policy like this requires very predictable inputs, which for most applications, is quite rare.

Distributed Systems & R(emote)P(rocedureal)C(alls)

- A distributed system is a number of independent machines linked via some kind of network

- As for RPC's, an analogy is best for its definition. RPC's to distributed systems is as system calls is to the kernal

- RPC's are generally provided via either an API or an ABT, abstracting away lower level details allowing both ease of use and relative robustness.

- Despite the similarities between sys calls and RPC, there are major differences:

The caller and the calle have different address spaces since they are located on different machines. This has major reprecussions.
Due to the different address spaces, we have obtained free hard modularity since the caller and callee cannot trash each other's stacks.
No calls by reference are possible as a result of address seperation. Of course, an address in one machine would make little sense when passed to another.
The differing machines may have different architectures. So RPC's must generalize for more than one particular architecture.

Solutions for differing architectures

- Consider two machines in a distributed system, one running x86-64 and the other running spark-64. How would we resolve the seemingly simple issue that x86-64 is little endian and spark-64 is big endian? How is the two machine suppose to communicate?

The sender could convert the messages it sends to the receiver's endian.
The sender could send in it's own endian and let the receiver deal with translating the message into something intelligable.
The messages sent after being translated to a common 'language' defined by some standard. This way, all machines need to do is to be able to translate between its own 'language' and the common 'language'.

-So we have the 'translating' down. But how is data actually sent from one machine to another?

-Marshalling: The client takes its message, however complex it may be, and changes it to a format so that it could be sent to the server sequentially(i.e. one piece at a time). The server then reconstructs the original message from the individual pieces that it received.

Stubs

Swap two halves of the word around recursively to convert from little Endian to big Endian. Swap it in log base 2. Intel created the instruction called swap on little endian devices to do this. How do we do it?

Stubs are pieces of code that run in the client gluing the client application to the network. They rename("abc","def") as an example performing conversion. Stubs will put it into a packet and ship it to the server automatically. It is possible to write the stubs by hand, but after 50-100, it gets boring. THey are generated automatically from protcol descriptions. There is a textual description of how to go over the wire and a c code will implement that program. e.g. rpcgen which generates stubs on server to support protocol. It doesn't use the kernel.