CS111 Lecture 15 Notes
By Yichen Pan & Haonan Zhou
Think of it as a malloc arena, where some entries contain useful data and others contain garbage
Problem
How do we implement malloc so that it is FAST? For example: malloc(537)
Approach 1
We want to implement this with as few instructions as possible, say 10 instructions on average. One obvious solution is to scan memory for a free region 537 bytes long. However, this scan is slow, and there is no way to do it in anywhere near 10 instructions.
Approach 2
Using page table to get atomic updates to full pages (Copying Garbage Collector)
- We can map the pointers into a new "arena"
- Use the 'mmap' system call: it establishes a mapping between virtual memory and physical memory or a file (it lets you create a "window" into a file)
- malloc can call 'mmap'
Problem
Apache needs to do the following:
- It calls malloc() for a lot of memory
- It responds to requests independently when possible
A first attempt
We can call fork() for each request:
for (;;) {
    r = get_a_request();
    fork();        // clones virtual memory
    process(r);    // e.g. execlp("python", ...); probably won't use all the virtual memory we copied
}
The problem with this approach is that each call to fork() makes a copy of all the virtual memory. However, imagine that process() simply executes another program (e.g. execlp("python", ...) ). In this case, we copied a lot of unnecessary memory.
Another approach
Using a controversial system call: vfork()
vfork() acts like fork(), EXCEPT:
- Parent is frozen until child does exec or exit
- Child runs with parent's virtual memory, not a COPY of virtual memory
- The downside of this approach is also obvious: the child can make any change to the parent's virtual memory, and the parent is not aware of this because it is frozen while the child runs.
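A minimal sketch of the vfork() pattern, assuming the child does nothing but exec (the only safe use). The wrapper name `run_with_vfork` is made up for this example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a program via vfork(): the parent is suspended and the child borrows
   the parent's address space until it calls exec or _exit. */
int run_with_vfork(const char *path, char *const argv[]) {
    pid_t pid = vfork();
    if (pid < 0) return -1;
    if (pid == 0) {            /* child: only exec or _exit is safe here */
        execv(path, argv);
        _exit(127);            /* exec failed */
    }
    int status;
    waitpid(pid, &status, 0);  /* parent resumes once the child execs or exits */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because no address space is copied, this is fast; but any stray write by the child before exec would scribble on the parent's memory, which is exactly the danger described above.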
A cleaner approach
Making fork() more efficient (copy-on-write, or COW)
- Instead of cloning all the virtual memory, we just clone the page table, and we copy a page only when either the parent or the child writes to it.
- Even if the page is actually writable, we still mark the page table entry read-only, so that any write triggers a page fault. The page fault gives control back to the kernel. Then, if the kernel decides that the page may be written, it makes a copy of that page and marks both the parent's and the child's entries writable.
- Advantage of this approach: We will end up copying some pages, but not all
- Downside: if a large program spawns a small program (e.g. Apache runs 'echo'), then vfork() performs better, since even copying the page table is wasted work.
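The user-visible consequence of COW can be demonstrated with a tiny program: after fork(), a write in the child faults, the kernel copies just that page, and the parent's data is untouched. This only shows the semantics; the read-only page-table trick happens invisibly inside the kernel.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* After fork(), parent and child share physical pages marked read-only.
   The child's write below faults; the kernel copies that one page for the
   child, so the parent's copy of x is unaffected. */
int cow_demo(void) {
    int x = 1;
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        x = 99;                          /* triggers the copy-on-write fault */
        _exit(x == 99 ? 0 : 1);
    }
    int status;
    waitpid(pid, &status, 0);
    /* child saw its own copy, and the parent's x is still 1 */
    return (WEXITSTATUS(status) == 0 && x == 1) ? 0 : -1;
}
```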
Solution
Incorporate python into Apache: No fork() needed!
- Problem: performance issue: code is sequential
- We can create a thread for each request. However, creating a thread has overhead, and every request pays it.
- How to avoid?
- We can create threads at the beginning, say, 100 threads (because we have about 100 cores). Each thread blocks until a request is ready to process.
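The pre-created pool can be sketched with pthreads as below. This is a simplified model (a fixed pool draining a shared counter of pending requests), not Apache's real worker code; `serve` and the request counters are invented for the example.

```c
#include <pthread.h>

/* A fixed pool of workers that block on a condition variable until a
   request is ready, so no thread is created per request. */
#define NTHREADS 4

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  ready = PTHREAD_COND_INITIALIZER;
static int pending;          /* requests waiting to be processed */
static int done;             /* requests processed so far */
static int shutting_down;

static void *worker(void *arg) {
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (pending == 0 && !shutting_down)
            pthread_cond_wait(&ready, &lock);  /* block: no busy-waiting */
        if (pending == 0 && shutting_down) {
            pthread_mutex_unlock(&lock);
            return NULL;
        }
        pending--;
        done++;                                /* "process" one request */
        pthread_mutex_unlock(&lock);
    }
}

int serve(int nrequests) {
    pthread_t tid[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, NULL);
    pthread_mutex_lock(&lock);
    pending = nrequests;                       /* hand work to the pool */
    pthread_cond_broadcast(&ready);
    pthread_mutex_unlock(&lock);
    pthread_mutex_lock(&lock);
    shutting_down = 1;                         /* drain the queue, then exit */
    pthread_cond_broadcast(&ready);
    pthread_mutex_unlock(&lock);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    return done;
}
```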
One approach
Using a pipe
Problem: 'write' makes a copy, and 'read' makes another copy. If the message is big (several MB), the procedure slows down significantly.
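The two copies are visible in a minimal pipe round trip: write() copies the buffer into the kernel, read() copies it back out. The helper name `pipe_roundtrip` is made up; the message here is small enough to fit in the pipe buffer, so write() does not block.

```c
#include <string.h>
#include <unistd.h>

/* One message through a pipe costs two copies:
   user buffer -> kernel buffer (write), kernel buffer -> user buffer (read). */
int pipe_roundtrip(const char *msg, char *out, size_t outsz) {
    int fd[2];
    if (pipe(fd) != 0) return -1;
    ssize_t n = write(fd[1], msg, strlen(msg) + 1);  /* copy #1 */
    if (n < 0) return -1;
    n = read(fd[0], out, outsz);                     /* copy #2 */
    close(fd[0]);
    close(fd[1]);
    return n < 0 ? -1 : 0;
}
```

For a multi-megabyte message, each of those two memcpy-scale operations is paid in full, which motivates the page-table remapping trick below.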
A better way
Using virtual memory
- Instead of copying memory, simply adjust page tables of sender's virtual memory
- Generally, it is about 1000 times faster than copying the data
- Aside: a kernel which uses this approach to send messages: Mach -> developed at CMU and later became the basis of the kernel in Mac OS X
Let's assume the following:
- There are 5 virtual pages: 0, 1, 2, 3 and 4
- There are 3 physical pages: A, B and C
- The sequence of page requests is: 0 1 2 3 0 1 4 0 1 2 3 4
Always choose the first physical page
- Unless page requests are completely random, this is a very bad policy: only one physical page is used
Choose each page in turn
- Suppose we have p physical pages; then we choose 0, 1, …, p-1, 0, 1, …
- 9 page faults
- Not very good unless you know the program accesses virtual memory only sequentially
Choose a page that is least recently used (LRU)
First Come First Serve (FCFS) or First In First Out (FIFO)
- First assume 3 physical pages: ABC
- 9 page faults with 3 physical pages
- What if we have 4 physical pages?
- 10 page faults with 4 physical pages
- More page faults even though we have more physical pages!
- This is called Belady's anomaly -> rare in practice
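The anomaly can be checked by simulating FIFO directly. This is a sketch with an invented helper `fifo_faults`; victim selection is round-robin over the frames, which is exactly FIFO since evictions happen only on faults.

```c
/* Count FIFO page faults for a reference string with `frames` physical pages. */
int fifo_faults(const int *refs, int n, int frames) {
    int page[16], next = 0, used = 0, faults = 0;
    for (int i = 0; i < n; i++) {
        int hit = 0;
        for (int j = 0; j < used; j++)
            if (page[j] == refs[i]) { hit = 1; break; }
        if (hit) continue;
        faults++;
        if (used < frames)
            page[used++] = refs[i];          /* free frame available */
        else {
            page[next] = refs[i];            /* evict the oldest resident page */
            next = (next + 1) % frames;
        }
    }
    return faults;
}
```

On the reference string 0 1 2 3 0 1 4 0 1 2 3 4, this simulation gives 9 faults with 3 frames and 10 with 4, reproducing Belady's anomaly.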
(optimal) Look for a page that will not be used in the furthest future
- 7 page faults
- Aside: how do we know when a page will be used in the future?
  - We can look at history from similar runs (e.g. when booting a Linux kernel)
  - We can also get help from the compiler
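The optimal policy can also be simulated (with hindsight, since the simulator sees the whole reference string). `min_faults` is an invented helper name; on a fault it evicts the resident page whose next use lies furthest in the future, or never comes.

```c
/* Optimal (MIN) replacement: evict the page used furthest in the future. */
int min_faults(const int *refs, int n, int frames) {
    int page[16], used = 0, faults = 0;
    for (int i = 0; i < n; i++) {
        int hit = 0;
        for (int j = 0; j < used; j++)
            if (page[j] == refs[i]) { hit = 1; break; }
        if (hit) continue;
        faults++;
        if (used < frames) { page[used++] = refs[i]; continue; }
        int victim = 0, far = -1;
        for (int j = 0; j < used; j++) {
            int next = n;                    /* n means "never used again" */
            for (int k = i + 1; k < n; k++)
                if (refs[k] == page[j]) { next = k; break; }
            if (next > far) { far = next; victim = j; }
        }
        page[victim] = refs[i];              /* evict the furthest-future page */
    }
    return faults;
}
```

On 0 1 2 3 0 1 4 0 1 2 3 4 with 3 frames this gives the 7 faults claimed above, the best any policy can do on this string.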
More optimization
Demand paging
Don't read a page from swap space until the program needs it
- Let's compare demand paging vs. non-demand paging
- N = number of pages in program
- U = number of pages used in the current run ( 0 < U < N )
- C = cost of reading a page
- F = cost of a page fault
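With these definitions, a standard back-of-the-envelope comparison (the exact accounting may differ slightly from what was presented in lecture) looks like this:

```latex
% Total cost of running the program:
\text{no demand paging: } N C
\qquad
\text{demand paging: } U (C + F)
% Demand paging wins on total cost whenever
U (C + F) < N C
% Latency before the first instruction can run:
\text{no demand paging: } N C
\qquad
\text{demand paging: } C + F
```

Intuitively, demand paging pays an extra fault F per page actually used, but skips the N - U pages that are never touched, and it starts running after loading just one page instead of all N.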