By Baixiao Huang and Jiayi Lu
How do we allocate and free memory?
We can call:
Mallac /new/etc (give you a pointer to its Arana Location)
Free()
We can allocate space like this:
Arena:
XXXX|XXXXXX|XXXXX|-------------------free space-------------------
^ptr1 ^ptr2
X: allocated space
ptr1: heap pointer
ptr2: end pointer
This is very fast when allocating new object but … it has a problem!
Problem: we may unable to recover free space when allocated object get free!
Arena:
XXXX|---------------|XXXXX|-------------------free space-------------------
^ptr1 ^ptr2
Free space cannot be reallocated for new object!
So how to deal with this?
Copying Garbage Collector (also used in Java):
01 | 02 | 03 | 04 |
Rearrange the allocated memory to:
01 | 02 | 03 | 04 |
^ | ||||
Heap Pointer |
So that we won’t have fragmentation.
Mmap (system call) -- let you create a window into a file. Create a mapping of virtual address space to memory.
Munmap -- unmap a region from memory. So unmapped area will become prohibited zone.
You can use it to access the mapping of virtual memory address to physical memory address. You can also change the mapping. These functions are executed by kernel, so you cannot mess up with kernel by calling this function.
Apache calls malloc
We want Apache to:
-- responds to requests independently when possible
-- don't want to wait until the first request to finish(request get processed in parallel)
Suggestion:
we can do this via fork() for each request
//pseudo c code
for (;;){ r=get_a_request(); if (fork==0){ precess(r); //process the request } //else wait for new request }
Problem with this method:
Every time parent forks, child clones all the virtual memory of parent and only do a small request. So we need to optimize cloning virtual memory.
Suggestion:
use vfork()
--not only need to know process also need to know virtual memory, not orthogonal
--acts like fork()
EXCEPT
1. parent is frozen until the child dies
a. exec
b. exit
2. child runs with parent’s virtual memory
(If child modify something, parent will see, but parent is asleep and cannot do anything!!!)
Suggestion(a cleaner approach)
Copy just the page tabe
-clone just the page table (not data) (1000 times faster)
-make copy only when parent or child writes (Copy On Write)
Parent page table entry----->shared page
Child page table entry------->shared page
But when child writes:
Parent page table entry----->original page
Child page table entry------->modified page
if (execlp “echo”) just small program
cope on write is more overhead, vfork() will win.
No fork +exec !- sequential code
We want apache and python run together but we don't want fork+exec!
So use:
pthread =create a thread for each request
(some overhead here because everytime create a thread, kernal have to look for empty space)
Solution: Create 100 threads at the beginning of the program, And let each thread to handle one request.
each thread:
for (;;){ block waiting for req(r) process(r) }
Stories for Virtual Memory
1. mallocs
2. apache vfork fork thread
3. fast message sending to another process
a. use a pipe
but “write” and “read ” have to make copy
b. use virtual memory
process 1 send message to process 2 by writing to page. (Here we yse virtual memory for speed, which is a very important reason why we need virtual memory nowadays.)
0: Always choose first page
--it wastes RAM space, because it always has page fault and swaps the first page (this approach doesn't work)
1: n++%p
--if you know program always access to virtual pages
2: let’s assume random access to virtual pages
--nearly any policy will do
3: LRU (least recently used)
--throw away clothes you don't wear for so long
4: ask what pages will be used in the future
--assume on Oracle (look in the future not the past)
--pages that currently in the RAM
--Furthest future access
How to build on approximation to an oracle
--look at history of similar runs. (example: when booting a linux kernel, we boot what we have last time)
--get help from complier
Simple FCFS/FIFO (Here assume we have 5 virtual pages)(look at trace)
0 1 2 3 0 1 4 0 1 2 3
A .0 0 0.3 3 3.4 4 4 4 4
B . . 1 1 0 0 0 0 0 .2 2 2
C . . . 2 2 2 .1 1 1 1 1.3
Total cost with this method is 9 page faults
If we take hardware approach and try with 4 virtual pages instead of 3
0 1 2 3 0 1 4 0 1 2 3 4
A .0 0 0 0 0 0.4 4 4 4.3 3
B . .1 1 1 1 1 4.0 0 0 0.4
C . . .2 2 2 2 2 2.1 1 1 1
D . . . .3 3 3 3 3.3 2 2 2
The overall cost is 10 Page Faults! More than with 3 virtual pages.
Exploring LRU
0 1 2 3 0 1 4 0 1 2 3 4
A .0 0 0.3 3 3.4 4 4.2 2 2
B . .1 1 1.0 0 0 0 0 0.3 3
C . . .2 2 2.1 1 1 1 1 1.4
With the above example 10 page faults is the optimal result
#1:Demand paging
--bring the page into RAM only when program needs it
Advantage: Decrease in wait time
Disadvantage: More page faults during program execution
U: #of pages used
C= cost of reading an page
F= cost of faulty (trap into the kernel, kernel have to track)
(on disk is large, RTI)
Total cost | Latency | |
Demand page | U (C+F) | C+F |
No demand page | NC | NC |
#2:Dirty Bit
--keep track of which page on RAM is not same as Disk(cached copy not equals disk copy)
--when dealing with victim page, if the page is dirty, remove it directly without saving it to disk
Advantage:
It doesn’t need time to store data to disk when page is dirty. Increase performance.