Virtual memory efficiency - policy for page replacement
    --> It is the kernel's responsibility to choose an effective algorithm for picking a victim page when a page fault occurs.

Some propositions:
    1. Choose a victim at random
     ==> fast to decide, but bad performance; also prone to an attacker who hogs the system by deliberately causing page faults
    2. Physical page 0, then 1, then 2, ...
    3. FIFO
    ==>
    5 virtual pages, 3 physical pages
    reference string:
        0   1   2   3   0   1   4   0   1   2   3   4
    A  ^0   0   0  ^3   3   3  ^4   4   4   4   4   4     
    B      ^1   1   1  ^0   0   0   0   0  ^2   2   2
    C          ^2   2   2  ^1   1   1   1   1  ^3   3
==> 9 page faults

   
    let's increase ram, so we 'll have 4 physical pages

        0   1   2   3   0   1   4   0   1   2   3   4
    A  ^0   0   0   0   0   0  ^4   4   4   4  ^3   3
    B      ^1   1   1   1   1   1  ^0   0   0   0  ^4
    C          ^2   2   2   2   2   2  ^1   1   1   1
    D              ^3   3   3   3   3   3  ^2   2   2
==> 10 page faults

=====> We increased RAM and got MORE page faults!! A paradox (this is Belady's anomaly).

How about the best, optimal solution?
Belady's algorithm: assume an oracle that knows everything, even future references; evict the page whose next use is furthest in the future.

        0   1   2   3   0   1   4   0   1   2   3   4
    A  ^0   0   0   0   0   0   0   0   0  ^2  ^3   3
    B      ^1   1   1   1   1   1   1   1   1   1   1
    C          ^2  ^3   3   3  ^4   4   4   4   4   4
==> 7 page faults

However, this is impossible to implement because there is no oracle; it is still useful as a lower bound to compare real algorithms against.

Let's try the Least Recently Used (LRU) algorithm: choose the page that was least recently referenced.

        0   1   2   3   0   1   4   0   1   2   3   4
    A  ^0   0   0  ^3   3   3  ^4   4   4  ^2   2   2
    B      ^1   1   1  ^0   0   0   0   0   0  ^3   3
    C          ^2   2   2  ^1   1   1   1   1   1  ^4
==> 10 page faults ?! (worse than FIFO's 9 on this reference string)

=====> In reality, LRU usually works better: real programs have locality of reference, which this adversarial reference string deliberately lacks.


How to implement LRU:
    Normally the kernel does not know which pages were least recently used.
    How to fix:
    1. HW support: hardware sets a bit in the page table entry each time the page is referenced;
            the kernel clears the bits after a page fault
    2. Use the clock interrupt to sample pages periodically
        (only an approximation, but often good enough)


Reminder:
    The OS is willing to spend extra cycles deciding which page to swap out
    because the cost of a page fault is large.
    Contrast cache vs. RAM: there the miss penalty is small, so we can't afford to spend much on a near-optimal algorithm.


Some other common optimizations:
    1. demand paging:
        Don't wait until we have loaded a bunch of pages into RAM.
        Instead, load main's page and jump to main; fault the rest in as needed.
    N = # of pages in the program
    U = # of pages actually used
    C = cost to read a page
    F = cost of handling a fault

    Assume we have plenty of RAM:
    w/o demand paging:
        cost of VM = N*C
        latency (until main runs) = N*C
    with demand paging:
        cost of VM = C + (U-1)*(C + F)
        latency (until main runs) = C




Writing pages to disk:
    victims get written back (but only if they've changed!)
    How do we know whether a page has changed? Solutions:
        1. HW support: a "dirty bit" in the page table entry, set by hardware on each write
        2. Keep a copy of the page, and compare at eviction time
        ==> huge waste of space
        2.1. Keep a checksum instead (saves space, but we still scan the whole page, so the time cost turns out the same as before)
        3. Tell the HW the page is read-only. Assume HW support for r/w/x/k bits:
        the first write traps, and the kernel marks the page dirty and writable
        (one interrupt on the first write to each page)



Fast forking vs VM:
Consider forking a big program like Emacs.
It uses lots of memory, and hence lots of page tables (multiple levels of page tables); naively copying all of that on fork is expensive.
///** missed a bunch**///

copy-on-write:
the child shares the parent's pages, marked read-only; a page is copied only when written.
what about writes by the parent?
    1. make the parent's pages copy-on-write as well
    2. dally the parent until (a) the child execs or exits, or (b) a timer expires

vfork is like fork;
however,
    (1) the parent is frozen until the child exits or execs
    (2) parent & child share RAM until the child execs
--> if the system supports vfork, use it!!





malloc in UNIX:
consider Unix circa 1977:

----------------------------------------------------------
|  txt area   | data |  bss  (+ heap) |   ........ | stack|
----------------------------------------------------------
<------------------->
  initialized

malloc + VM:
malloc(12)
    walks the free list (kept in the bss/heap area) looking for a block of size >= 12
    ==> this might page-fault if malloc hasn't been called in a long time
        (the pages holding the free list got swapped out)

We can fix that by changing malloc to keep its bookkeeping in a smaller, denser block of memory, so the traversal touches fewer pages.

How about allocating a big chunk of memory?

Nowadays, we use a different system call, mmap().
example (simplified argument order used in these notes; the actual POSIX signature is
    void *mmap(void *addr, size_t len, int prot, int flags, int fd, off_t offset)):

    mmap(fd, offset, base, size, options)
        offset is in the file; base and size are in RAM

we ask for 10 MB, at offset 8 kB from the beginning of the file,
and some 10 MB region of RAM will then have the same contents as that part of the file.

and there's munmap() as well; free() can also unmap memory this way (e.g., for large allocations).

example:
mmap("/dev/zero", 0, 0x00003, 1024);   /* sketch: really you open /dev/zero and pass the fd */

Mapping /dev/zero gives zero-filled scratch memory.
This is also a way to get temporary memory that can be shared with a child across a fork.
Also, for a file-backed mapping, the pager reads from (and writes to) the mapped file itself instead of swap space.

/*look up more info on mmap */

Final remark:
VM is a way to manage memory efficiently, not a way to run big apps on a machine with little RAM.