Virtual Memory and Processes

Uses of Virtual Memory

  1. Allocation and Freeing Memory

    Memory allocation and freeing operations (e.g., malloc/new, free/delete) need to be fast.

    Simplest way: allocate/distribute memory by increasing address

    Pros:

    • Allocation is very fast. Just need to keep a pointer to the start of the next free block and a pointer to the end of memory.

    Cons:

    • Limited number of allocations, because memory freed by processes is never reused.

    This algorithm can be extended to virtual memory to get "atomic" updates to large sections of memory:

    • Allocate pages in sequence.
    • Replace end of memory pointer with a forbidden page (not readable/writable).
    • Touch the next page before allocating from it. Touching the forbidden page causes a fault, indicating that the limit has been reached.

    Example of adding garbage collection (a copying collector):
    Start with an old arena whose objects may point to one another.
    Copy the surviving (live) objects into a new arena, compacting them at the beginning. Pointers into moved objects must be updated to their new addresses.


  2. Example: Apache

    The Apache server has to respond to requests independently when possible. This can be done via fork(). Each child process may call malloc.

    
    for (;;) {
    	r = get_a_request();
    	if (fork() == 0) {
    		process(r);
    		_exit(0);	/* without this, the child would loop and accept requests too */
    	}
    }

    fork() will clone the virtual memory (page table+data) each time! This can be very inefficient since requests will typically only touch a portion of the data.

    Solution 1: vfork()

    vfork() behaves like fork() EXCEPT

    1. child takes control and parent is frozen until child finishes
    2. child runs with parent's virtual memory
    As a result, the virtual memory of the parent no longer has to be cloned for each fork.

    Issues with vfork():

    The child runs in the parent's address space, so anything the child writes is also written in the parent. The child is expected to call exec() or _exit() almost immediately; modifying other data or returning from the calling function gives undefined behavior. (POSIX has since marked vfork() obsolescent.)

    vfork()'s main feature is performance, and it is still used.
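A sketch of the one safe vfork() pattern (the helper name run_in_child is hypothetical): the child borrows the parent's address space, touches nothing, and immediately execs a new program, or _exits if the exec fails.

```c
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `prog` in a vfork()'d child and return its exit status.
 * The parent is frozen until the child execs or exits. */
int run_in_child(const char *prog) {
    pid_t pid = vfork();
    if (pid == 0) {
        /* Child: exec immediately -- on success this replaces the
         * borrowed address space with a fresh one. */
        execlp(prog, prog, (char *)NULL);
        _exit(127);                        /* exec failed */
    }
    int status;
    waitpid(pid, &status, 0);              /* parent resumes here */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The assignment to pid is the one write the child is allowed to make; everything else in the parent's memory must be left untouched.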

    Solution 2: changing fork()

    Another solution is to change fork() to keep its current behavior while avoiding the expensive clone: copy only the page table, mark the pages read-only in both processes, and copy an individual page only when one side writes to it (copy-on-write).

    This optimization is still costly if requests (our Python scripts) are very small compared to our parent process, as the overhead of copying the page table dominates.

    Solution 3: Integrate Python with Apache

    Since the Python code is now part of our program, each request becomes a function call within our code. However, this makes execution sequential and breaks our original intention of handling requests in parallel. To restore parallelism, we can use multithreading.

    We can use pthreads to create a thread for each request. Creating threads has overhead, so a faster implementation is to create a fixed number of threads up front (e.g., 100 threads). Each thread blocks while no request is available and wakes up to handle the next one.
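A sketch of such a fixed-size pool (process_request and the queue layout are hypothetical stand-ins): workers sleep on a condition variable rather than busy-polling, and wake only when a request is submitted.

```c
#include <pthread.h>
#include <stdatomic.h>

#define NTHREADS 4
#define QCAP     128

static int queue[QCAP];
static int head, tail, shutting_down;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t nonempty = PTHREAD_COND_INITIALIZER;
static atomic_int handled;               /* count of completed requests */

static void process_request(int r) {     /* stand-in for real work */
    (void)r;
    atomic_fetch_add(&handled, 1);
}

static void submit(int r) {
    pthread_mutex_lock(&lock);
    queue[tail++ % QCAP] = r;            /* assumes < QCAP pending requests */
    pthread_cond_signal(&nonempty);      /* wake one sleeping worker */
    pthread_mutex_unlock(&lock);
}

static void *worker(void *arg) {
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (head == tail && !shutting_down)
            pthread_cond_wait(&nonempty, &lock);  /* block; don't busy-poll */
        if (head == tail) {              /* woken only to shut down */
            pthread_mutex_unlock(&lock);
            return NULL;
        }
        int r = queue[head++ % QCAP];
        pthread_mutex_unlock(&lock);
        process_request(r);              /* work happens outside the lock */
    }
}
```

Because workers keep draining the queue even after shutdown is signaled, no submitted request is dropped.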

  3. Fast Message Sending Among Processes

    So far, we have used pipes to pass data to/from processes. A pipe transfer involves two copies: write() copies the data from the sender into a kernel buffer, and read() copies it from that buffer into the receiving process.

    Alternatively, we can use virtual memory to share data among processes. By adjusting the page tables of the sender's and recipients' virtual memory accordingly, they can have access to the same physical page.

  4. Page Eviction Policies

    Now that we have mechanisms to detect when we need to load a virtual page to physical memory, we need a way to choose what page to evict.

    Examples:

    Simple Example: FCFS/FIFO

    Consider the reference trace: 0 1 2 3 0 1 4 0 1 2 3 4

    In the case of 3 physical pages:

    Trace:           0   1   2   3   0   1   4   0   1   2   3   4
    Physical page 0: 0*  0   0   3*  3   3   4*  4   4   4   4   4
    Physical page 1: -   1*  1   1   0*  0   0   0   0   2*  2   2
    Physical page 2: -   -   2*  2   2   1*  1   1   1   1   3*  3

    (* = page fault)

    The result is 9 page faults.

    Let's try again with 4 physical pages:

    Trace:           0   1   2   3   0   1   4   0   1   2   3   4
    Physical page 0: 0*  0   0   0   0   0   4*  4   4   4   3*  3
    Physical page 1: -   1*  1   1   1   1   1   0*  0   0   0   4*
    Physical page 2: -   -   2*  2   2   2   2   2   1*  1   1   1
    Physical page 3: -   -   -   3*  3   3   3   3   3   2*  2   2

    (* = page fault)

    The result is 10 page faults.

    Therefore, increasing the number of physical pages does NOT always reduce page faults. This is known as Belady's anomaly (Belady's paradox). This is a contrived case, however, and increasing the number of physical pages typically does reduce the occurrence of page faults.
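The fault counts above can be checked with a small simulator. This sketch implements FIFO replacement (replacing frames in round-robin order is exactly FIFO, since the next frame in rotation always holds the oldest resident page).

```c
#include <string.h>

/* Count FIFO page faults for a reference trace of length n
 * with nframes physical pages (nframes <= 16). */
static int fifo_faults(const int *trace, int n, int nframes) {
    int frames[16], next = 0, faults = 0;
    memset(frames, -1, sizeof frames);   /* -1 = empty frame */
    for (int i = 0; i < n; i++) {
        int hit = 0;
        for (int f = 0; f < nframes; f++)
            if (frames[f] == trace[i])
                hit = 1;
        if (!hit) {
            frames[next] = trace[i];     /* evict the oldest resident page */
            next = (next + 1) % nframes;
            faults++;
        }
    }
    return faults;
}
```

On the trace 0 1 2 3 0 1 4 0 1 2 3 4 it reports 9 faults with 3 frames and 10 with 4, reproducing Belady's anomaly.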

    Example 2: LRU

    Assuming 3 physical pages

    Trace:           0   1   2   3   0   1   4   0   1   2   3   4
    Physical page 0: 0*  0   0   3*  3   3   4*  4   4   2*  2   2
    Physical page 1: -   1*  1   1   0*  0   0   0   0   0   3*  3
    Physical page 2: -   -   2*  2   2   1*  1   1   1   1   1   4*

    (* = page fault)

    The result is 10 page faults.

    While LRU is a popular policy, it is not always better than FIFO.
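LRU can be simulated the same way; this sketch tracks, per frame, the time of last use and evicts the frame used least recently (empty frames, initialized to -1, are naturally filled first).

```c
/* Count LRU page faults for a reference trace of length n
 * with nframes physical pages (nframes <= 16). */
static int lru_faults(const int *trace, int n, int nframes) {
    int frames[16], last_use[16], faults = 0;
    for (int f = 0; f < nframes; f++) {
        frames[f] = -1;      /* empty frame */
        last_use[f] = -1;    /* never used, so chosen first as victim */
    }
    for (int i = 0; i < n; i++) {
        int hit = -1, victim = 0;
        for (int f = 0; f < nframes; f++) {
            if (frames[f] == trace[i])
                hit = f;
            if (last_use[f] < last_use[victim])
                victim = f;  /* least recently used frame so far */
        }
        if (hit >= 0) {
            last_use[hit] = i;           /* refresh recency on a hit */
        } else {
            frames[victim] = trace[i];   /* evict the LRU page */
            last_use[victim] = i;
            faults++;
        }
    }
    return faults;
}
```

On this trace with 3 frames it reports 10 faults, one more than FIFO -- matching the table above.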

VM Optimizations

  1. Anticipatory Paging vs Demand Paging

    In anticipatory paging, the OS loads more than the requested page into memory. The idea is that pages near the requested page will likely be needed soon, so preloading them reduces the number of page faults.

    In demand paging, a page is not read into memory until the program needs it.

    Some calculations:

    N = size of program in pages
    U = # of pages used (U <= N)
    C = cost of reading one page
    F = cost of fault

                                  Total cost   Latency
    Demand paging                 U(C+F)       C+F
    No demand (load everything)   NC           NC

    Demand paging has the lower total cost when U(C+F) < NC, i.e. when the fraction of pages actually used, U/N, is less than C/(C+F).

  2. Dirty page

    We can keep track of which pages have been modified (written) — "dirty" pages — and write a page back to disk only when it is evicted; clean pages can be discarded without a write.

    This does not have to be implemented in hardware. We can use an implementation similar to the copy-on-write technique used near the start of lecture: keep the page read-only, and on the first write fault record a software dirty bit and re-enable writes.
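A Linux-flavored sketch of that software dirty bit (names hypothetical; calling mprotect from a signal handler is not strictly async-signal-safe, but it illustrates the mechanism): the page starts read-only, the first store faults into the handler, which records the dirty bit and re-enables writes so the store can retry.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *tracked_page;
static size_t pagesz;
static volatile sig_atomic_t page_dirty;    /* software dirty bit */

static void on_write_fault(int sig, siginfo_t *si, void *ctx) {
    (void)sig; (void)si; (void)ctx;
    page_dirty = 1;                          /* must flush this page on evict */
    /* Re-enable writes; the faulting store restarts and succeeds. */
    mprotect(tracked_page, pagesz, PROT_READ | PROT_WRITE);
}

static void track_page(void) {
    pagesz = (size_t)sysconf(_SC_PAGESIZE);
    tracked_page = mmap(NULL, pagesz, PROT_READ,   /* writes will fault */
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = on_write_fault;
    sigaction(SIGSEGV, &sa, NULL);
}
```

Only the first write to the page pays the fault cost; subsequent writes run at full speed, which is why this trick is cheap enough to be practical.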

Next topic: Distributed systems...