Lecture 15: Virtual memory and processes, distributed systems

Transcribed by Benjamin Statz and Junnan Wang


Page Faults

Virtual memory structure

Page faults occur when a process cannot access a page, whether that is due to lack of access privileges or the page simply not being loaded into RAM. Upon a page fault occurring, control is transferred to the kernel. which has 3 options:

  1. Terminate process
  2. Send SIGSEGV
  3. Pull page into memory and retry the instruction (slowest, requires I/O)

What can an app programmer do in case 2? For example, upon a SIGSEGV, Emacs assumes a stack overflow has occurred and discards the stack. This is not a reliable way to detect stack overflow. Emacs uses guard pages which are set as unusable using Nmap, and trigger the SIGSEGV if they are attempted to be accessed - but an array larger than the size of the guard page can cause all indices of the guard page to be skipped over. The solution is simply for Emacs not to create large local arrays.

Page fault mechanics

What is stored in kernel memory?

On a page fault (assuming a valid virtual address):


Virtual Memory Policy

How is the victim page selected? There are several possible page replacement policies:

  1. First in, first out (FIFO), aka victim++
    • Fair policy - no page gets to monopolize memory
    • Problem: evicts commonly used pages just as often as uncommonly used pages
  2. Random replacement
    • victim = random(physsize), to produce a value between 0 and the number of physical pages
    • Same problem: evicts commonly and uncommonly used pages equally often
  3. Oracle policy
    • Look ahead in future program execution and see which pages are needed in the future
    • Victim = page least needed in the near future
    • The past is not considered in this policy
    • Cannot be used by practical operating systems, as they cannot predict future behavior
  4. madvise
    • System call which gives information to the kernel about future memory usage
    • Syntax: int madvise(void *addr, size_t length, int advice)
      • addr: start of virtual address
      • length: number of bytes in memory to consider
      • advice: one of several possible constants which describe the order in which memory will be accessed.
        • Examples include MADV_SEQUENTIAL, MADV_RANDOM, and MADV_NONE; a full list can be found here
    • Has existed for a long time, but many apps don't use it
    • Often doesn't help - following its advice can potentially make program execution worse
    • Program can lie and claim to be more important / more highly used than it actually is, giving its pages unfair preference
  5. Least recently used (LRU)
    • Choose the page that has not been used for the longest time to be the victim
    • Most commonly used policy

Examples:

The same reference string is chosen to test all page replacement policies. For this section, we will assume there are 5 virtual pages (labeled 0-4) and 3 physical pages (labeled A-C), and we will use the following reference string:
0 1 2 3 0 1 4 0 1 2 3 4
In actuality, the reference string is determined by running various programs and keeping a log of pages used. The objective is to minimize overhead by reducing the number of page faults, which are represented by deltas in each table.

FIFO

Cost = 9 page faults

Oracle method

Cost = 7 page faults

Least recently used

Cost = 10 page faults (this decrease in performance is somewhat of an irregularity)

FIFO, with 4 physical pages

Cost = 10 page faults. In this scenario, adding memory actually made performance worse. This is known as Bélády's anomaly.

Implementing LRU policy:

The optimal setup would be to have a dedicated column in the page table to store a timestamp at last access for each page. However, this is not supported by x86, so a separate clock table array must be used. To enable kernel access to this table, set each page in the page table to be inaccessible. When an access attempt is made, a page fault will result. When this happens, set the page to its proper read-write status, so the access can be made, and at the same time set the entry in the clock table.


Virtual Memory Optimizations

Demand paging

Dirty bit (and related concepts)

When a page fault occurs, the old page is written back to disk, and the replacement page is read into memory. The writing step can be redundant if the contents of RAM have not been modified and already match the disk contents. How can this be made more efficient?

  1. Read before writing
    • Before writing to disk, directly compare the contents of the disk and RAM to determine whether the write needs to occur
    • Inefficient - involves even more I/O than before (extra read)
  2. Use a checksum
    • Collisions can occur - modified RAM data may have the same checksum and will not be written to disk, losing data
    • Solution: use a cryptographic checksum
    • Regardless of accuracy, checksums are expensive - checksum table is proportional to size of virtual memory
  3. Dirty bit
    • Most common optimization for this scenario
    • Store an extra bit in each page table entry: 0 if the page has not been modified since it has been read from swap space, 1 if it has been modified
    • Requires hardware cooperation, but it is not a performance problem, since the dirty bit changes very infrequently
    • Can be implemented without a designated dirty bit in the page table, even though it requires kernel access. To do so, use a similar method to the LRU implementation:
      • Mark each page in the page table as read-only
      • When a page access is attempted, a page fault is triggered, allowing the kernel to set the dirty bit in a separate table

fork() optimizations and vfork()

In more primitive implementations of fork(), the entire process is cloned, including the page table and the data - no memory is shared. This is very expensive/inefficient - when a large program implements a smaller one, the majority of the data is discarded when the child calls close() and exec().

The solution is to copy only the page table and have the child's copy point to the same data pages as the parent's. Then, both page tables are set to read-only, which will cause a page fault if any write attempts are made. On a page fault, the page is first set to read-write as it should. Then, a copy of only the modified page is made, and the child page table is updated. This is how modern implementations of fork() function.

vfork is identical to fork, except for 2 differences:

It is no longer POSIX supported, since fork() has become more efficient since it only copies on write, and modern CPUs are fast enough to clone the page table with a negligible effect on performance.


Intro: Distributed Systems

Distributed systems involve running an OS whose components are not all on the same computer. One way of managing this is to use remote procedure calls (RPCs).

Remote procedure calls

RPCs differ from ordinary function calls and system calls in several ways:

  1. No shared memory
    • Only way for the callee to access data is to copy it
    • Example: strcmp(a, b) will not work if one is located on the caller machine and the other on the callee
    • Pros: hard modularity is enforced
    • Cons: difficult to access data
  2. No call by reference
    • For instance, the system call ssize_t read(int fd, void *buf, size_t count) will not work as an RPC since buf is a memory location
  3. Caller and callee can have different CPUs or disagree on instruction sets / data formats (e.g. little vs. big endian, different CPU architecture)

RPCs can be simulated as if they were ordinary function calls using the principle of marshalling, in which complex structures are transmitted bit by bit over the network. In order for the original program to remain unmodified, an additional layer of functionality is inserted between the program and the network, which is responsible for marshalling. For instance, a system call of read() sent from the program would be modified to remove the original reference to a memory address, then converted to a series of bits encoding the call and its arguments. Likewise, an additional layer is needed to decipher the RPC and convert it back to an ordinary system call before it is passed to the server.

Problems with RPCs:

Beginning next lecture: how to fix these problems