CS 111 Lecture 14 Scribe Notes: Virtual Memory

by Alan Covarrubias

Table of Contents

Atomic Writes to Disk

Virtual Memory

Atomic Writes to Disk

GZIP Bug

Before Bug Fix

fd = open("foo.gz", O_CREAT | O_WRONLY | O_TRUNC, 0644);
write(fd, buf, nbytes);
close(fd);
unlink("foo"); // removes the original, but the compressed data may still be only in the cache!

After Bug Fix

dfd = open(".", O_SEARCH);
fd = openat(dfd, "foo.gz", O_CREAT|…);
write(fd, buf, nbytes);
fsync(fd); // saves all the information associated with this file to disk, including data and metadata
fdatasync(dfd); // saves only the directory's data to disk, not its metadata
close(fd);
unlinkat(dfd, "foo", 0); // now it is safe to remove the original

Bug Explanation

gzip has a bug that can lose files: write() puts data into the kernel's cache, not onto disk. Since the cache is volatile, a crash after gzip unlinks "foo" but before the cached writes reach disk loses the data entirely! The same hazard appears in any program that replaces one file with another, such as mv. The fix, using fsync and fdatasync, forces the writes to disk rather than letting them sit in the cache; it makes gzip considerably slower but far more reliable. This creates a predicament: we want both speed and reliability, but we cannot have both. If you want reliability, there are two main approaches you can take.

  1. Commit records
  2. Journaling

Commit Records

Assumptions for Writing Individual Sectors

  1. Individual sectors can be written atomically.
  2. If the power goes out before writing to disk, it is as if the write never happened.
  3. If the power goes out during a write to disk, you have enough power to finish.
Atomic Writes

| W1 | W2 | ... | WN | Commit Record |

With a commit record, we can bundle several small atomic actions into one large atomic action.

The first N writes are tentative until the commit record is written last.

If the commit record was never written, then all N previous writes are ignored on recovery.

The key idea is the same one behind building mutexes out of spin locks: we build a big atomic action out of small ones. Here the small atomic actions are individual sector writes, and the single atomic write of the commit record is what makes the whole bundle atomic.
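
To make this concrete, here is a minimal C sketch of the commit-record protocol. The write_sector() helper is hypothetical; it stands in for the disk's ability to write one sector atomically (assumption 1 above).

#include <string.h>

#define NSECTORS 4
#define COMMIT_SECTOR NSECTORS // the commit record goes last

extern void write_sector(int sector, const char *buf); // assumed atomic

void atomic_multiwrite(char data[NSECTORS][512])
{
    // Writes 0..N-1 are tentative: a crash anywhere in this loop
    // loses nothing, because recovery ignores them without a commit.
    for (int i = 0; i < NSECTORS; i++)
        write_sector(i, data[i]);

    // The single atomic write of the commit record is the moment
    // the whole multi-sector write "happens".
    char commit[512];
    memset(commit, 0, sizeof commit);
    memcpy(commit, "COMMIT", 6);
    write_sector(COMMIT_SECTOR, commit);
}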

Journaling

We divide our file system into two pieces: cell memory and a log.


Log:          | BEGIN | a:A | b:B | COMMIT | END |
Cell memory:  | a | b |

We perform a write by first recording it and its commit record in the log, and only then writing to cell memory.

Won’t we still have a problem? What if the system crashes before the commit is written?

After a crash, the CPU looks at the log and checks for inconsistencies with cell memory. If the log holds a committed write that is not reflected in cell memory, recovery replays the logged writes into cell memory and then writes END to the log. The END record signifies that cell memory and the log are consistent again. If the crash happened before COMMIT was written, the logged writes are simply ignored.

What happens when we run out of log?

In practice, when you build a file system, it will ask how big you want the log to be. When the log fills up, we wrap around and write from the start again, treating the log as a circular buffer.

Suppose you crash in the middle of recovery

Recovery should be robust in the presence of crashing.
This means recovery must be idempotent: recover(recover(x)) = recover(x)

Step by Step Atomic Implementation

Logging Protocols

There are two main types of logging protocols that support atomic writes to disk.

Write-Ahead Protocol

With write-ahead, the CPU first stores the values it plans to write to cell memory in the log. Once finished, it stores a commit in the log and then begins writing the data to cell memory. If the system crashes before the commit, the write is lost and the cell data stays consistent with the old data. If the system crashes after the commit, then on recovery the CPU scans the log forward, looking for committed writes that have not yet been installed in cell memory, and installs them.
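
A minimal sketch of the write-ahead protocol in C, assuming hypothetical helpers log_append(), log_commit(), log_scan(), and cell_write():

struct log_entry { int cell; int value; };

extern void log_append(const struct log_entry *e); // log one planned write
extern void log_commit(void);                      // write the commit record
extern int  log_scan(struct log_entry *e);         // next committed, uninstalled entry
extern void cell_write(int cell, int value);

void wal_write(struct log_entry *entries, int n)
{
    // 1. Log the new values first.
    for (int i = 0; i < n; i++)
        log_append(&entries[i]);

    // 2. Commit. From this point on, the write must survive a crash.
    log_commit();

    // 3. Install the new values into cell memory.
    for (int i = 0; i < n; i++)
        cell_write(entries[i].cell, entries[i].value);
}

// Recovery: replay forward. Reinstalling an already installed value
// just writes the same data again, so recovery is idempotent.
void wal_recover(void)
{
    struct log_entry e;
    while (log_scan(&e))
        cell_write(e.cell, e.value);
}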

Write-Behind Protocol

With write-behind, the CPU stores the old values in the log before writing the new data to cell memory. The CPU writes a commit to the log only when the write to cell memory is complete. If the system crashes before any new data reaches cell memory, recovery is simple, since only the log was changed; cell memory is still in the old state it had before the write began. If a crash occurs while writing new data to disk, the CPU must use the logged old values to roll cell memory back to the old state.
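
A matching sketch of the write-behind protocol, using the same hypothetical helpers as above plus cell_read(); note that the commit moves to the end:

struct log_entry { int cell; int value; };

extern void log_append(const struct log_entry *e);
extern void log_commit(void);
extern int  cell_read(int cell);
extern void cell_write(int cell, int value);

void wbl_write(struct log_entry *new_vals, int n)
{
    // 1. Save the OLD values to the log before touching cell memory.
    for (int i = 0; i < n; i++) {
        struct log_entry old = { new_vals[i].cell, cell_read(new_vals[i].cell) };
        log_append(&old);
    }

    // 2. Write the new values to cell memory.
    for (int i = 0; i < n; i++)
        cell_write(new_vals[i].cell, new_vals[i].value);

    // 3. Commit only once cell memory is fully updated. On recovery,
    //    a missing commit means: restore the logged old values, which
    //    is also idempotent.
    log_commit();
}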

Main Memory Database

A rare type of database that keeps all of its data in RAM instead of writing it to cell memory on disk. Since RAM is volatile, a crash would lose all of its data, so every change to the database is also logged on disk. On reboot, the database is rebuilt in RAM by replaying the logged writes from disk.

Pros/Cons: reads and writes run at RAM speed, but the database must fit in RAM, and rebooting requires replaying the entire log.

Higher Order Operations

When a high-level operation is built out of lower-level operations, we have two options for how the levels interact when a lower-level operation fails.
Cascading aborts are more automatable, but compensating actions are more flexible.

Cascading Aborts

If a lower-level operation aborts, every operation that depended on it is aborted as well; the abort cascades up to the higher levels.

Compensating Actions

If a lower-level operation fails after its effects have already been seen, the higher level runs a compensating action that logically undoes the damage and carries on.

Mount Tables

Can we partition our data into 10% “must be persistent” vs 90% “okay if we lose it”?

Suppose you have one application that needs to be reliable and another that doesn't, or a mixture of data, some of which requires reliability and some of which doesn't.

Linux allows this by introducing a data structure called a mount table.

The mount table maps a file system reference and an inode number, identifying a mount point, to the file system that actually contains the data.

Mount Table

(reference, inode#) => file_system
(dev_t, inode#)     => dev_t
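
A sketch of what a mount table lookup might look like in C; the structure and names are illustrative, not Linux's actual implementation:

#include <sys/types.h>

struct mount_entry {
    dev_t mounted_on_dev; // device holding the mount point
    ino_t mounted_on_ino; // inode number of the mount point
    dev_t mounted_dev;    // device of the file system mounted there
};

static struct mount_entry mount_table[64];
static int n_mounts;

// If (dev, ino) is a mount point, return the device of the file
// system mounted on it; otherwise return dev unchanged.
dev_t traverse_mount(dev_t dev, ino_t ino)
{
    for (int i = 0; i < n_mounts; i++)
        if (mount_table[i].mounted_on_dev == dev &&
            mount_table[i].mounted_on_ino == ino)
            return mount_table[i].mounted_dev;
    return dev;
}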

Virtual Memory

We've examined how to protect data from power outages using methods that allow atomic writes to disk. Now we'll examine a different form of unreliability that can cause problems: unreliable programs that make bad memory references.

Possible Solutions:

A Naive Approach

Memory Layout

| Kernel Code | Process 1 | Process 3 | Process 2 | Kernel Code |

When a process runs, two registers store the base and bound of the memory allocated to that process.
Every memory reference by that process is checked to see whether it lies within those bounds:
process_base <= address < process_bound
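
The check itself is tiny; here is a sketch in C (names illustrative):

#include <stdbool.h>
#include <stdint.h>

struct process { uintptr_t base, bound; };

// The hardware performs this comparison on every memory reference.
bool access_ok(const struct process *p, uintptr_t addr)
{
    return p->base <= addr && addr < p->bound;
}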

It works well in batch environments but not in dynamic environments.
Although you can copy data around to free up space when growing or shrinking memory, copying is expensive.
It does not provide a simple way to share data between processes.
It would be useful if segments did not have to be contiguous in physical memory.

Solution:

Instead of having a single base and bounds pair, we can maintain multiple register pairs for each process, with each pair marking the boundaries of one area of memory, such as the text, data, and stack segments. Additionally, we can dedicate one pair to a shared memory region, which solves the sharing problem of the naive approach.
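
A sketch of the check with multiple pairs; the segment names here are illustrative:

#include <stdbool.h>
#include <stdint.h>

enum segment { SEG_TEXT, SEG_DATA, SEG_STACK, SEG_SHARED, NSEGS };

struct seg_pair { uintptr_t base, bound; };
struct process  { struct seg_pair seg[NSEGS]; };

// An access is legal if it falls within any one of the process's segments.
bool access_ok(const struct process *p, uintptr_t addr)
{
    for (int s = 0; s < NSEGS; s++)
        if (p->seg[s].base <= addr && addr < p->seg[s].bound)
            return true;
    return false;
}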

Paging

[Figure: a page table mapping virtual page numbers to physical block numbers]

A more efficient way of isolating memory per process is to give each process its own page map. A page map is an array of page map entries, where each entry translates a small fixed-size range of contiguous virtual addresses, called a page, to a range of physical addresses called a block. Each entry maps a page number to a block number, and the byte offset within the page then selects the exact physical address the program is dereferencing. This translation is performed by the virtual memory hardware on every access. If a process's page table does not contain a physical address for a page, that process cannot reference that piece of memory.
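
A sketch of the translation in C, assuming 4 KiB pages (a 12-bit offset) and a single-level page map; real hardware typically uses multi-level maps:

#include <stdint.h>

#define OFFSET_BITS 12
#define OFFSET_MASK ((1u << OFFSET_BITS) - 1)

// pmap[vpn] holds the physical block number for virtual page vpn,
// or -1 if the process may not touch that page.
extern long pmap[];

long translate(uint32_t vaddr)
{
    uint32_t vpn    = vaddr >> OFFSET_BITS; // virtual page number
    uint32_t offset = vaddr & OFFSET_MASK;  // byte within the page
    long block = pmap[vpn];
    if (block < 0)
        return -1;                          // hardware would fault here
    return (block << OFFSET_BITS) | offset; // physical address
}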

Advantages of Paging

  1. If the memory a program references isn't in RAM, the hardware raises a page fault, and the fault handler loads the page from disk (see the sketch after this list). This only works efficiently if the program exhibits locality of reference rather than accessing memory randomly.
  2. Programs can share memory “safely”. You can implement interprocess communication efficiently and safely if different page table entries map to the same physical block.
  3. We don’t have to worry about fragmentation. The indirection allows contiguous virtual addresses to be discontiguous in physical memory. This allows the dynamic growing and shrinking of virtual address spaces to be efficient.
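
A sketch of what a page-fault handler does for advantage 1; every helper here is hypothetical:

extern long pmap[];               // virtual page -> physical block, or -1
extern int  choose_victim(void);  // page-replacement policy, e.g. an LRU approximation
extern long evict(int vpn);       // write the victim page to disk, return its freed block
extern void load_from_disk(int vpn, long block);

void page_fault_handler(int faulting_vpn)
{
    int  victim = choose_victim();
    long block  = evict(victim);  // victim's contents are saved to disk
    pmap[victim] = -1;            // victim is no longer mapped

    load_from_disk(faulting_vpn, block);
    pmap[faulting_vpn] = block;   // retrying the instruction now succeeds
}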