Lecture 14: File System Robustness, Virtual Memory

by Ryan Rosario and Chenyang Xia


File System Robustness

Example Problem: Atomic Write

While file system robustness is not weighted as heavily as processor speed or the amount of memory when a customer buys a computer, it is very important, and a significant amount of effort is spent on it during development. One feature of a robust file system, for example, is atomic write: it guarantees that once a write operation finishes, either all of the specified data has been written, or all data remains as it was before the write operation began.


An assumption about the underlying hardware is that a write operation temporarily scrambles data: while the write is in progress, the data being written is garbage. For example, consider overwriting "A" with "B" in the following array. This is denoted with write("B", 1).
 
Before Write
    1   2   3   4   5
    A   B   C   D   E

During Write
    ?   B   C   D   E

After Write
    B   B   C   D   E

 


If the data in block 1 is read before the write completes (for example, after a reboot caused by a power failure in the middle of the write operation), a garbage value will be obtained, and the atomic-write guarantee will be broken.
  
One approach to resolving this problem is to keep multiple copies of the data. This gives the system a way of knowing whether the write has completed, and a way to recover from failed write operations, by comparing the copies. Both approaches below assume that all copies initially start with the same data.
 
Approach 1: Two Copies of Data 
 
Copy1    Copy2
A        A
?        A
B        A
B        ?
B        B

Note that this approach is not sufficient to recover from write failures. By comparing the two copies it is possible to detect an incomplete write operation, but there is no way of knowing which copy holds the garbage.
 
Approach 2: Three Copies of Data

Copy1    Copy2    Copy3
A        A        A
?        A        A
B        A        A
B        ?        A
B        B        A
B        B        ?
B        B        B
 
With 3 copies of data, we can always recover either the original data or the data being written, using the following rule:

    if (Copy1 == Copy2)
        use Copy1
    else
        use Copy3
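
A minimal C sketch of this rule (the function name choose_copy and its exact signature are illustrative, not from the lecture):

    #include <string.h>

    /* Return a pointer to a copy guaranteed to hold either the old
     * value or the new value, never garbage.  If Copy1 and Copy2
     * agree, the write either has not started or has already passed
     * Copy2, so Copy1 is safe; otherwise the crash hit Copy1 or
     * Copy2, and Copy3 has not been touched yet. */
    const char *choose_copy(const char *copy1, const char *copy2,
                            const char *copy3, size_t len)
    {
        if (memcmp(copy1, copy2, len) == 0)
            return copy1;
        return copy3;
    }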
 
With this approach, atomicity comes at the cost of space. An important rule, regardless of the approach used, is to never overwrite the only copy of the data; this is what makes recovery possible.
 
Modern file system developers often follow a set of assumptions about the fault tolerance of disk sectors during design. A commonly used model was specified by Butler Lampson and Howard Sturgis.

Lampson-Sturgis Assumptions:
    1. Writes can fail.
    2. Writes may corrupt storage.
    3. Bad sectors can be detected by reads later.
    4. Blocks can spontaneously decay.
    5. Failure rates are low.

Building Atomic Operations with Commits

Example: Writing Data to Disk, with Duplicate Storage

Instead of directly copying data from source to destination, we can break it into steps:
                1. Write new value into "copy area" on duplicate disk
                2. Wait for block to hit disk
                3. Write a commit record saying "DONE"
                ------------------------------------------------
                4. Write new value into "destination area" on disk
                5. Wait for block to hit disk
                6. Clear the commit record for data block written

Although this approach doubles the number of writes, the performance loss is not too large when most writes are contiguous. More importantly, it enables atomicity: on reboot, the system checks the commit record entries, completing the write operation if an entry says "DONE" and recovering otherwise.

If the system crashes during step 3 or earlier, it is as if no write operation ever took place, since only data in the copy area has been modified; the destination area's data remains the same.
If the system crashes during step 4 or later, the data in the destination area may be garbage, but a copy of the desired data is still available in the copy area. In this case, the write operation can be completed by performing steps 4 through 6 again. Once successful, the commit record is cleared.

It is okay to repeat steps 4 through 6 multiple times, because they are idempotent: the outcome is the same no matter how many times they are performed.
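
A sketch of what the reboot-time recovery might look like in C, assuming hypothetical helpers (commit_record_says_done, copy_area_to_destination, etc.) that stand in for the actual disk I/O:

    /* Hypothetical disk-I/O helpers; these names are illustrative,
     * not from the lecture. */
    extern int  commit_record_says_done(void);
    extern void copy_area_to_destination(void);
    extern void wait_for_block_to_hit_disk(void);
    extern void clear_commit_record(void);

    /* Run on every reboot.  Safe to crash inside and run again:
     * steps 4-6 are idempotent, so repeating them leaves the disk
     * in the same state. */
    void recover_after_crash(void)
    {
        if (commit_record_says_done()) {
            /* Crash happened after COMMIT: finish the write. */
            copy_area_to_destination();    /* step 4 */
            wait_for_block_to_hit_disk();  /* step 5 */
            clear_commit_record();         /* step 6 */
        }
        /* Otherwise the crash happened before COMMIT: the
         * destination area was never touched, so there is
         * nothing to redo or undo. */
    }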

Copying data to disk with commits follows the general pattern of atomic operations, which consist of a pre-commit phase, during which it is possible to roll back to the original state, and a post-commit phase, during which, no matter what happens, the operation will eventually finish.

Pattern for Building Atomic Operations
Begin---------------------------------------------------------
                Pre-Commit Phase (steps 1 - 3 in example above)
COMMIT
                Post-Commit Phase (steps 4 - 6 in example above)
End-----------------------------------------------------------


Recovery with Journaling

Journals are append-only logs that record the values of data as they change; nothing is ever erased from them. They can be used to provide atomicity with commits much like the previous example, but they require only enough storage for the journal to hold the copied data. In this approach, the normal storage (the destination area in the previous example) is called cell storage. Journals thus split the file system into two parts: the journal and the actual data (cell storage).

When using journals, an order of logging and installing data, along with a recovery procedure, is required. Two protocols generally used are the "write-ahead log" and the "write-behind log".

Write Ahead Log
Write-ahead logs use the same idea as the previous example, where a duplicate of the storage is kept. During recovery, the system completes the write operation using the logged copy of the desired data.
            Order of Logging and Installing:
                1. Log new values
                2. Commit
                3. Install new values to cell storage

            Recovery Procedure:
After a crash, copy committed data from journal storage to cell storage to maintain consistency. This is the same as the previous example with duplicate storage.

Write-ahead logs take a performance hit that can be reduced by storing the journal on a separate disk. For best journal performance, one should use an SSD.
An alternate approach is to use the journal to store real data and the cell data is simply a cache (perhaps entirely in RAM).
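
A minimal sketch of the write-ahead ordering in C, assuming hypothetical journal and cell-storage primitives (journal_append, journal_commit, cell_write), which are illustrative rather than a real API:

    #include <stddef.h>

    extern void journal_append(const void *data, size_t len);
    extern void journal_commit(void);
    extern void cell_write(int block, const void *data, size_t len);

    /* Write-ahead ordering: the new value is durably logged and
     * committed before cell storage is touched, so recovery can
     * always redo the install from the journal copy. */
    void write_ahead(int block, const void *new_data, size_t len)
    {
        journal_append(new_data, len);    /* 1. log new values            */
        journal_commit();                 /* 2. commit                    */
        cell_write(block, new_data, len); /* 3. install into cell storage */
    }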

Write Behind Log
Write-behind logs use a different order of logging and installing, and try to roll back instead of completing the write operation.
            Order of Logging and Installing:
                1. Log old values
                2. Install new values
                3. Commit

            Recovery Procedure:
After a crash, look through the journal for installed but uncommitted changes. Undo these changes by installing the old values stored in the journal.
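
A matching sketch of the write-behind ordering, using the same hypothetical primitives as the write-ahead sketch plus a cell_read:

    #include <stddef.h>

    extern void journal_append(const void *data, size_t len);
    extern void journal_commit(void);
    extern void cell_write(int block, const void *data, size_t len);
    extern void cell_read(int block, void *buf, size_t len);

    /* Write-behind ordering: the OLD value is logged first, so if
     * we crash before the commit, recovery can undo the install by
     * copying the old value back from the journal. */
    void write_behind(int block, const void *new_data, size_t len,
                      void *scratch)
    {
        cell_read(block, scratch, len);   /* fetch the old value   */
        journal_append(scratch, len);     /* 1. log old values     */
        cell_write(block, new_data, len); /* 2. install new values */
        journal_commit();                 /* 3. commit             */
    }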


So far, the higher levels have been "blissfully atomic". We can get better performance by allowing higher-level applications to look inside atomic operations, and:
                1. Have the higher-level write operation fail (trash blocks, etc.)
                2. Let the application specify a compensation action after failure (ask for money back, etc.)

Journaling can pose a security risk that applications such as shred cannot overcome. Shred does not work well with a write-behind log: a love letter will be journaled, and even after the file is deleted, the love letter may still be in the journal!


Virtual Memory

Two Programs' View of Memory

Low Address                                                      High Address
Program 1:  OS | text | data | block system storage | heap | forbidden | stack
Program 2:  OS | text | data | block system storage | heap | forbidden | stack

Problem: When multiple programs share the same memory address space, bugs in one can affect the others through bad memory accesses.

Potential Solutions:
0. Hire better programmers. -> But even great programmers can make mistakes.
1. Use programming languages with run-time memory checking, such as Java.
2. Use base and bounds registers: assign each program a region of memory it can use, and trap if it accesses memory outside that region (see the sketch after this list).
                This solution is problematic, because:
                        1. Programs could change the base and bounds registers themselves unless the registers are privileged, but performance suffers greatly if privileged instructions are needed for every memory reference.
                        2. How much memory to allocate to a program is difficult to predict ahead of time.
                        3. Only relative addresses can be used, since a program has no knowledge of where its memory will be placed beforehand.
3. Use paged virtual memory: give each process the illusion that it has the memory to itself.
                This avoids many problems of the previous solutions, and the program sees contiguous memory.
                Virtual memory has contiguous addresses, with each page mapped to some physical page; the physical pages are not necessarily contiguous.
                Generally small pages of a few KB are used (the layout below uses a 13-bit offset, i.e., 8 KB pages).
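
As promised above, a sketch of the check a base-and-bounds scheme (solution 2) performs on every memory reference; the structure and function names are illustrative:

    #include <stdint.h>
    #include <stdbool.h>

    /* Privileged per-process registers set by the OS. */
    struct base_bounds {
        uintptr_t base;   /* start of the region the process may use */
        uintptr_t bound;  /* size of that region in bytes            */
    };

    /* The check performed on every memory reference: addr is a
     * program-relative address; out-of-range accesses trap to the OS. */
    bool translate(const struct base_bounds *r, uintptr_t addr,
                   uintptr_t *phys)
    {
        if (addr >= r->bound)
            return false;          /* trap: access outside the region */
        *phys = r->base + addr;
        return true;
    }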

Typical Size of Virtual Page and Physical Page

Virtual address:   51-bit Virtual Page Number  | 13-bit Offset
Physical address:  35-bit Physical Page Number | 13-bit Offset

The offsets are copied directly during mapping; only the page number is translated.
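
A small C illustration of this split, assuming the 13-bit offset from the layout above; the page-table lookup itself is faked with a made-up physical page number:

    #include <stdint.h>
    #include <stdio.h>

    #define OFFSET_BITS 13   /* 13-bit offset, per the layout above */
    #define OFFSET_MASK ((UINT64_C(1) << OFFSET_BITS) - 1)

    int main(void)
    {
        uint64_t vaddr  = 0x123456789abcULL;    /* example virtual address    */
        uint64_t vpn    = vaddr >> OFFSET_BITS; /* 51-bit virtual page number */
        uint64_t offset = vaddr & OFFSET_MASK;  /* offset, copied unchanged   */

        /* Pretend result of a page-table lookup mapping this VPN to a
         * 35-bit physical page number (the lookup is not shown here). */
        uint64_t ppn   = 0x1a2b3cULL;
        uint64_t paddr = (ppn << OFFSET_BITS) | offset;

        printf("vpn=0x%llx offset=0x%llx paddr=0x%llx\n",
               (unsigned long long)vpn, (unsigned long long)offset,
               (unsigned long long)paddr);
        return 0;
    }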