Computer Science 111 Scribe Notes

Lecture 14: File System Performance

 

Week 9 Monday, March 5, 2012

By Marc Abuel

Two ideas for surviving power failures

 

Idea #1: having a commit record

 > do several different writes to file system, but only one counts

            > commit record = single low level write that "matters"

            > (ex)    A. write the blocks from RAM to copy area

file is unchanged, in original location

 
                                    B. wait for these blocks to hit disk

                                    C. write separate commit record

                                                > commit record tells us where file resides

file contents changed, location moved

 
                                                > if power fails during A & B, the file has not been written yet

                                    D. copy from copy area to original area

                                                D1. wait to hit disk

                                                > this step is not strictly needed, because we have already committed to the change

                                                > ensures we have two copies of file on disk

file contents are new, location is unchanged

 
                                    E. clear the commit record
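The A-E steps above can be sketched in C, simulating the disk as in-memory arrays. All names here (disk_main, disk_copy, commit_flag) are illustrative, not from lecture; a real file system would issue ordered block writes and wait for each to reach the platter.

```c
#include <string.h>

enum { BLKSZ = 16 };
static char disk_main[BLKSZ];   /* original location of the file */
static char disk_copy[BLKSZ];   /* copy area                     */
static int  commit_flag;        /* the commit record             */

void update_file(const char newdata[BLKSZ]) {
    memcpy(disk_copy, newdata, BLKSZ);   /* A. write blocks to copy area   */
                                         /* B. (wait for them to hit disk) */
    commit_flag = 1;                     /* C. write the commit record     */
    memcpy(disk_main, disk_copy, BLKSZ); /* D. copy back to original area  */
                                         /* D1. (wait to hit disk)         */
    commit_flag = 0;                     /* E. clear the commit record     */
}

/* After a crash, the commit record tells us which copy counts. */
void recover(void) {
    if (commit_flag) {                   /* crashed after C: redo step D   */
        memcpy(disk_main, disk_copy, BLKSZ);
        commit_flag = 0;
    }                                    /* crashed before C: old data OK  */
}
```

Whichever side of step C the crash lands on, recover() leaves disk_main holding exactly one consistent version of the file.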

                       

 

> BEGIN

> pre-commit phase

                        -during this time, any writes done to disk are not yet permanent, so you can back out with no visible change

                        -can run into trouble: run out of memory, I/O errors

> COMMIT or ABORT

            -either commit or abort should be atomic

> post-commit (post-abort) phase

            -typically done for (long-term) efficiency

            -making overall file system layout nice

            -hurts short-term efficiency (extra writes slow you down)

            -if aborted, counts as a no-op

> END

*** the whole BEGIN..END sequence counts as a transaction

 

Idea #2: having a journal

            > an efficient way of organizing the file system that makes commit records natural and easy to implement

            > partition file system into 2 areas: cell data, journal

            -cells = blocks to be updated

                        -journal = big long tape that records all changes to cell data

                                    -size: if too small, stall a lot; if too big, waste space

                                    -write commit records into journal

                                    -if crash occurs, can look at journal to see which commit records actually happened

 > Idea #2 assumes Idea #1 (commit records) works

            + writes can be contiguous during bursts of activity

 + journal can be on a separate device

 + no need to put cell data on disk at all

 

            > 2 different ways of doing logs

                        (1) write-ahead logs

                                    -log all writes you plan to do

                                    -COMMIT, write commit record

                                    -install new values into cell data

                                    + from user's point of view: able to recover the new data after crash

                        (2) write-behind logs

                                    -log all old values into journal (write down the values you are about to erase)

                                    -install new values into cell data

                                    -COMMIT

                                    *if you crash in middle of installing new data, can still recover old values into cell data

                                    + from implementer's point of view: replays journal backwards after crash, so often faster, b/c interesting stuff is at the end

             *repairs must be idempotent
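A hedged sketch of a write-ahead log in C (the names are invented, not the lecture's code): new values go to the journal first, the commit record is a single flag, and replay after a crash is idempotent because it re-installs absolute values.

```c
#include <stddef.h>

enum { NCELLS = 8, LOGCAP = 16 };
static int cells[NCELLS];                          /* cell data           */
static struct { int cell, value; } logbuf[LOGCAP]; /* the journal         */
static int lognext;                                /* journal tail        */
static int committed;                              /* the commit record   */

void log_write(int cell, int value) {              /* pre-commit phase:   */
    logbuf[lognext].cell  = cell;                  /* journal only, cell  */
    logbuf[lognext].value = value;                 /* data untouched      */
    lognext++;
}

void commit(void) { committed = 1; }               /* single atomic write */

/* Run after a crash (and after every commit).  Replaying twice is safe:
 * the log stores absolute new values, so repairs are idempotent. */
void replay(void) {
    if (committed)
        for (int i = 0; i < lognext; i++)
            cells[logbuf[i].cell] = logbuf[i].value;
    lognext = 0;                                   /* abort == no-op      */
    committed = 0;
}
```

A write-behind log would store the old values instead and replay them only when the commit record is absent.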

 

============================================================================================================================================================================

A more complicated approach

 > let user applications "look inside" pending transactions

            + better performance, better parallelism / interleave, lower latency

            - need a way to notify apps that a transaction they peeked inside was aborted (e.g. a signal)

                        -> cascading aborts

                        -> compensating actions (undo your mistake once you figure out an abort has happened)

 

============================================================================================================================================================================

VIRTUAL MEMORY

 

*Problem: an unreliable program, if tickled incorrectly, will make bad memory references

            e.g. GNU grep 2.10 on 64-bit host

                        if run on a bare machine (weensyos) => whole system fails

*Solutions:

  (1) hire better programmers, write better software (works but it's expensive)

  (2) runtime checking in compiler generated code (works but runtime cost)

  (3) base + bounds register pair

            > hardware checks that all memory references are in range, traps otherwise

            - must pre-allocate a fixed size of memory (can grow it, but it's awkward)

            - limits how many processes can fit in RAM

            - memory gets fragmented

            - apps can change base + bounds registers (solution: make these registers privileged)

            - shared memory not practical
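The base + bounds check in (3) is just a range test on every memory reference; a minimal sketch (names mine, not the lecture's):

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { size_t base, bounds; } bb_regs;  /* per-process registers */

/* Returns true if the access is legal; real hardware traps otherwise. */
bool check_access(bb_regs r, size_t addr) {
    return addr >= r.base && addr < r.base + r.bounds;
}
```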

  (4) multiple base + bound pairs

            1 for text (readonly)

            1 for data (read-write)

            - when you compile + link 'cat' you must tell it where to run (solution: design app so that it does not care where it is run)

                        > compile app without using absolute address = position independent code (PIC)

                        > OR dynamically link or patch

            > divide virtual address into two pieces: (1) segment # (2) offset within segment

            > segments are varying sizes
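The segmented translation just described can be sketched as follows, assuming the top 10 bits select the segment (the constants and names are illustrative):

```c
#include <stddef.h>

#define SEG_SHIFT 22                    /* top 10 bits = segment #        */
#define SEG_FAULT ((size_t)-1)          /* sentinel for an illegal access */

typedef struct { size_t base, limit; } segment;   /* varying sizes        */

size_t seg_translate(const segment *segtab, size_t va) {
    size_t seg = va >> SEG_SHIFT;                     /* (1) segment #    */
    size_t off = va & (((size_t)1 << SEG_SHIFT) - 1); /* (2) offset       */
    if (off >= segtab[seg].limit)                     /* out of segment:  */
        return SEG_FAULT;                             /*     trap         */
    return segtab[seg].base + off;
}
```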

  (5) break up memory into pages, use fixed size pages (e.g. 4KiB)

            > solves the fragmentation problem (segments can't grow/shrink easily)

            > virtual address split into 2 parts: (1) page # (2) offset within page

                        > this requires hardware assist

           

            *page tables implemented in hardware:

            master tables -> intermediate tables -> actual tables

            virtual address split into 3 parts: index into master table (10 bits), index into intermediate table (10 bits), offset within page (12 bits)

 

            common situation: VM for 1 process

 

            *software emulation of 2-level page tables:

            // maps virtual address to physical address

 

      size_t pmap(size_t va) {
            size_t offset = va & ((1 << 12) - 1);     // low 12 bits: offset within page
            size_t lo = (va >> 12) & ((1 << 10) - 1); // middle 10 bits: intermediate index
            size_t hi = va >> 22;                     // high 10 bits: master index
            size_t *l0page = PAGETABLE[hi];           // PAGETABLE lives in the privileged register %cr3
            if (l0page == FAULT)
                  return FAULT;
            size_t l1page = l0page[lo];
            if (l1page == FAULT)
                  return FAULT;
            return l1page + offset;
      }

 

            > when the hardware faults due to a bad page table entry, it traps into the kernel

            > kernel can:

                        -terminate process

                        -send signal to process (SIGSEGV) (i.e. kernel arranges for signal handler)

                        -arrange for invalid access to become valid; arrange for accessed page to be loaded into RAM

                                    > modify page tables to reflect this

                                    > can return from interrupt (RTI), back to failing instruction

            * can run big programs even if you don't have enough RAM

                        > treat RAM as cache for disk

            > works if there is good locality of reference; thrashing if RAM is too small to hold the working set