Computer Science 111 Scribe Notes

Lecture 14: File System Performance

 

Week 9 Monday, March 5, 2012

By Marc Abuel

Two ideas for surviving power failures

 

Idea #1: having a commit record

 > do several different writes to file system, but only one counts

            > commit record = single low level write that "matters"

            > (ex)    A. write the blocks from RAM to copy area

file is unchanged, in original location

 
                                    B. wait for these blocks to hit disk

                                    C. write separate commit record

                                                > commit record tells us where file resides

file contents changed, location moved

 
                                                > if power fails during A & B, the file has not been written yet

                                    D. copy from copy area to original area

                                                D1. wait to hit disk

                                                > this step is not strictly needed, because we have already committed to the change

                                                > ensures we have two copies of file on disk

file contents are new, location is unchanged

 
                                    E. clear the commit record
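The A-E steps above can be sketched in C, simulating the disk as in-memory arrays. All names here (disk_main, disk_copy, commit_flag) are illustrative, not from lecture; a real file system would issue ordered block writes and wait for each to reach the platter.

```c
#include <string.h>

enum { BLKSZ = 16 };
static char disk_main[BLKSZ];   /* original location of the file */
static char disk_copy[BLKSZ];   /* copy area                     */
static int  commit_flag;        /* the commit record             */

void update_file(const char newdata[BLKSZ]) {
    memcpy(disk_copy, newdata, BLKSZ);   /* A. write blocks to copy area   */
                                         /* B. (wait for them to hit disk) */
    commit_flag = 1;                     /* C. write the commit record     */
    memcpy(disk_main, disk_copy, BLKSZ); /* D. copy back to original area  */
                                         /* D1. (wait to hit disk)         */
    commit_flag = 0;                     /* E. clear the commit record     */
}

/* After a crash, the commit record tells us which copy counts. */
void recover(void) {
    if (commit_flag) {                   /* crashed after C: redo step D   */
        memcpy(disk_main, disk_copy, BLKSZ);
        commit_flag = 0;
    }                                    /* crashed before C: old data OK  */
}
```

Whichever side of step C the crash lands on, recover() leaves disk_main holding exactly one consistent version of the file.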

                       

 

> BEGIN

> pre-commit phase

                        -during this time, any writes done to disk are not yet permanent, so you can back out with no visible change

                        -can run into trouble: run out of memory, I/O errors

> COMMIT or ABORT

            -either commit or abort should be atomic

> post-commit (post-abort) phase

            -typically done for (long-term) efficiency

            -making overall file system layout nice

            -hurts short-term efficiency (extra writes slow you down)

            -if aborted, counts as a no-op

> END

*** the whole BEGIN..END sequence counts as a transaction

 

Idea #2: having a journal

            > an efficient way of organizing the file system that makes commit records natural and easy to implement

            > partition file system into 2 areas: cell data, journal

            -cells = blocks to be updated

                        -journal = big long tape that records all changes to cell data

                                    -size: if too small, stall a lot; if too big, waste space

                                    -write commit records into journal

                                    -if crash occurs, can look at journal to see which commit records actually happened

 > Idea #2 assumes Idea #1 (commit records) works

            + writes can be contiguous during bursts of activity

 + journal can be on a separate device

 + no need to put cell data on disk at all

 

            > 2 different ways of doing logs

                        (1) write-ahead logs

                                    -log all writes you plan to do

                                    -COMMIT, write commit record

                                    -install new values into cell data

                                    + from user's point of view: able to recover the new data after crash

                        (2) write-behind logs

                                    -log all old values into journal (write down the values you are about to erase)

                                    -install new values into cell data

                                    -COMMIT

                                    *if you crash in middle of installing new data, can still recover old values into cell data

                                    + from implementer's point of view: replays journal backwards after crash, so often faster, b/c interesting stuff is at the end

             *repairs must be idempotent
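A hedged sketch of a write-ahead log in C (the names are invented, not the lecture's code): new values go to the journal first, the commit record is a single flag, and replay after a crash is idempotent because it re-installs absolute values.

```c
#include <stddef.h>

enum { NCELLS = 8, LOGCAP = 16 };
static int cells[NCELLS];                          /* cell data           */
static struct { int cell, value; } logbuf[LOGCAP]; /* the journal         */
static int lognext;                                /* journal tail        */
static int committed;                              /* the commit record   */

void log_write(int cell, int value) {              /* pre-commit phase:   */
    logbuf[lognext].cell  = cell;                  /* journal only, cell  */
    logbuf[lognext].value = value;                 /* data untouched      */
    lognext++;
}

void commit(void) { committed = 1; }               /* single atomic write */

/* Run after a crash (and after every commit).  Replaying twice is safe:
 * the log stores absolute new values, so repairs are idempotent. */
void replay(void) {
    if (committed)
        for (int i = 0; i < lognext; i++)
            cells[logbuf[i].cell] = logbuf[i].value;
    lognext = 0;                                   /* abort == no-op      */
    committed = 0;
}
```

A write-behind log would store the old values instead and replay them only when the commit record is absent.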

 

============================================================================================================================================================================

A more complicated approach

 > let user applications "look inside" pending transactions

            + better performance, better parallelism / interleave, lower latency

            - need a way to notify apps that a transaction they peeked inside was aborted (e.g. a signal)

                        -> cascading aborts

                        -> compensating actions (undo your mistake once you figure out an abort has happened)

 

============================================================================================================================================================================

VIRTUAL MEMORY

 

*Problem: an unreliable program, if tickled incorrectly, will make bad memory references

            e.g. GNU grep 2.10 on 64-bit host

                        if run on a bare machine (weensyos) => whole system fails

*Solutions:

  (1) hire better programmers, write better software (works but it's expensive)

  (2) runtime checking in compiler generated code (works but runtime cost)

  (3) base + bounds register pair

            > hardware checks that all memory references are in range, traps otherwise

            - must pre-allocate a fixed size of memory (can grow it, but it's awkward)

            - limits how many processes can fit in RAM

            - memory gets fragmented

            - apps can change base + bounds registers (solution: make these registers privileged)

            - shared memory not practical
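The base + bounds check in (3) is just a range test on every memory reference; a minimal sketch (names mine, not the lecture's):

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { size_t base, bounds; } bb_regs;  /* per-process registers */

/* Returns true if the access is legal; real hardware traps otherwise. */
bool check_access(bb_regs r, size_t addr) {
    return addr >= r.base && addr < r.base + r.bounds;
}
```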

  (4) multiple base + bound pairs

            1 for text (readonly)

            1 for data (read-write)

            - when you compile + link 'cat' you must tell it where to run (solution: design app so that it does not care where it is run)

                        > compile app without using absolute address = position independent code (PIC)

                        > OR dynamically link or patch

            > divide virtual address into two pieces: (1) segment # (2) offset within segment

            > segments are varying sizes
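The segmented translation just described can be sketched as follows, assuming the top 10 bits select the segment (the constants and names are illustrative):

```c
#include <stddef.h>

#define SEG_SHIFT 22                    /* top 10 bits = segment #        */
#define SEG_FAULT ((size_t)-1)          /* sentinel for an illegal access */

typedef struct { size_t base, limit; } segment;   /* varying sizes        */

size_t seg_translate(const segment *segtab, size_t va) {
    size_t seg = va >> SEG_SHIFT;                     /* (1) segment #    */
    size_t off = va & (((size_t)1 << SEG_SHIFT) - 1); /* (2) offset       */
    if (off >= segtab[seg].limit)                     /* out of segment:  */
        return SEG_FAULT;                             /*     trap         */
    return segtab[seg].base + off;
}
```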

  (5) break up memory into pages, use fixed size pages (e.g. 4KiB)

            > solves the fragmentation problem (segments can't grow/shrink easily)

            > virtual address split into 2 parts: (1) page # (2) offset within page

                        > this requires hardware assist

           

            *page tables implemented in hardware:

            master tables -> intermediate tables -> actual tables

            virtual address split into 3 parts: index into master table (10 bits), index into intermediate table (10 bits), offset within page (12 bits)

 

            common situation: VM for 1 process

 

            *software emulation of 2-level page tables:

            // maps virtual address to physical address

 

      size_t pmap(size_t va) {
            size_t offset = va & ((1 << 12) - 1);     // low 12 bits: offset within page
            size_t lo = (va >> 12) & ((1 << 10) - 1); // middle 10 bits: intermediate index
            size_t hi = va >> 22;                     // high 10 bits: master index
            size_t *l0page = PAGETABLE[hi];           // PAGETABLE lives in the privileged register %cr3
            if (l0page == FAULT)
                  return FAULT;
            size_t l1page = l0page[lo];
            if (l1page == FAULT)
                  return FAULT;
            return l1page + offset;
      }

 

            > when the hardware faults due to a bad page table entry, it traps into the kernel

            > kernel can:

                        -terminate process

                        -send signal to process (SIGSEGV) (i.e. kernel arranges for signal handler)

                        -arrange for invalid access to become valid; arrange for accessed page to be loaded into RAM

                                    > modify page tables to reflect this

                                    > can return from interrupt (RTI), back to failing instruction

            * can run big programs even if you don't have enough RAM

                        > treat RAM as cache for disk

            > works if there is good locality of reference; thrashing if RAM is too small to hold the working set