Notes 3/5/12
Surviving power failures:
Idea 1: Commit record – have an atomic low level write that commits the new write.
write blocks to copy area
wait for blocks to hit disk
write to separate commit record
copy from copy area to original area
wait for data to hit disk
clear commit record
This process can be described as completing a transaction:
Precommit phase: (can still back out at this point w/ no change)
COMMIT / ABORT
Postcommit phase: for fs to clean up after write
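The six steps above can be sketched as a toy in-memory model. All names here are illustrative (disk_sync is a stand-in for waiting on real disk I/O, not a real syscall):

```c
#include <string.h>

enum { NBLOCKS = 4 };

static char original[NBLOCKS];   /* blocks in their normal location */
static char copy_area[NBLOCKS];  /* scratch copy area */
static int  commit_record;       /* 1 = committed, 0 = clear */

static void disk_sync(void) { /* would block until writes reach disk */ }

/* Atomically apply a new version of the blocks via a commit record. */
void commit_write(const char *newdata)
{
    memcpy(copy_area, newdata, NBLOCKS);  /* 1. write blocks to copy area */
    disk_sync();                          /* 2. wait for blocks to hit disk */
    commit_record = 1;                    /* 3. atomic low-level commit */
    memcpy(original, copy_area, NBLOCKS); /* 4. copy to original area */
    disk_sync();                          /* 5. wait for data to hit disk */
    commit_record = 0;                    /* 6. clear commit record */
}
```

If power fails before step 3, the original blocks are untouched; if it fails after, recovery can redo the copy from the copy area, so the commit-record write is the single point of atomicity.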
Idea 2: Journaling
[Diagram: the journal, drawn as one long strip (ideally infinitely long; practically ~2x the size of main memory), above the cells, drawn as a row of fixed-size boxes]
The file system uses two separate areas – cells and the journal.
For a write in cell 1 from A->B, first enter this change into the journal. Once the write has actually been committed to the cells, mark this completion in the journal.
Advantages:
Writes can be contiguous during burst activity (just write all changes to the journal, commit later)
The journal can be on a different device than the data (e.g. an SSD)
Ideally, no need for cell data – just reconstruct disk state from journal
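The journal-then-cells protocol above can be sketched as a toy in-memory model (names like journal_write are illustrative, not a real filesystem API):

```c
enum { NCELLS = 8, JMAX = 64 };

struct jentry { int cell; int newval; int done; };

static int cells[NCELLS];
static struct jentry journal[JMAX];
static int jlen;

/* Record the intended change first; the cell itself is untouched so far. */
int journal_write(int cell, int newval)
{
    journal[jlen].cell = cell;
    journal[jlen].newval = newval;
    journal[jlen].done = 0;     /* not yet applied to the cells */
    return jlen++;
}

/* Later (possibly much later), install the change and mark it complete. */
void journal_apply(int idx)
{
    cells[journal[idx].cell] = journal[idx].newval;
    journal[idx].done = 1;      /* completion marker in the journal */
}
```

Because journal_write only appends, a burst of writes stays contiguous in the journal; journal_apply can run whenever the disk is idle.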
Journal Logging:
Write ahead logs
log all writes you plan to do
COMMIT
install new values to cell data
Write behind logs
log old values in journal
install new values to cell
COMMIT
Which is faster?
Write-behind logging – to reconstruct the drive state, scan backward from the end of the journal only as far as the last commit. With write-ahead logs, you have to start at the very beginning of the journal and scan forward to the last commit.
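The two orderings can be contrasted with a minimal single-cell sketch (one cell, one log slot; illustrative only):

```c
static int cell;                 /* the single data cell */
static int log_old, log_new;    /* journal slots */
static int committed;

/* Write-ahead: log the new value, COMMIT, then install it. */
void write_ahead(int newval)
{
    log_new = newval;   /* log the write we plan to do */
    committed = 1;      /* COMMIT */
    cell = newval;      /* install new value into cell data */
}

/* Write-behind: log the old value, install, then COMMIT. */
void write_behind(int newval)
{
    log_old = cell;     /* save old value so an abort can undo */
    cell = newval;      /* install new value into cell data */
    committed = 1;      /* COMMIT */
}
```

After a crash, write-ahead recovery must redo logged-but-uninstalled new values; write-behind recovery must undo installed-but-uncommitted cells using log_old.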
A more complicated approach with journaling for better performance:
Let applications 'look inside' pending operations from earlier transactions on a particular cell. This gives better parallelism/performance/latency.
Complication with this approach: if a pending transaction is aborted, an application needs a way to tell the applications that read its pending data that the record is wrong, so they can roll back their own transactions.
Virtual Memory:
Problem: Unreliable programs make bad memory references and can potentially crash other applications (or the kernel) by accessing an incorrect address.
Solutions:
Get better programmers (expensive, hard)
Use software checks, e.g. Java (slow)
Base/bounds registers: applications can only use memory starting at a base address and extending for a particular bound. A trap is triggered on an out-of-bounds access
Problems with this approach:
Have to preallocate memory for each process on startup
Size of physical RAM limits total memory applications can use
Memory fragmentation
No way to share memory
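A base/bounds lookup reduces to one comparison and one addition. A sketch, with TRAP as a sentinel standing in for the hardware trap (an assumption, since real hardware traps rather than returning a value):

```c
#include <stddef.h>

#define TRAP ((size_t)-1)   /* stand-in for a hardware trap */

/* Translate a virtual address: valid only if it falls in [0, bound). */
size_t bb_translate(size_t va, size_t base, size_t bound)
{
    if (va >= bound)
        return TRAP;         /* out-of-bounds access: trap */
    return base + va;        /* in-bounds: relocate by base */
}
```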
Multiple base/bound pairs
one for text, one for data
Problems with this approach:
When compiling/linking program, need to specify base + bounds (fix: make all mem refs relative)
Fragmentation
growing/shrinking memory causes problems (to solve: use pages)
Virtual address layout: page # (20 bits) | page offset (12 bits)
Problem with this layout: a flat page table needs 2^20 entries x 4 bytes = 4 MB per process, with lots of gaps of empty data
Fix: use a two-level page table (pages = 4 KB)
Works like an inode's direct and doubly indirect blocks
Virtual address layout: master table index (10 bits) | intermediate table index (10 bits) | page offset (12 bits)
Software implementation of 2-level page table
size_t pmap(size_t va) {
    size_t offset = va & ((1 << 12) - 1);       /* low 12 bits */
    size_t lo = (va >> 12) & ((1 << 10) - 1);   /* middle 10 bits */
    size_t hi = va >> 22;                       /* top 10 bits */
    size_t *l0page = PAGETABLE[hi];             /* find intermediate table */
    if (l0page == FAULT)
        return FAULT;
    size_t l1entry = l0page[lo];                /* find physical page */
    if (l1entry == FAULT)
        return FAULT;
    return l1entry + offset;
}
(PAGETABLE's base address lives in %cr3, a privileged register)
When the hardware faults due to a bad page table entry, kernel can:
kill off process
send a signal to the process: SIGSEGV (segmentation fault)
arrange for invalid access to become a valid one (by swapping the necessary page in)