Miscellaneous File System Topics
For robustness purposes, we want our file system to exhibit certain properties that guarantee the correct operation of file system functions. We call these properties “File System Invariants”. We depend on these invariants to assure that the structure and correctness of our system is always maintained.
If invariants hold before an operation, and then operation is completed, invariants will hold afterwards.
Just like there are consequences to the law, there are consequences to violating our invariants. Lets see what happens when we violate each one.
File System Invariant |
Consequences of Violating Invariant |
1. Every block is used for exactly one purpose – a block is either a super block, a block in the free block bitmap, an inode block, or a data block |
|
2. All referenced blocks are initialized |
|
3. All referenced data blocks are marked as used in the bitmap |
|
4. All unreferenced data blocks are marked free |
|
Lets recall the last lectures’ implementation of rename. It has a problem: if power fails in middle of the operation, link count could become 2 when there is actually only 1 link pointing to it.
This is not a disaster, just a problem. These can be fixed later by fsck – the UNIX file system checker that will run after every reboot.
There are still some problems with this implementation:
Note: Lost and found – fsck puts orphan files here (orphan – inode exists but no directory points to it)
To preserve invariants, we interleave writing data and updating the inode. Ex: Write directory block 1, write inode entry, write directory block 2, write inode entry…
But our disk scheduling algorithms rearrange our careful writes! We could fix this by ignoring the scheduling algorithm and issue writes in order. But this is extremely slow. Or, we devise a system where the lower level is told which writes and reads depend on each other. Ordinary data can usually be done in any order, but metadata depends on each other. The overhead required to track these dependent writes is not too big, and with advances in file system technology it is worth doing despite this overhead costs.
Moving to a more general topic (not just file systems):
In our systems we may want a high level task to inspect low level transactions. Lets say that we have an application in which a high level task needs to know whether an operation it calls has aborted. There are two main methods to handling these aborts:
Cascading aborts – A high level task waits for confirmation that a low level operation has completed successfully. A low level abort causes a high level abort, which can cause yet another higher process to abort, hence the name cascading. This approach is safe, but the high level must wait for confirmation, which makes this slow.
Fast and furious approach – Compensating action approach. The high level task gets a later notification that the lower level transaction failed. This should happen rarely. The high level task can assume the lower level succeeded, and if it didn’t, the program will dial back any changes it made on the wrong assumption. This allows for better performance, but when problem occurs, applications must repair a failure.
Example:
write(fd, buf, 100); //Returns fine
close(fd) == -1;
errno == EIO; //Errors out based on the earlier write call
In this example, the write call immediately returns execution to the program, which continues normally until it reaches the call to close(). At this point, write() notifies us of the error, at which point we have to take care of fixing any now-incorrect actions.
Virtual Memory - Lets stop insisting that segments are allocated contiguously, just like our file system. We then break these base-bounds segments into pages. Then, we map virtual memory into physical pages. Need a magic box (address translation table or Page table)
Patterson, David A., and John L. Hennessy. *Computer Organization and Design: The Hardware/software Interface*. Waltham, MA: Morgan Kaufmann, 2012. Print.
A page table is an array of physical addresses, along with some other bookkeeping bits. Some of the upper bits of the virtual address is used as an index into the array, and the result is the physical address in RAM. It can also indicate whether or not the requested address is in RAM or on disk.
Patterson, David A., and John L. Hennessy. *Computer Organization and Design: The Hardware/software Interface*. Waltham, MA: Morgan Kaufmann, 2012. Print.
C-ish explanation of Hardware:
Physical_page_no = pmap(virtual_page_no);
Int pmap(int vpn)
{
return PAGE_TABLE[vpn];
}
Since the page table itself needs to be in data, it must be in physical memory with a physical address. (It cant have a virtual address because we would need another page table to get the physical address)
Note: Register %cr3 is the page table base register. %cr3 points to physical memory. A page table is 2^22 bytes or 4MB per process.