CS 111 Scribe Notes

May 22nd, 2012

By: Jun Isaji and Keon Vafai

Miscellaneous File System Topics

File System Invariants

File System Checker (FSCK)

High/Low-level Aborts

Virtual Memory

Why Virtual Memory

Alternatives to VM

Virtual Memory in Hardware

Some Miscellaneous File System Topics

File System Invariants

For robustness purposes, we want our file system to exhibit certain properties that guarantee the correct operation of file system functions. We call these properties “File System Invariants”. We depend on these invariants to assure that the structure and correctness of our system is always maintained.

If invariants hold before an operation, and then operation is completed, invariants will hold afterwards.

Just like there are consequences to the law, there are consequences to violating our invariants. Lets see what happens when we violate each one.

File System Invariant	Consequences of Violating Invariant
1. Every block is used for exactly one purpose – a block is either a super block, a block in the free block bitmap, an inode block, or a data block	Disaster (Data Lost) If a block is used for more than one purpose, when a function writes to a block, it will erase important data that another function depends on. For example, if the file system is using a block as an inode block, and it gets written with data, all of the data of that file’s inode block will be lost.
2. All referenced blocks are initialized	Disaster (Data Lost) If a block is not zeroed out, the file could have garbage values that the file system then depends on and makes a mistake. For example, if we make a new file but don’t write anything to it, its size field could be left uninitialized with garbage in it. Then when the file is written to, the size directs it to a block pointer that is also garbage, but points to a block in use. This then results in data loss.
3. All referenced data blocks are marked as used in the bitmap	Disaster (Data Lost) If a block is referenced but it is not marked in the free block bitmap, it could be used by another inode trying to find a free block. That block could then be overwritten because its not marked as being used, and data is lost.
4. All unreferenced data blocks are marked free	Problem (Storage Leak) If an unreferenced data block is marked as used, it will never be used again. This is not a big problem, unlike the consequences to the other invariants. This results in a data leak, where the system will run out of memory prematurely but no data will be lost

Back to Contents

File System Checker (FSCK)

Lets recall the last lectures’ implementation of rename. It has a problem: if power fails in middle of the operation, link count could become 2 when there is actually only 1 link pointing to it.
This is not a disaster, just a problem. These can be fixed later by fsck – the UNIX file system checker that will run after every reboot.

There are still some problems with this implementation:

If we loose power during fsck, it must be written very carefully to not do anything that corrupts the file system mid operation. It must be idempoten.
User might notice something wrong with the link count
User may not reboot often
fsck takes a long time. However, fsck looks at superblock, bitmap, inodes, directory blocks, indirect blocks, but not at data blocks or free blocks. Since the data and free blocks take up the most space, this becomes less of a problem.

Note: Lost and found – fsck puts orphan files here (orphan – inode exists but no directory points to it)

To preserve invariants, we interleave writing data and updating the inode. Ex: Write directory block 1, write inode entry, write directory block 2, write inode entry…

But our disk scheduling algorithms rearrange our careful writes! We could fix this by ignoring the scheduling algorithm and issue writes in order. But this is extremely slow. Or, we devise a system where the lower level is told which writes and reads depend on each other. Ordinary data can usually be done in any order, but metadata depends on each other. The overhead required to track these dependent writes is not too big, and with advances in file system technology it is worth doing despite this overhead costs.

Back to Contents

High/Low-level Aborts

Moving to a more general topic (not just file systems):

In our systems we may want a high level task to inspect low level transactions. Lets say that we have an application in which a high level task needs to know whether an operation it calls has aborted. There are two main methods to handling these aborts:

Cascading aborts – A high level task waits for confirmation that a low level operation has completed successfully. A low level abort causes a high level abort, which can cause yet another higher process to abort, hence the name cascading. This approach is safe, but the high level must wait for confirmation, which makes this slow.

Fast and furious approach – Compensating action approach. The high level task gets a later notification that the lower level transaction failed. This should happen rarely. The high level task can assume the lower level succeeded, and if it didn’t, the program will dial back any changes it made on the wrong assumption. This allows for better performance, but when problem occurs, applications must repair a failure.

Example:

write(fd, buf, 100); //Returns fine

close(fd) == -1;

errno == EIO; //Errors out based on the earlier write call

In this example, the write call immediately returns execution to the program, which continues normally until it reaches the call to close(). At this point, write() notifies us of the error, at which point we have to take care of fixing any now-incorrect actions.

Back to Contents

Virtual Memory

Why virtual memory?

Protection against unreliable processes (bad pointers) and evil processes
Multiple processes with identical code can have the same virtual addresses and map to different physical addresses
Processes who’s address space exceeds that of physical memory (slow)
Controlled sharing of memory between processes

Back to Contents

Lets focus on protection. What are the alternatives to Virtual Memory?

Hire better programmers!
- Problem: People make mistakes, and some people are evil
Use Java – Bad pointers not allowed. No casts to pointers, and subscripts are always checked
- Problems: runtime checks slow you down
- Sometimes cheats are ok
- You must trust the JVM
Partition physical memory into regions. Each process gets a region. In hardware, 2 new registers are added: base and bounds. Each memory access checks whether it is between base and bounds. Illegal accesses cause a trap
- Pros: really simple, can be done quickly in Hardware, if HW adds a particular base value to the requested memory address, then each process can have same code – this is an example of a primitive address translation. This will allow a process to read a different procedures code.
- Problems:
  - Lose 2 registers
  - Hard to add memory to a process
  - Limited by physical RAM
  - Fragmentation
- In order to fix fragmentation and adding memory to a process, we have multiple base-bounds pairs. Each pair corresponds to a different segment of memory in our program. For example, one segment could be the stack, another could be the text, etc. This is called segmentation. Note: Modifying base and bounds are privileged instructions. Growing a segment is still too hard, since we must rearrange memory sometimes.

Back to Contents

Virtual Memory in Hardware

Virtual Memory - Lets stop insisting that segments are allocated contiguously, just like our file system. We then break these base-bounds segments into pages. Then, we map virtual memory into physical pages. Need a magic box (address translation table or Page table)

Patterson, David A., and John L. Hennessy. *Computer Organization and Design: The Hardware/software Interface*. Waltham, MA: Morgan Kaufmann, 2012. Print.

A page table is an array of physical addresses, along with some other bookkeeping bits. Some of the upper bits of the virtual address is used as an index into the array, and the result is the physical address in RAM. It can also indicate whether or not the requested address is in RAM or on disk.

Patterson, David A., and John L. Hennessy. *Computer Organization and Design: The Hardware/software Interface*. Waltham, MA: Morgan Kaufmann, 2012. Print.

C-ish explanation of Hardware:

Physical_page_no = pmap(virtual_page_no);

Int pmap(int vpn) {

return PAGE_TABLE[vpn];

}

Since the page table itself needs to be in data, it must be in physical memory with a physical address. (It cant have a virtual address because we would need another page table to get the physical address)

Note: Register %cr3 is the page table base register. %cr3 points to physical memory. A page table is 2^22 bytes or 4MB per process.

Back to Contents

CS 111 Scribe Notes

May 22nd, 2012

By: Jun Isaji and Keon Vafai

Table of Contents

Some Miscellaneous File System Topics

File System Invariants

File System Invariant

Consequences of Violating Invariant

File System Checker (FSCK)

High/Low-level Aborts

Virtual Memory

Why virtual memory?

Lets focus on protection. What are the alternatives to Virtual Memory?

Virtual Memory in Hardware