File System Robustness:
CS111 Lecture 14

By Christian Nersesyan, David Antonio Garcia, Victoria Bernard

Robustness in File Systems

Berkeley Fast File System (BSD)

Ways to evaluate a file system

Say you are the product manager of an operating system. How do you go about choosing a file system for your OS? How can you make sure the FS you choose is robust? One way is to test invariants: boolean expressions that should always be true for the FS.

Invariants for a FS

1. Each block in the FS is used for only one purpose
   Ex. of purposes: superblock, free bitmap, inodes, etc.
2. All referenced blocks are initialized to data appropriate for that type of block.
   Never leave data in an invalid state. For example, don't initialize an inode with a negative file size.
3. All referenced data blocks are marked used in the bitmap.
4. All unreferenced data blocks are marked free in the bitmap.

Let's refer to the above invariants as the rules of our FS.
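
As a rough illustration, a checker for rules 3 and 4 could compare the free bitmap against the set of blocks actually referenced by inodes. This is only a sketch: the in-memory arrays below are hypothetical stand-ins for the on-disk structures, not the real FFS layout.

    /* Sketch: verify rules 3 and 4 for a toy file system.
     * bitmap[b] is true if block b is marked used in the free bitmap;
     * referenced[b] is true if some inode actually points at block b.
     * Both arrays are hypothetical in-memory copies of on-disk data. */
    #include <stdbool.h>
    #include <stdio.h>

    #define NBLOCKS 1024

    bool bitmap[NBLOCKS];
    bool referenced[NBLOCKS];

    void check_bitmap_invariants(void)
    {
        for (int b = 0; b < NBLOCKS; b++) {
            if (referenced[b] && !bitmap[b])
                printf("rule 3 violated: block %d referenced but marked free\n", b);
            if (!referenced[b] && bitmap[b])
                printf("rule 4 violated: block %d marked used but unreferenced\n", b);
        }
    }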

Consequences of breaking these rules

1. Disaster
2. Disaster
3. Disaster
4. We lose space in our FS (a storage leak): a block is permanently marked as used even though nothing references it, so that disk space is wasted.

Rules 1-3 tend to get the highest priority from developers, since breaking them can corrupt the entire FS. Rule 4, although important, has less devastating effects.

Performance Issues

Problem 1 : Cache Coherence

Definition: the contents of the cache unintentionally differ from the data on disk
    write(fd, "hello", 5);
    read(fd, buffer, 5);
Most OSes use a technique called dallying: the kernel delays writing cached data back to disk, so later operations may complete before earlier writes actually reach the disk.
Suppose fd is a regular file on the disk. If the kernel only updates the cache and the system crashes before the cache is flushed, the data is never copied back to the disk.
Why do we care so much?
   - the cache may differ from the disk, causing invalid reads
   - the user is told a write succeeded when it may not actually have reached the disk
   - some side effects may not be saved due to out-of-order writes
Sources of this problem
- Power Loss
- Disk crash
- Sudden disconnection
Potential Solutions
1. Maintain a backup power supply in case of power loss to ensure cache always gets copied to disk.
2. Syncing functions
    close(int fd)
    closes the file referred to by fd, so that fd is no longer a valid reference to that file
    Note: close does not "flush" data buffers, so prior writes to fd are not guaranteed
    to have reached the disk
    fsync(int fd)
    "flushes" data by ensuring any cached data for the file referred to by fd is copied to the disk
    (or other permanent storage device); it does so for the file's data blocks and its inode metadata
    fdatasync(int fd)
    same as fsync, but improves performance by not copying metadata unless that metadata is
    needed for a subsequent data retrieval to be handled correctly
    sync(void)
    same as fsync, but for every file in the cache; everything cached at the time of the call is
    scheduled to be written to disk
Example
  if (fsync(fd) != 0) error();
  if (write(fd, buf, 8192) != 8192) error();
  if (fsync(fd) != 0) error();
The code above has to write out all of the file's cached data blocks plus the inode metadata, slowing performance significantly
  if (fsync(fd) != 0) error();
  if (write(fd, buf, 8192) != 8192) error();
  if (fdatasync(fd) != 0) error();
Now only 8192 / 1024 (the block size) = 8 data blocks have to be copied over, since the inode metadata is skipped
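
A minimal, self-contained version of the fdatasync pattern above might look like this (the file name "data.tmp" and the 8 KiB buffer are just placeholders):

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[8192] = { 0 };   /* 8192 bytes of data we want to persist */
        int fd = open("data.tmp", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); exit(1); }

        if (write(fd, buf, sizeof buf) != sizeof buf) { perror("write"); exit(1); }
        if (fdatasync(fd) != 0) { perror("fdatasync"); exit(1); }   /* flush data blocks only */
        if (close(fd) != 0) { perror("close"); exit(1); }
        return 0;
    }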

Problem 2 : Renaming

Consider the following situation
    rename("a","b");
    if rename fails because the system crashes, we cannot assume the file was not renamed.
Another situation
  rename("x/a","y/b");
  Renaming across directories must follow these steps (see the sketch after this list):
   i. read the file's inode
   ii. read x's data
   iii. read y's data
   iv. write the file's inode (link count = 2)
   v. write y's new data (the directory now contains b)
   vi. write x's new data (the entry a is removed)
   vii. write the inode again (link count back to 1)
  This still does not preserve all of the file system's invariants, because the link count is
  temporarily wrong. However, this is acceptable: the only consequence is that if we crash while
  the count is 2 and "b" is deleted later, the link count drops to 1 instead of 0, so the file's
  space is never reclaimed and remains forever unused (a storage leak).
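A sketch of this ordering in code, with hypothetical structures and helpers standing in for the real inode and directory-block I/O:

    /* Sketch of the rename("x/a", "y/b") ordering listed above.
     * struct inode, struct dirblock, and all helper functions are
     * hypothetical stand-ins, not a real file system API. */
    struct inode    { int link_count; };
    struct dirblock { char entries[512]; };

    struct inode    *read_inode_of(const char *path);
    struct dirblock *read_dir_data(const char *dir);
    void write_inode(struct inode *ino);
    void write_dir_data(struct dirblock *d);
    void dir_add_entry(struct dirblock *d, const char *name, struct inode *ino);
    void dir_remove_entry(struct dirblock *d, const char *name);

    void rename_x_a_to_y_b(void)
    {
        struct inode    *ino = read_inode_of("x/a");  /* i.   read the file's inode        */
        struct dirblock *x   = read_dir_data("x");    /* ii.  read x's data                */
        struct dirblock *y   = read_dir_data("y");    /* iii. read y's data                */

        ino->link_count = 2;                          /* iv.  write inode, link count = 2  */
        write_inode(ino);
        dir_add_entry(y, "b", ino);                   /* v.   write y's new data (adds b)  */
        write_dir_data(y);
        dir_remove_entry(x, "a");                     /* vi.  write x's new data (drops a) */
        write_dir_data(x);
        ino->link_count = 1;                          /* vii. link count back to 1         */
        write_inode(ino);
    }

A crash between steps iv and vii leaves the link count at 2, which is exactly the temporary invariant violation discussed above.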
Potential Solutions
fsck
  a program that repairs storage leaks: it scans the file system, checks which inodes are in use
  and whether their link counts are correct, and updates the free bitmap accordingly.
  Note: fsck must be idempotent, so if the system crashes while it runs, it can simply be run again.
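
A rough sketch of the link-count part of such a check; the arrays and the repair policy here are illustrative, not fsck's actual implementation:

    /* Sketch of one fsck-style pass: recompute each inode's true link count
     * by walking every directory, compare it to the count stored in the inode,
     * and repair mismatches. The arrays are hypothetical in-memory copies. */
    #define NINODES 1024

    int stored_link_count[NINODES];   /* link count recorded in each on-disk inode   */
    int counted_links[NINODES];       /* directory entries found pointing at inode i */

    void repair_link_counts(void)
    {
        for (int i = 0; i < NINODES; i++) {
            if (stored_link_count[i] != counted_links[i])
                stored_link_count[i] = counted_links[i];  /* trust the directory tree */
        }
    }

Running this pass twice produces the same result as running it once, which is what the idempotence note above requires.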

Robustness Terminology and Theory

3 main goals of a file system
 1. Durability
   Survive failures in the underlying hardware, e.g., losing power
 2. Atomicity
   Changes either all made or not made at all
 3. Performance
   Throughput and latency

Atomicity

How to implement atomicity atop a device where writes aren't atomic?
Golden Rule of Atomicity
Never overwrite the last copy of your data. To follow this rule, the drive keeps 2 copies of the data, along with a commit record that tells us which copy to use.
Trying to Implement Atomicity with 2 commit records.
 Ex: 2 Commit Records
  Both records start with the same content:
   Record 1: A
   Record 2: A
  Begin to write the new value into record 1:
   Record 1: A ?
   Record 2: A A
  Continue:
   Steps:    0 1 2 3 4
   Record 1: A ? B B B
   Record 2: A A A ? B
  Issue: If we lose power (and reboot) at step 3, where one commit record says A
        and the other says B, we cannot tell which copy is garbage and which one
        holds the current data.
Solution:
In order to correctly implement atomicity we need 3 copies of the commit record. Choose the copy of the data that the majority of the commit records point to. In some cases all 3 may disagree (hence the need for 3 copies): this can happen if the computer loses power while the value of the first commit record is being copied to the second. The first record could point to file A, the second would be indeterminate, and the third could point to file B. In that case we go with the first record and thus choose file A.
Ex: 3 Commit logs
    Steps:    0 1 2 3 4 5 6
    Record 1: A ? B B B B B
    Record 2: A A A ? B B B
    Record 3: A A A A A ? B
  Fixed Issue: We always choose the majority, and if all three disagree, as in step 3,
        we choose the first one (explained in more detail above in "Solution:").
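
A sketch of the decision rule; the char values simply name which copy of the data each commit record points to:

    /* Sketch: choose among 3 commit records by majority vote;
     * if all three disagree (as in step 3), trust the first one. */
    char pick_current_copy(char r1, char r2, char r3)
    {
        if (r1 == r2 || r1 == r3)
            return r1;          /* record 1 agrees with another record */
        if (r2 == r3)
            return r2;          /* records 2 and 3 form the majority   */
        return r1;              /* all three disagree: take the first  */
    }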

Lampson-Sturgis Assumptions

  1. Storage writes may fail, or may corrupt other blocks.
  2. A later read can detect the bad block. Thus the computer can avoid reading a block
   with corrupted information, which helps avoid system crashes/glitches (see the sketch below).
  3. Storage can spontaneously decay.
  4. Errors are rare (not everything is going to crash, and not all the storage will
   spontaneously decay at the same time; there are no quick fixes if that happens).
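
Assumption 2 is typically realized by storing a checksum alongside each block and verifying it on every read; a rough sketch (the checksum function here is arbitrary, chosen only for illustration):

    /* Sketch: detect a corrupted block on read (Lampson-Sturgis assumption 2)
     * by comparing a stored checksum against one recomputed from the data. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    uint32_t block_checksum(const uint8_t *data, size_t len)
    {
        uint32_t sum = 0;
        for (size_t i = 0; i < len; i++)
            sum = sum * 31 + data[i];     /* simple illustrative checksum */
        return sum;
    }

    bool block_is_valid(const uint8_t *data, size_t len, uint32_t stored)
    {
        return block_checksum(data, len) == stored;
    }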

2 main methods to create a file system based on these assumptions

Idea #1 Commit Record

  We of course keep the golden rule of atomicity in mind when implementing this method. Here are the steps involved when you issue a write command (see the sketch after this list).

  1. Write the data to a temporary file (if we lose power at this stage, it is as if the write never happened).
  2. Once all the data has been written to the temp file, atomically write "COMMIT" to the commit record. We now have 2 copies of the data: the one in the temp file is the new data, and the other is the old data. If we lose power after writing COMMIT, recovery can see that we still need to copy the new data over the old.
  3. Replace the old data with the new. If we lose power here, we start this step all over.
  4. Write "DONE" to commit record, meaning that the write is complete.
Idea #2 Journal (Introduction)

  A journal is a separate file that tracks every change to the file system. Any change to the FS is first logged to the journal. If power is lost while actually writing to the FS itself, we can go back to the journal and restore the file system. Journals are usually implemented as circular buffers: when the buffer gets close to full, the pending journal entries are committed to the FS and the old entries are overwritten with new ones.
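
A minimal sketch of this write-ahead idea: the change is appended to the journal and forced to disk before the corresponding file system block is modified. The entry format and the 512-byte block size are invented for illustration:

    /* Sketch of write-ahead journaling. The journal entry layout is
     * hypothetical; error checks are omitted for brevity. */
    #include <fcntl.h>
    #include <unistd.h>

    struct journal_entry {
        long block_no;          /* which FS block is being changed */
        char new_data[512];     /* its new contents                */
    };

    void journaled_write(int journal_fd, int fs_fd, struct journal_entry *e)
    {
        /* 1. log the intended change; it must reach the disk first */
        write(journal_fd, e, sizeof *e);
        fsync(journal_fd);

        /* 2. apply the change to the file system itself; if we crash here,
         *    replaying the journal redoes this step on recovery */
        lseek(fs_fd, e->block_no * 512, SEEK_SET);
        write(fs_fd, e->new_data, 512);
    }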

References

Fall 2014 Ryosuke Takeuchi scribe notes

http://linux.die.net/man/2/fsync