UCLA CS 111 File System Robustness

by Stephani Alves and Chao Deng

Spring 2012: May 15, 2012 - Lecture 13

File System Robustness

Abstract

In this lecture we talk about how to build a reliable file system. We want the system to respond well and our files not to get lost. A file system is subject to both external and internal events, and things can go wrong with either. Externally, a user request may be bad and cause problems. Internally, there can be disk failures, bugs in the code, and so forth.

Note: The system is built out of components


Two Reliability Approaches

1st Approach (Method 1)

In this approach we build the system out of reliable components and assume that each component works, so we don't need to worry about reliability at the file-system level. Here the error checking lives inside the components themselves, e.g. the disk cache is protected by internal checksums.

2nd Approach (Method 2)

In this approach we assume the components are unreliable and add reliability at this level, e.g. to cope with errors in the disk controller's cache or spontaneous errors. Here the error-checking code is in the file system itself.

Both approaches are commonly used.


Internal vs. External

What bad events can occur?

Externally

  • Losing power
  • Many requests that overwhelm the system
  • Unintended removes/updates (I want to do rm *.o, but instead type rm * .o) --> on a SEASnet machine you can run "$ cd .snapshot" to recover files after an rm.

Internally

  • Disk error rate overwhelms its internal checksums (disk dies/crashes, bad blocks)
  • Bit flip in the controller cache
  • Bit flip in RAM (how often does this happen? On a typical laptop, maybe once a week. It is not that important if it happens on a laptop/desktop computer, but it becomes a problem when we are talking about servers.)
  • Disk arm crash
  • Note: Does the RAM have ECC?
    • Typical ECC can correct any single-bit error, but most computers don't have ECC.

In the real world, you will spend most of your time dealing with things that can go wrong: making sure things really work when bad things happen, and ultimately deciding what to do when things don't work even though they should. For today we are going to focus on losing power.

Method 1

One way to address this issue is to attack it with Method 1 -> build your system out of reliable components.
Component → power supply
Fix: a UPS (uninterruptible power supply), a marketing term that is not really accurate (no truly uninterruptible supply exists).
The UPS itself can stop working even while you still have power; it too is prone to failure.

Method 2

I want a file system that is robust even in the presence of power failures, e.g. no data is lost if you cut the power: after reboot, the data is the same as it was just before the crash. From the point of view of the file system, that's what we want to have. How do we implement that?


E.g.: a simple file that contains all the grades for CS 111 (cs111grades.txt).
For simplicity, let's assume it is always exactly 20,000 bytes.

Always write this way:
write(fd, buf, 20000);

Assume some extra system calls:
open(" ", O_NOMETADATA)   (this flag doesn't exist; it is made up for the example)
write(fd, buf, 8192);   8192 -> block size. Even this may not work, depending on how your disk controller works; in the worst case it may actually operate 512 bytes at a time.

From the point of view of the user, we cannot assume that writes are atomic; if we want to be able to assume that, we have to do some work. Think about a different way of writing! A simple rule of thumb to get this kind of application to work is the GOLDEN RULE OF ATOMICITY: never overwrite your only copy, unless your write operation is known to be atomic.

We are going to change our application

We will maintain 2 copies of the file (main copy and temp copy)

Read -> reads from main copy

Write -> overwrite temp copy

When we finish writing the temp copy, we copy it into the main copy ("?" means partially written / garbage):

                  start   writing temp   temp done   writing main   done
  main copy:        A          A             A             ?           B
  temp copy:        ?          ?             B             B           B

What can go wrong with this approach?

If we crash in the middle, how do we know whether temp or main holds the good data? Should we recover from main or from temp? This idea doesn't work: we don't know whether the old version or the new version is right.

How do we solve this problem?

We maintain 3 copies; it turns out this does work. The copies are written in the order temp1, temp2, main:

            start   writing temp1   temp1 done   writing temp2   temp2 done   writing main   done
  main:       A           A              A              A              A            ?          B
  temp1:      A           ?              B              B              B            B          B
  temp2:      A           A              A              ?              B            B          B

Recovery rule:

• If you crash in the middle, read all three copies and apply majority rule: if two of them agree, keep those two and overwrite the third (2/3 wins).
• Special case: if all three copies disagree, pick temp1 → that is the secret! (All three can only differ while temp2 is being written, and at that point temp1 already holds the complete new data.)
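To make the recovery rule concrete, here is a minimal sketch in C, assuming each 20,000-byte copy has already been read into memory (the function and buffer names are made up for illustration); the caller would then write all three buffers back to disk:

#include <string.h>

#define FILESIZE 20000

/* Recovery for the three-copy scheme.  Copies are written in the order
 * temp1, temp2, main, so after a crash at most one copy is garbage. */
void recover_three_copies(char main_copy[FILESIZE],
                          char temp1[FILESIZE],
                          char temp2[FILESIZE])
{
    if (memcmp(temp1, temp2, FILESIZE) == 0) {
        /* temp1 and temp2 agree: they hold the complete new data
           (main may have been interrupted mid-write), so overwrite main. */
        memcpy(main_copy, temp1, FILESIZE);
    } else if (memcmp(main_copy, temp1, FILESIZE) == 0) {
        /* main and temp1 agree: the crash hit while writing temp2. */
        memcpy(temp2, main_copy, FILESIZE);
    } else if (memcmp(main_copy, temp2, FILESIZE) == 0) {
        /* main and temp2 agree: the crash hit while writing temp1. */
        memcpy(temp1, main_copy, FILESIZE);
    } else {
        /* All three disagree: only temp1 was written completely, so it wins. */
        memcpy(temp2, temp1, FILESIZE);
        memcpy(main_copy, temp1, FILESIZE);
    }
}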

Problem: what if the files are too big?

• It works for small files, but it is not really suitable for very large files. Although it works, we want something faster.

How can we make it go faster?

Idea (improving speed, still reliable): keep an extra record, the commit record, an extra block on disk that keeps track of which of the copies is the real copy. You have MAIN and TEMP (20,000 bytes each) and a third, much smaller area (512 bytes) called the commit record; it tells you whether main or temp is the right one to look at.

Commit record: 0 = main is the real copy, 1 = temp is the real copy

                        |--- pre-commit ---|  commit  |- post-commit -|  reset commit
                                                                         (for efficiency)
  20,000-byte main:      A      A      A        A        ?       B           B
  20,000-byte temp:      A      ?      B        B        B       B           B
     512-byte commit:    0      0      0        1        1       1           0
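A sketch in C of the save path using the commit record. The on-disk layout (main at offset 0, temp right after it, a 512-byte commit block at the end) and the use of a single file descriptor are assumptions made for this example; the key points are the order of the writes and that the 512-byte commit write is atomic:

#include <unistd.h>

#define FILESIZE   20000
#define MAIN_OFF   0
#define TEMP_OFF   FILESIZE
#define COMMIT_OFF (2 * FILESIZE)

enum { USE_MAIN = 0, USE_TEMP = 1 };

/* Assumed atomic: the commit record fits in one 512-byte sector. */
static void set_commit(int fd, char which)
{
    char block[512] = { which };
    pwrite(fd, block, sizeof block, COMMIT_OFF);
    fsync(fd);
}

void save(int fd, const char *buf)
{
    pwrite(fd, buf, FILESIZE, TEMP_OFF);   /* pre-commit: temp may be garbage if we crash */
    fsync(fd);
    set_commit(fd, USE_TEMP);              /* commit: temp is now the real copy */
    pwrite(fd, buf, FILESIZE, MAIN_OFF);   /* post-commit: bring main up to date */
    fsync(fd);
    set_commit(fd, USE_MAIN);              /* reset: main is the real copy again */
}

A reader checks the 512-byte commit record first and then reads whichever copy it names, so after a crash recovery is simply "believe the commit record."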


• Assumptions are crucial here.

• Coming up with a reliable system comes down to the assumptions you make.

Lampson-Sturgis assumptions

  1. Storage writes may fail
    • You can write data out to disk and the write may not work; what ends up on disk is junk
    • When writes fail they may also corrupt adjacent pieces of storage
    • Typically these are neighboring blocks
  2. Later reads can detect a badly written sector (bad writes); see the checksum sketch after this list
  3. Storage can spontaneously decay
    • Later reads can still detect it
  4. Errors are rare
    • But any of these errors can occur
  5. Individual sector writes are atomic
    • When you try to write a sector, either it succeeds completely, or (e.g. because power drops) it does not happen at all
    • A sector write is always all-or-nothing
    • Under this kind of assumption the commit-record idea makes more sense
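As an illustration of assumptions 2 and 3, here is a sketch of how a read can tell that a sector is bad: store a small checksum next to each sector's data and recompute it on every read. The struct layout and the simple checksum are made up for the example; real disks use stronger error-detecting codes internally.

#include <stddef.h>
#include <stdint.h>

struct sector {
    uint8_t  data[512];
    uint32_t checksum;      /* written together with the data */
};

static uint32_t checksum(const uint8_t *p, size_t n)
{
    uint32_t sum = 0;
    while (n--)
        sum = sum * 31 + *p++;
    return sum;
}

/* Returns 1 if the sector read back cleanly, 0 if the write failed
 * or the data decayed since it was written. */
int sector_ok(const struct sector *s)
{
    return checksum(s->data, sizeof s->data) == s->checksum;
}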

Pattern for implementing all or nothing operation

The idea is to implement this save operation as if it were atomic: implement a high-level all-or-nothing operation atop a system in which some operations are atomic and some are not. This is a standard problem you have with file systems.

General pattern

BEGIN, then pre-commit actions: low-level operations that may or may not be atomic. While we are doing them, the high-level operation has not happened yet; we can still back out and leave no trace of the high-level operation.

Actions:
BEGIN
// low-level operations
• Pre-commit actions
o Can back out
o Leave no trace

COMMIT (atomic at the lower level)
• Makes the higher-level action happen

• Post-commit actions
o Can't back out; we can't change our mind
o Can improve performance for the next time
// end of low-level operations
DONE


An ABORT operation can be substituted for COMMIT in order to back out; a sketch of the whole pattern follows.
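The pattern as a C skeleton, with placeholder functions standing in for whatever low-level steps a real operation would perform (all of the names here are made up for illustration):

#include <stdio.h>

/* Placeholders for the low-level steps; a real file system would write to disk. */
static void precommit_work(void)       { puts("pre-commit: no visible high-level effect yet"); }
static int  something_went_wrong(void) { return 0; }
static void abort_operation(void)      { puts("ABORT: back out, leave no trace"); }
static void commit(void)               { puts("COMMIT: single atomic low-level step"); }
static void postcommit_work(void)      { puts("post-commit: cleanup / speed up next time"); }

void all_or_nothing_operation(void)
{
    /* BEGIN */
    precommit_work();             /* may or may not be atomic; we can still back out */
    if (something_went_wrong()) {
        abort_operation();        /* used instead of COMMIT */
        return;
    }
    commit();                     /* after this, the high-level action has happened */
    postcommit_work();            /* can't change our mind now; performance only */
    /* DONE */
}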

Building reliable operations

IDEA #1 COMMIT RECORD
IDEA #2 JOURNALING (taken from double-entry bookkeeping)

You never erase anything from your ledger

Break our system into 2 pieces: the journal (an append-only log of changes) and cell storage (the data blocks themselves)

Journal

Simple logging protocol (write-ahead log):
Step 1. Log into the journal our intent to change block #27 to "B": append a record to the journal
Step 2. Write a commit record to the journal
Step 3. Write the changed block to cell storage
Step 4. Clear the commit record (log that we are done)
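A sketch of the four steps in C for a single block update. The journal record format, the CHANGE/COMMIT/DONE markers, and the journal_append() helper are assumptions made for this example; journal_fd and cell_fd are file descriptors for the journal and cell storage:

#include <string.h>
#include <unistd.h>

#define BLKSIZE 8192

enum { CHANGE, COMMIT, DONE };

struct journal_record {
    int  type;              /* CHANGE, COMMIT, or DONE */
    int  blockno;
    char data[BLKSIZE];     /* the new value, for a CHANGE record */
};

static void journal_append(int journal_fd, const struct journal_record *r)
{
    write(journal_fd, r, sizeof *r);    /* append only: never erase the ledger */
    fsync(journal_fd);
}

void update_block(int journal_fd, int cell_fd, int blockno, const char *newdata)
{
    struct journal_record r = { CHANGE, blockno };
    memcpy(r.data, newdata, BLKSIZE);

    journal_append(journal_fd, &r);                   /* step 1: log the intent          */
    r.type = COMMIT;
    journal_append(journal_fd, &r);                   /* step 2: write the commit record */
    pwrite(cell_fd, newdata, BLKSIZE,
           (off_t)blockno * BLKSIZE);                 /* step 3: install in cell storage */
    fsync(cell_fd);
    r.type = DONE;
    journal_append(journal_fd, &r);                   /* step 4: "clear" the commit by
                                                         logging that we are done        */
}

(A real journal would use much smaller commit/done records instead of logging a full block three times; this sketch keeps one record format for simplicity.)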

If we take this approach we will have something that is robust after a power failure

Recovery Procedure After Power Failure

1. Scan the journal left to right, looking for commit records that are not followed by a corresponding "done" record
2. Apply those logged changes to cell storage, one by one
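A sketch of this recovery procedure, reusing the assumed journal_record format from the sketch above. It also assumes updates are performed one at a time, so at most the last update in the journal can be committed but not yet marked done:

#include <unistd.h>

/* Reuses BLKSIZE, struct journal_record, and CHANGE/COMMIT/DONE from above. */
void recover_after_crash(int journal_fd, int cell_fd)
{
    struct journal_record r, last_change = {0};
    int committed_not_done = 0;

    lseek(journal_fd, 0, SEEK_SET);
    while (read(journal_fd, &r, sizeof r) == sizeof r) {   /* scan left to right */
        if (r.type == CHANGE) {
            last_change = r;
            committed_not_done = 0;
        } else if (r.type == COMMIT) {
            committed_not_done = 1;          /* promised, maybe not yet installed */
        } else if (r.type == DONE) {
            committed_not_done = 0;          /* already safely in cell storage */
        }
    }
    if (committed_not_done) {
        /* Finish the job: install the committed change into cell storage. */
        pwrite(cell_fd, last_change.data, BLKSIZE,
               (off_t)last_change.blockno * BLKSIZE);
        fsync(cell_fd);
    }
}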

Nice properties of this approach:
Alternate logging protocol (write-behind log)

1. Log into the journal the old value of block #27 (the old value is A)
2. Write into cell storage (install B into cell storage)
3. Write a commit record
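The same block update sketched with the write-behind log, again reusing the assumed journal_record format and helpers from the write-ahead sketch; the difference is that the journal saves the old value before the new one is installed:

#include <unistd.h>

/* Reuses BLKSIZE, struct journal_record, CHANGE/COMMIT, and journal_append(). */
void update_block_writebehind(int journal_fd, int cell_fd,
                              int blockno, const char *newdata)
{
    struct journal_record r = { CHANGE, blockno };

    /* Step 1: log the OLD value of the block (e.g. "A") into the journal. */
    pread(cell_fd, r.data, BLKSIZE, (off_t)blockno * BLKSIZE);
    journal_append(journal_fd, &r);

    /* Step 2: install the new value ("B") into cell storage. */
    pwrite(cell_fd, newdata, BLKSIZE, (off_t)blockno * BLKSIZE);
    fsync(cell_fd);

    /* Step 3: commit. */
    r.type = COMMIT;
    journal_append(journal_fd, &r);
}

Recovery for the write-behind log would go the other way: scan for logged old values whose change was never committed and write them back into cell storage, undoing the uncommitted install.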
