In this lecture we will talk about how to build a reliable file system: we want our system to respond well and our files not to get lost. A file system faces both internal and external events, and things can go wrong with either. Externally, a user request may be bad and cause problems. Internally, there can be disk failures, bugs in code, and so forth.
1st Approach (Method 1)
In this approach we build reliable components and assume each component works. We don't need to worry about reliability at this level. Here the error checking lives in the cache, through internal checksums.
2nd Approach (Method 2)
In this approach we assume components are unreliable and add reliability at this level. E.g.: errors in the disk controller's cache, spontaneous errors. Here the error-checking code is in the file system.
Both approaches are commonly used.
In the real world, you will spend most of your time dealing with things that can go wrong: making sure that things really work when bad things happen, and ultimately deciding what happens when things don't work when they should. For today we are going to focus on losing power.
One way to address this issue is to attack it using Method 1
-> build your system out of reliable components.
Component → power supply
Fix: a UPS (uninterruptible power supply), a marketing term that is incorrect: no such thing exists. Sometimes the UPS can go bad even though you have power, because the UPS itself can stop working. It is also prone to failure.
I want a file system that is robust even in the presence of power failures. E.g.: no data is lost if you cut the power, and after reboot the data is the same as it was just before the crash. From the point of view of the file system, that's what we want. How do we implement that?
From the point of view of the user? We can't assume writes are atomic; if we want to be able to assume that, we have to do some work. Think about a different way of writing! A simple rule of thumb to get this kind of application to work is the GOLDEN RULE OF ATOMICITY: never overwrite your only copy, unless your write operation is known to be atomic.
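As a concrete illustration of the golden rule, here is a minimal sketch in Python: write the new data to a temporary file, then rename it over the original. The file name and data are made up for the example; the key fact is that `os.replace` performs an atomic rename on POSIX systems, so the only copy is never overwritten in place.

```python
import os

def atomic_update(path, new_data):
    """Golden rule sketch: never overwrite the only copy.
    Write a temp file first, then atomically rename it over the original."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(new_data)
        f.flush()
        os.fsync(f.fileno())   # force the temp copy onto the disk first
    os.replace(tmp, path)       # atomic rename: readers see old or new, never a mix

atomic_update("notes.txt", "version B")
```

If the power is cut before the rename, the original file is untouched; after the rename, the new version is complete.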
We will maintain 2 copies of the file (main copy and temp copy)
Read -> reads from main copy
Write -> overwrite temp copy
When we finish writing the temp copy, we copy it back to the main copy:
main | A |    A    |  A  |    ?    |  B   |
temp | ? |    ?    |  B  |    B    |  B   |
     |   | writing |     | writing | done |
If we crash in the middle, how do we know whether temp or main holds the good data? Should we recover from main or from temp? This idea doesn't work: we don't know whether the old version or the new version is right.
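The ambiguity can be made concrete with a tiny sketch (illustrative values, assuming "?" stands for a partially written copy): with only two copies, a recovery procedure that sees two different values has no way to pick the right one.

```python
def recover_two_copies(main, temp):
    """Two-copy recovery is ambiguous: if the copies differ, we cannot
    tell whether the crash hit while writing temp (main is good) or
    while writing main (temp is good)."""
    if main == temp:
        return main
    return None  # ambiguous: no safe choice

# Crash while writing temp: main="A" is good, but recovery can't know that.
assert recover_two_copies("A", "?") is None
# Crash while copying back to main: temp="B" is good; same ambiguity.
assert recover_two_copies("?", "B") is None
```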
If we instead maintain 3 copies, it turns out it does work.
main | A | A | A | A | A | ? | B |
temp1 | A | ? | B | B | B | B | B |
temp2 | A | A | A | ? | B | B | B |
• If you crash in the middle, read all three copies and take the majority rule: if two of them agree, pick that value and overwrite the third copy (2/3 wins).
• Special case: all three copies disagree. Then pick temp1, and that is the secret! Temp1 is written first, so if it differs from both of the others it must already hold the complete new data.
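The recovery rule above can be sketched as a small function (names are illustrative; the write order is temp1, then temp2, then main, as in the table):

```python
def recover_three_copies(main, temp1, temp2):
    """Three-copy recovery rule:
    - if any two copies agree, that value wins (2/3 majority);
    - if all three disagree, temp1 wins: at most one copy can be
      mid-write at a crash, and temp1 is written first, so if it
      differs from both others it holds the complete new data."""
    if main == temp1 or main == temp2:
        return main
    if temp1 == temp2:
        return temp1
    return temp1  # all three disagree: pick temp1 (the secret)

# Walking the crash states from the table ("?" = partially written):
assert recover_three_copies("A", "A", "A") == "A"  # before the write
assert recover_three_copies("A", "?", "A") == "A"  # crash writing temp1
assert recover_three_copies("A", "B", "?") == "B"  # crash writing temp2
assert recover_three_copies("?", "B", "B") == "B"  # crash writing main
```

After recovery, the winning value is copied over the other two copies so all three agree again.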
Idea (improving speed, still reliable): have an extra record, the commit record, an extra block on disk that keeps track of which of the copies is the real copy. You have MAIN (20,000), TEMP (20,000), and a third, really small (512) area called the commit record; it tells you whether main or temp is the right one to look at.
0 = main is current, 1 = temp is current

                 |<-- pre-commit -->| commit |<- post-commit ->| reset (for efficiency)
20,000 | main   |  A  |  A  |  A   |   A    |   ?   |    B    |   B
20,000 | temp   |  A  |  ?  |  B   |   B    |   B   |    B    |   B
   512 | commit |  0  |  0  |  0   |   1    |   1   |    1    |   0
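A minimal sketch of the commit-record scheme, assuming the 512-byte commit block can be written atomically (class and method names are made up for illustration):

```python
class CommitRecordFS:
    """Commit-record sketch: 0 means main is current, 1 means temp.
    Only the flip of the commit flag needs to be atomic."""

    def __init__(self, data):
        self.main = data
        self.temp = data
        self.commit = 0          # 0 -> read main, 1 -> read temp

    def read(self):
        return self.temp if self.commit else self.main

    def write(self, new_data):
        self.temp = new_data     # pre-commit: only temp is touched
        self.commit = 1          # COMMIT: the single atomic block write
        self.main = new_data     # post-commit: copy back to main...
        self.commit = 0          # ...and reset, so the next write reuses temp

fs = CommitRecordFS("A")
fs.write("B")
assert fs.read() == "B"
```

Recovery is trivial: after a crash, just read whichever copy the commit record points at; a crash before the flag flips leaves the old data, a crash after leaves the new data.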
• Assumptions are crucial here.
• Coming up with a reliable system comes down to the assumptions.
BEGIN, then pre-commit actions: low-level operations that may or may not be atomic. While we are doing them, the high-level action has not happened yet. We can back the operations out, leaving no trace of the high-level action.
Actions:
BEGIN
//low level operations
• Pre-commit actions
o Back out
o Leave no trace
COMMIT (atomic at the lower level)
• Makes higher level action happen
• Post-commit
o Can't back out, we can't change our mind
o We can improve performance for the next time
//end of low level operations
DONE
ABORT operation: a substitute for COMMIT, used during pre-commit to back out.
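The BEGIN/COMMIT/DONE discipline above can be sketched as a generic skeleton (function names are illustrative, not a real API): pre-commit work must be undoable, COMMIT is the single step that is atomic at the lower level, and post-commit work cannot be backed out but only improves later performance.

```python
def run_transaction(precommit, commit, postcommit, abort):
    """Skeleton of the commit discipline: pre-commit actions may fail
    and be backed out; after COMMIT the action has happened for good."""
    # BEGIN
    try:
        precommit()        # low-level ops; may or may not be atomic
    except Exception:
        abort()            # back out: leave no trace of the high-level action
        raise
    commit()               # atomic at the lower level: the action "happens"
    postcommit()           # can't change our mind; cleanup / speedup only
    # DONE

log = []
run_transaction(lambda: log.append("pre"),
                lambda: log.append("commit"),
                lambda: log.append("post"),
                lambda: log.append("abort"))
assert log == ["pre", "commit", "post"]
```

If `precommit` raises, `abort` runs instead and neither `commit` nor `postcommit` is ever reached.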
IDEA #1 COMMIT RECORD
IDEA #2 JOURNALING (taken from double entry book keeping)
You never erase anything from your ledger
Break our system into 2 pieces
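The journaling idea (never erase, only append, as in double-entry bookkeeping) can be sketched with a tiny in-memory log; recovery replays only the operations of transactions whose COMMIT record made it into the log. All names here are illustrative.

```python
class Journal:
    """Append-only journal sketch: nothing is ever erased or overwritten.
    Recovery replays operations from committed transactions only."""

    def __init__(self):
        self.log = []            # append-only list of (txid, op) records

    def append(self, txid, op):
        self.log.append((txid, op))

    def commit(self, txid):
        self.log.append((txid, "COMMIT"))

    def replay(self):
        """After a crash: find committed transactions, replay their ops."""
        committed = {t for t, op in self.log if op == "COMMIT"}
        return [(t, op) for t, op in self.log
                if t in committed and op != "COMMIT"]

j = Journal()
j.append(1, "write A")
j.commit(1)
j.append(2, "write B")          # crash before commit: transaction 2 is lost
assert j.replay() == [(1, "write A")]
```

Uncommitted work simply never takes effect, which gives the same all-or-nothing behavior as the commit record, while keeping a full history.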