Continuing from the last lecture: a bug was found in gzip
that can cause files to be lost if the system crashes during execution.
Below is the code used by gzip:
fd = open("foo.gz", O_CREAT,...);
write(...);
close(...);
unlink("foo");
...
...
The bug happens if a crash occurs right after foo is unlinked but before all of
foo.gz's data has been written to disk: the source is gone and the compressed copy is incomplete.
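A proposed solution is shown below:
dfd = open(".", O_RDONLY);   /* the directory that will hold foo.gz */
fd = openat(dfd, "foo.gz", O_CREAT|...);
write(...);
fsync(fd);    /* force foo.gz's data to disk */
fsync(dfd);   /* force foo.gz's directory entry to disk */
close(...);
unlink("foo");
...
...
Holding a descriptor for the directory (dfd) as well as for the file (fd) allows gzip to force both the output data and its directory entry to disk before unlinking the source file, fixing the bug.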
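As a rough illustration of the ordering described above, here is a minimal C sketch for a POSIX system. The helper names durable_write and replace_source are hypothetical (they are not gzip's own functions), and the sketch assumes that fsync on a directory descriptor is supported, as it is on Linux.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to `name` inside directory `dir` and
   return only after the data and the directory entry have been flushed.
   Returns 0 on success, -1 on failure. */
int durable_write(const char *dir, const char *name,
                  const void *data, size_t len)
{
    int dfd = open(dir, O_RDONLY);            /* descriptor for the directory */
    if (dfd < 0)
        return -1;
    int fd = openat(dfd, name, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) {
        close(dfd);
        return -1;
    }
    /* A short write is treated as a failure in this sketch. */
    int ok = write(fd, data, len) == (ssize_t)len
          && fsync(fd) == 0                   /* flush the file's contents */
          && fsync(dfd) == 0;                 /* flush the new directory entry */
    if (close(fd) != 0)
        ok = 0;
    close(dfd);
    return ok ? 0 : -1;
}

/* Hypothetical stand-in for gzip's last step: store the output durably,
   and only then remove the source file. */
int replace_source(const char *dir, const char *src, const char *dst,
                   const void *data, size_t len)
{
    if (durable_write(dir, dst, data, len) != 0)
        return -1;                            /* keep the source if the output is not safe */
    int dfd = open(dir, O_RDONLY);
    if (dfd < 0)
        return -1;
    int r = unlinkat(dfd, src, 0);            /* remove the source last */
    close(dfd);
    return r;
}
```

The key design point is the order: the source is unlinked only after both fsync calls have succeeded, so a crash at any moment leaves at least one complete copy of the data.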
To do this, a --synchronous option was introduced in gzip. However, this is controversial:
1) The fix hurts performance (300x slower)
2) The default behavior (without the option) is still unsafe
To avoid these problems, we can either:
1) Make a filesystem that works
2) Use journaling
Suppose that in the filesystem we would like to prioritize reliability, followed by performance. There are two key ideas for the implementation:
1) Commit record
a) Assume individual sectors can be written atomically
b) Collect several writes before committing
2) Journaling
JOURNALING
--------------------
There are two types of journaling:
1) Write-ahead log
Begin
-Log intended changes
Commit
-Copy changes from log to cell data
End
Properties:
-Efficient for lots of small writes
-Takes too long for big actions
-Fast recovery
2) Write-behind log
Begin
-Log old values
-Write new values into cell data
Commit
End
Properties:
-Fast recovery
-Fewer sectors to write
-Wastes effort reading old data from disk
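The two logging schemes above can be sketched with an in-memory stand-in for the disk. This is only an illustrative model: struct disk, the wal_ and wbl_ function names, and the single committed flag (playing the role of the commit record) are all assumptions made for the example, not part of any real filesystem.

```c
#define NCELLS 8
#define LOGCAP 16

/* One log entry: a cell number plus a value. The value is the NEW value
   for a write-ahead log and the OLD value for a write-behind log. */
struct log_entry { int cell; int value; };

struct disk {
    int cells[NCELLS];               /* "cell data": where the data really lives */
    struct log_entry log[LOGCAP];    /* the journal */
    int nlog;                        /* number of entries logged so far */
    int committed;                   /* the commit record, assumed atomic */
};

void tx_begin(struct disk *d)  { d->nlog = 0; d->committed = 0; }
void tx_commit(struct disk *d) { d->committed = 1; }

static int log_append(struct disk *d, int cell, int value)
{
    if (d->nlog == LOGCAP || cell < 0 || cell >= NCELLS)
        return -1;
    d->log[d->nlog].cell = cell;
    d->log[d->nlog].value = value;
    d->nlog++;
    return 0;
}

/* Write-ahead: log the intended change; cell data is not touched yet. */
int wal_write(struct disk *d, int cell, int value)
{
    return log_append(d, cell, value);
}

/* Write-ahead recovery: replay the log forward if it was committed,
   otherwise discard it. The same copy step runs after a normal commit. */
void wal_recover(struct disk *d)
{
    if (d->committed)
        for (int i = 0; i < d->nlog; i++)
            d->cells[d->log[i].cell] = d->log[i].value;
    tx_begin(d);
}

/* Write-behind: save the old value in the log, then overwrite in place. */
int wbl_write(struct disk *d, int cell, int value)
{
    if (cell < 0 || cell >= NCELLS || log_append(d, cell, d->cells[cell]) != 0)
        return -1;
    d->cells[cell] = value;
    return 0;
}

/* Write-behind recovery: if there was no commit, restore the old values,
   newest entry first. */
void wbl_recover(struct disk *d)
{
    if (!d->committed)
        for (int i = d->nlog - 1; i >= 0; i--)
            d->cells[d->log[i].cell] = d->log[i].value;
    tx_begin(d);
}
```

Note how wbl_write has to read the old cell value before overwriting it, which is the "wasted effort" listed above, while wal_write touches only the log until after the commit.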
*** A more complex idea ***
1) Cascading abort
-Caller should fail if the callee does not commit
2) Compensating actions
-Caller fixes the problem in some other way and still keeps going
-If commit fails, compensate
3) Recovery Phase
-Recovery should be robust even if the system crashes again during recovery
-one way: make it idempotent
recover(recover(x)) = recover(x)
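A small sketch of why recover(recover(x)) = recover(x) matters. The struct and both functions are made up for illustration: recover installs an absolute value taken from the log, so repeating it is harmless, while recover_by_delta re-applies a relative change and gives a different answer every time it is re-run.

```c
/* Hypothetical state: the current balance plus what the log recorded. */
struct state {
    int balance;         /* value currently in cell data */
    int logged_balance;  /* absolute value recorded in the log */
    int logged_delta;    /* relative change recorded in the log */
};

/* Idempotent recovery: install the logged absolute value.
   Running it twice gives the same result as running it once. */
struct state recover(struct state x)
{
    x.balance = x.logged_balance;
    return x;
}

/* NOT idempotent: each run adds the delta again, so a crash in the
   middle of recovery followed by a second recovery corrupts the data. */
struct state recover_by_delta(struct state x)
{
    x.balance += x.logged_delta;
    return x;
}
```

If the machine crashes partway through recovery, the next boot simply runs recovery again, which is safe only for the first style.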
Question: Suppose you have a mixture of data. Can we partition it into (1) data that must be persistent and (2) data that is OK to lose?
Answer: Yes, use multiple file systems on the same machine
Next Problem:
-Unreliable programs with invalid memory references
Solutions:
-Hire perfect programmers
-Use Java/JavaScript/Python (languages that check memory references at run time)
-Hardware Help
Simple idea for hardware help
-Add two registers: base (b) and bounds (c)
Every access must satisfy b <= a <= c, where a is the address being accessed
When we do a context switch, we change base and bounds to the next process's values
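The check that the hardware would perform can be modeled in a few lines of C. This is a software simulation only: cpu_regs, struct process, access_ok, and context_switch are hypothetical names, and the inclusive upper bound follows the b <= a <= c rule from these notes.

```c
#include <stdbool.h>
#include <stdint.h>

/* The two extra registers: base (b) and bounds (c). */
struct bounds_regs { uintptr_t base; uintptr_t bounds; };

/* Each process keeps a saved copy of its own register values. */
struct process { struct bounds_regs saved; };

/* Simulated hardware registers for the currently running process. */
static struct bounds_regs cpu_regs;

/* The hardware check made on every memory access: b <= a <= c. */
bool access_ok(uintptr_t a)
{
    return cpu_regs.base <= a && a <= cpu_regs.bounds;
}

/* On a context switch, load the next process's base and bounds. */
void context_switch(const struct process *next)
{
    cpu_regs = next->saved;
}
```

In real hardware a failed check would trap to the kernel rather than return false.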
PROBLEM: Fragmentation & Inflexibility
-Although it works well in a batch environment, it does not in a dynamic one
SOLUTION: Pages
Virtual Memory
---------------------
Uses for virtual memory
1) Run programs that need 16 GiB on 8 GiB RAM computers
2) Programs can share memory safely
E.g. for a system call
-sendmsg(dest,message)
E.g. malloc(293617000)
-Implemented via a call to mmap
-Application asks the kernel to modify its page table
mmap arguments include:
-offset in file
-size of region
-virtual memory address