Lecture 14: Virtual Memory

by Wilsen Kosasih

To continue from the last lecture: a bug was found in gzip that can cause files to be lost during execution.
Below is the code used by gzip:

fd = open("foo.gz", O_CREAT,...);
write(...);
close(...);
unlink("foo");
...
...

The bug happens if a crash occurs right after unlinking foo, while not all of the data for foo.gz has been written to disk.
A proposed solution is shown below:

dfd = open(".", ...);    /* the directory that will hold foo.gz */
fd = openat(dfd, "foo.gz", O_CREAT|...);
write(...);
close(...);
unlink("foo");
...
...

The second line of the code allows gzip to write the output file before unlinking the source file, fixing the bug.
To do this, a --synchronous option was introduced. However, the fix is controversial:
1) The fix hurts performance (300x slower)
2) The default behavior of the program remains unsafe
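
One way to guarantee "written before unlinked" is to flush both the new file and its directory entry before removing the source. The sketch below assumes this is done with fsync; it is not gzip's actual code, and the helper name durable_create is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: create `name` inside directory `dir`, write `len`
   bytes from `buf`, and return 0 only after both the file data and the
   directory entry have been flushed. Returns -1 on any error. */
int durable_create(const char *dir, const char *name,
                   const void *buf, size_t len)
{
    int dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return -1;
    int fd = openat(dfd, name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        close(dfd);
        return -1;
    }
    const char *p = buf;
    int ok = 1;
    while (ok && len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            ok = 0;
        } else {
            p += n;
            len -= (size_t)n;
        }
    }
    if (ok && fsync(fd) != 0)      /* flush the file's data */
        ok = 0;
    if (close(fd) != 0)
        ok = 0;
    if (ok && fsync(dfd) != 0)     /* flush the directory entry */
        ok = 0;
    close(dfd);
    return ok ? 0 : -1;
}
```

A caller in the style of the gzip snippet would call unlink("foo") only after durable_create(".", "foo.gz", ...) returns 0. The extra flushes are where the performance cost comes from.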

To avoid these problems, we can either
1) Make a filesystem that works
2) Use journaling

Suppose that in the filesystem we would like to prioritize reliability, followed by performance. There are two key ideas for the implementation:
1) Commit record
a) Assume individual sectors can be written atomically
b) Collect several writes before committing
2) Journaling

JOURNALING
--------------------

There are two types of Journaling

1) Write-ahead log
Begin
-Log intended changes
Commit
-Copy change from log to cell data
End

Properties:
-Efficient for a lot of small writes
-Takes too long for big actions
-Fast recovery
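
The write-ahead steps can be sketched with a small in-memory model. Everything here is an assumption made for illustration: the "disk" is a struct, the commit record is a single log entry (standing in for one atomic sector write), and the names wal_disk, wal_log_change, wal_commit and wal_apply are hypothetical.

```c
#define WAL_NCELLS 4
#define WAL_LOGMAX 16

enum wal_type { WAL_CHANGE, WAL_COMMIT };

struct wal_rec { enum wal_type type; int cell; int value; };

struct wal_disk {
    int cells[WAL_NCELLS];            /* cell data */
    struct wal_rec log[WAL_LOGMAX];   /* the journal */
    int nlog;
};

/* Log an intended change; cell data is not touched yet. */
int wal_log_change(struct wal_disk *d, int cell, int value)
{
    if (d->nlog >= WAL_LOGMAX || cell < 0 || cell >= WAL_NCELLS)
        return -1;
    d->log[d->nlog++] = (struct wal_rec){ WAL_CHANGE, cell, value };
    return 0;
}

/* Commit: append one commit record. */
int wal_commit(struct wal_disk *d)
{
    if (d->nlog >= WAL_LOGMAX)
        return -1;
    d->log[d->nlog++] = (struct wal_rec){ WAL_COMMIT, 0, 0 };
    return 0;
}

/* Copy changes from the log to cell data. Changes that are not followed
   by a commit record are dropped, which is also what recovery after a
   crash would do. The log is then cleared. */
void wal_apply(struct wal_disk *d)
{
    int start = 0;
    for (int i = 0; i < d->nlog; i++) {
        if (d->log[i].type != WAL_COMMIT)
            continue;
        for (int j = start; j < i; j++)
            d->cells[d->log[j].cell] = d->log[j].value;
        start = i + 1;
    }
    d->nlog = 0;
}
```

Because cell data is only modified from committed log entries, a crash at any point leaves either the old values or the complete new values.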

2) Write-behind log
Begin
-Log old values
-Write new values to cell data
Commit
End

Properties:
-Fast recovery
-Fewer sectors to write
-Wastes effort reading old data from disk
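
A write-behind log can be sketched the same way: save the old value, then overwrite the cell in place. This is a minimal in-memory model under assumed names (wb_disk, wb_write, wb_commit, wb_recover are hypothetical), not a real filesystem interface.

```c
#define WB_NCELLS 4
#define WB_LOGMAX 16

struct wb_undo { int cell; int old_value; };

struct wb_disk {
    int cells[WB_NCELLS];            /* cell data, updated in place */
    struct wb_undo log[WB_LOGMAX];   /* old values, oldest first */
    int nlog;
    int committed;                   /* stands in for the commit record */
};

/* Log the old value (this is the extra read of old data),
   then write the new value directly into cell data. */
int wb_write(struct wb_disk *d, int cell, int value)
{
    if (d->nlog >= WB_LOGMAX || cell < 0 || cell >= WB_NCELLS)
        return -1;
    d->log[d->nlog++] = (struct wb_undo){ cell, d->cells[cell] };
    d->cells[cell] = value;
    return 0;
}

void wb_commit(struct wb_disk *d)
{
    d->committed = 1;
}

/* Recovery: if the transaction never committed, restore old values,
   newest log entry first. Either way, clear the log afterwards. */
void wb_recover(struct wb_disk *d)
{
    if (!d->committed)
        for (int i = d->nlog - 1; i >= 0; i--)
            d->cells[d->log[i].cell] = d->log[i].old_value;
    d->nlog = 0;
    d->committed = 0;
}
```

Undoing newest-first matters when the same cell is written twice in one transaction: the last undo applied is the oldest one, which holds the original value.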

*** A more complex idea ***
1) Cascading abort
-Caller should fail if the callee does not commit
2) Compensating actions
-Caller fixes the problem in some other way and still keeps going
-If commit fails, compensate
3) Recovery Phase
-Recovery should be robust in the presence of crashing
-One way: make it idempotent
recover(recover(x)) = recover(x)
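
A small sketch of why this property helps. It assumes log entries record absolute values ("set cell to v") rather than relative ones ("add v"); the struct and function below are hypothetical.

```c
#define REC_NCELLS 4

struct change { int cell; int value; };

/* Replay the first `nlog` committed changes into `cells`.
   Each step stores an absolute value, so replaying an entry that was
   already applied leaves the cell unchanged. */
void recover(int cells[REC_NCELLS], const struct change *log, int nlog)
{
    for (int i = 0; i < nlog; i++)
        cells[log[i].cell] = log[i].value;
}
```

If recovery itself crashes partway through, it can simply be run again from the start of the log and ends in the same state as a single uninterrupted run. A log of relative updates would not have this property.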

Question: Suppose you have a mixture of data. Can we partition it into (1) data that must be persistent and (2) data that is OK to lose?
Answer: Yes, use multiple file systems on the same machine.

Next Problem:
-Unreliable programs with invalid memory references
Solutions:
-Hire perfect programmers
-Use Java/JavaScript/Python
-Hardware Help

Simple idea for hardware help
-Add two registers: base(b) + bounds(c)
b <= a <= c, where a is access address
On a context switch, we change the base and bounds registers
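
The check the hardware performs can be modeled in a few lines. This is only a software sketch of the idea: the struct and function names are hypothetical, and the comparison is inclusive on both ends to match the inequality in these notes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of the two extra registers. */
struct bb_regs {
    uintptr_t base;     /* b */
    uintptr_t bounds;   /* c */
};

/* Performed on every memory access: allow address a only if
   base <= a <= bounds; otherwise the hardware would trap. */
bool access_ok(const struct bb_regs *r, uintptr_t a)
{
    return r->base <= a && a <= r->bounds;
}

/* On a context switch the kernel loads the incoming process's pair. */
void context_switch(struct bb_regs *cpu, const struct bb_regs *next)
{
    *cpu = *next;
}
```

For example, with base 0x1000 and bounds 0x1fff loaded, an access to 0x2000 is rejected until a context switch loads a pair that covers it.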
PROBLEM: Fragmentation & Inflexibility
-Although it works well in a batch environment, it does not in a dynamic one
SOLUTION: Pages

Virtual Memory
---------------------

Uses for virtual memory
1) Run programs that need 16 GiB on 8 GiB RAM computers
2) Programs can share memory safely
E.g. for system call
-sendmsg(dest,message)

malloc(293617000)
implemented via a call to mmap
-The application asks the kernel to modify its page table
arguments
-offset in the file
-size of region
-virtual memory address
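
A minimal sketch of how an allocator might obtain a large region through mmap, annotated with the arguments listed above. The wrapper names map_region and unmap_region are hypothetical, and the choice of an anonymous private mapping is an assumption about what a malloc implementation would request.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Ask the kernel for `size` bytes of zero-filled memory that is private
   to this process and not backed by a file. Returns NULL on failure. */
void *map_region(size_t size)
{
    void *p = mmap(NULL,                         /* virtual address: kernel chooses */
                   size,                         /* size of region */
                   PROT_READ | PROT_WRITE,       /* readable and writable */
                   MAP_PRIVATE | MAP_ANONYMOUS,  /* no backing file */
                   -1,                           /* file descriptor: unused */
                   0);                           /* offset in file: unused */
    return p == MAP_FAILED ? NULL : p;
}

/* Give the region back; the kernel removes the page table entries. */
int unmap_region(void *p, size_t size)
{
    return munmap(p, size);
}
```

A request such as map_region(293617000) normally returns right away. Physical pages are typically assigned only when the program first touches them.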