Jonathan Woong

CS 111 Lecture 14 (February 29, 2016)

----------The GZIP Bug----------

Using the gzip command to compress file(s) may result in data loss.

Example (previous lecture):

$ gzip foo will execute the following code:

1     inputfd = open("foo",O_RDONLY);
2     if (inputfd < 0)
3          { error; }
4     outputfd = open("foo.gz", O_WRONLY|O_CREAT|O_EXECL, 0600);
5     if(output(fd < 0))
6          { error; }
7     read(inputfd, buf, bufsize); // + check for I/O errors
8     write(outputfd, buf, bufsize); // + check for I/O errors
9     if(close(inputfd) != 0)
10          { error; }
11     if(close(outputfd) != 0)
12          { error; }
13     if(unlink("foo") != 0)
14          { error; }
      // if the program crashes here, foo would be lost!
15     exit(0);

Solution (synchronous):

// error checking omitted
1     dfd = open(".", O_SEARCH);
2     fd = openat(dfd, "foo.gz", O_WRONLY|O_CREAT|O_EXECL, 0600);
3     fsync(fd); // save write (slow)
4     fdatasync(dfd); // save file creation (slow)
5     close(fd);
6     unlinkat(dfd, d"foo"); // unlink within contents of dfd

Use:

$ gzip --synchronous [arg]

Implementing the synchronous flag has two implications:

----------Filesystem Reliability----------

Two methods: commit record and journal

1. Commit Record: a tentative method of ensuring write jobs are done in completion (assumes individual sectors can be written atomically)

writes to sectors 1 2 3 ... n (final write) n+1 (commit record)

Writes to the sectors occur sequentially with "n" being the final write.
The "n+1" write is a commit record that indicates all of the write jobs have been completed.
If the write jobs cannot be completed (e.g. power out) before the commit record is written, the entire batch is discarded.
Reboot: If the system is rebooted after the commit record is written, the system will re-write from the beginning of the commit record to the end of the commit record.

2. Journal: memory is divded into a circular buffer (log) and an array of data blocks (cell memory)

log start write 1 write 2 write 3 ... write n commit log end

cell data data block 1 data block 2 data block 3 ... data block n

Journal Methods: I. WAL: a method that tracks intended changes to data (in the log) and executes them after commission (in cell memory)

Pre-commit: intended writes are appended to a log that tracks which data blocks to write to/from, the log has a start and end point
Post-commit: once the commit is reached, the system will execute the writes (to/from cell memory) specified in the log
Recovery/Reboot: the system must scan backwards (from log end) for the oldest completed log entry, then from that point begin re-executing until log end is reached

II. WBL: a method that reads old values (from the log) and reverts data (in cell memory) to the log specifications

Pre-commit: old log values are stored into the log
Post-commit: any discrepancies between the log and cell memory are resolved by updating the cell memory to match the log
Recovery/Reboot: the system does one scan backwards and updates cell data accordingly

III. Log-only: a method that treats completed log entries as completed writes, but requires periodic garbage collection when the log end wraps and meets the log start

Main-memory database (variant): all data and log entries kept in main memory (RAM) for speed and persistency

IV. Partitioning: separating data that must be persistent from data that is acceptable to lose

Example: Ext4 filesystem (journaling, fast read/writes, no fragmentation) vs. VFAT (windows compatibility)

How namei distinguishes between two filesystems: refer to an array of filesystem-inode pairs (dev_t filesystem, ino_t inodenumber) to ensure correct directory searching

How to Speed Up Journaling: store log (sequentially read) and cell memory on separate disks

----------Identifying/Fixing Unreliable Programs----------

How to prevent invalid memory references:

Hardware Method: use two kernel-level hardware registers as boundaries; every access to memory must fall between them

base (start of program) accessible memory bounds (end of program)

Context switching: requires switching the base and bounds registers

Solution to fragmentation (x86-64): segmentation

Divide 32-bit memory addresses into an 8-bit segment number and 24-bit offset within segment

memory address: 8-bit segment number (0-255) 24-bit segment offset

Segment number-offset pairs are stored in a segment table

segment number base bounds
0 address address
1 address address
2 address address
... ... ...
255 address address

Two types of segment tables: stack (for stack data) and text (for code)

----------Virtual Memory----------

Per-process paged based memory addressing that only requires a base address

Divide 32-bit virtual addresses into a 22-bit virtual page number (VPN) and 10-bit virtual page offset (VPO)

virtual address: 22-bit VPN 10-bit VPO

A 22-bit VPN is sent to VM hardware onboard that translates it into physical page, then the VPO is added to obtain a physical address

Uses for Virtual Memory:

How malloc works: malloc is implemented via mmap

mmap(fdToMmapFrom, offsetIntoFile, sizeOfRegion, VMaddr) allows a process to control its own page table

fdToMmapFrom can point to source /dev/zero to malloc zeroed-out space

Memory Layout

View RAM as cached swap space (split into pages) where page table entries contain pointers to pages in RAM and can be swapped out for pages in the RAM swap space

RAM: swap page 1 swap page 2 swap page 3 ... swap page n program data

page table
page 1
page 2
page 3
page 4
...
page n

On page fault: evict a victim page from page table and replace with a page from the swap space