CS 111 Lecture 12: File System Implementation

Scribes: Troy Tabrilla and Trisha Liao

Lecture Date: 02/27/12

Table of Contents

  1. An Aside: A Bug in Grep
    1. But We Still Want to Attack!
    2. Preventing DOS Attacks
  2. Levels of a Unix-like File System
  3. Issues with Symlinks
  4. An Example File System: Linux ext3 v2
  5. Security Issues
    1. Wiping Data
    2. Shred Issues
  6. Symlinks vs Hard Linkes
    1. Differences
    2. Name Traversal for Symlinks
  7. Miscellaneous Details

An Aside: A Bug in Grep

The command

$ grep –r '^.*x\(\)\1' file

where 'file' contains a line with 231+2 characters would lead to a bug. Inside grep, the following line of code is executed:

int len = strlen("...x");

where "...x" is the string matching the regular expression above with 231+2 characters. len would overflow into a negative number, and attackers could use it to subscript into a "negative" offset of the array. Since an int on a 32-bit machine can only represent at most 231-1, it is too small to contain the length of the string above. One way to fix the bug is to change len from an int to a size_t.

size_t len = strlen("...x");

But We Still Want to Attack!

We can launch a DOS (Denial of Service) attack and force grep to run for an extremely long time. This could easily be done by simply plopping a terabyte file onto the file system. However, this would waste too much disk space.

A smarter way to make grep take longer would be to abuse our knowledge of the file system's implementation and feed it specially prepared files. For example, we could use grep on a holey file, which contains a huge empty area (of zero bytes) and ends in the terminating character that grep is looking for. Representations of holey files are shown in Figures 1 and 2.

Array representation of a holey file.

Figure 1. Array representation of a holey File
Inode representation of a holey file.
Figure 2. Inode representation of a holey File

A holey file can be made with the following command:

$echo x | dd >holey bs=1MiB seek=2048

which skips the first 2048 blocks of the output file and writes out an x. To see how much space the file really takes up, use the -s option:

$ls –ls holey

Preventing DOS Attacks

DOS vulnerabilities can be fixed.

  1. Grep is for text, and it can do whatever it wants with non-text.
    Idea: skip 0’s and treat them as \n’s, so there will be no large buffers.
  2. Grep should detect holey files and give up if the file is too big.
    Idea: use $ls –ls holey and $stat(“holey”, &st) to get the nominal size of the file and the actual size of the file. Grep should give up if the difference between the two numbers exceeds internal fragmentation limits.

Levels of a Unix-like File System

From the highest level to the lowest level, a Unix-like file system is roughly organized as follows:

Issues with Symlinks

Names of symbolic links that are a terabyte long can be difficult to handle. Setting a name length limit could result in lots of internal fragmentation if it is too small, so the length should just be the size of a block. Symbolic links can be implemented in three ways:

  1. An inode can be dedicated to a symbolic link, with the symbolic link’s contents in a data block. However, for a symbolic link of length 8 on a file system with 8192 byte data blocks, 8184 bytes are wasted.
    Symlink type A.
    Figure 5. Symlink Type A

  2. Symbolic link contents can be stored in the inode itself. There is a 96 byte limit (8 bytes x 12 block pointer slots) for the symlink path. A better solution could be to have a bit for switching between using data blocks for symbolic links (option A) or storing the symbolic link’s contents in an inode (option B).
    Symlink type B.
    Figure 6. Symlink Type B

  3. Symbolic links can be implemented as directory entries. This option is faster than the previous options, but it is not used for various reasons. There would be a varying length for directory entries and an extra bit used to account for symbolic links.
    Symlink type C.
    Figure 7. Symlink Type C

An Example File System: Linux ext3 v2

A directory entry in this system would contain the following properties: a 32-bit inode, a 16-bit directory entry length, an 8-bit name length, an 8-bit file types, and an N-byte file name. The space after the file name is considered to be unused.

A Linux directory entry.
Figure 8. A Linux directory entry.

The 16-bit directory entry length and the 8-bit name length help ensure proper unlinking. To unlink a directory entry, the preceding directory entry's length is increased to include the unlinked entry. The unused space after the file name is eventually cleaned up by the operating system. The two lengths help the operating system align the directory entries.
Unlinking a directory entry.
Figure 9. Unlinking a directory entry.

A problem with this file system is that the command “rm –rf” would take O(N2) time to execute. Another directory structure is needed. Hash tables, trees or other structures can be used, each with their own advantages and disadvantages.

For example, implementing a file system with a hashed directory and using the following commands:

$cd /my/file/system
$rm –rf *

would take O(N) time. However, the data of the files are still accessible. When unlinking, data blocks and inodes are untouched, which leads to security issues.

Security Issues

Since the data blocks and inodes are untounched after unlinking, a malicious character can still come around and snoop through your data. We want ways to ensure our data does not fall into the wrong hands.

Wiping Data

Creating new files (e.g. “>f1 >f2 >f3”) would not get rid of the data in inodes. The following are a few ways to delete the data on a disk:

Shred Issues

  1. Bad blocks: Sectors have checksums that let them repair simple errors and remap data to reserved space at the end of the disk before a block becomes bad. Shred doesn’t overwrite those bad blocks, so an attacker could still find data in them.
    Bad blocks a disk.
    Figure 10. Bad blocks on a disk.
  2. Shred is still too slow: Attackers can quickly steal data before shred is completed. So how can we quickly "shred" our disk? Encrypt it. If the contents of the disk need to be deleted immediately, toss the key. Problem solved.

Symlinks vs Hard Links

Figure 11 shows the links created by the above commands.

Example links.
Figure 11. Example links

Differences

Name Traversal for Symlinks

$ln -s a/b/c link would map the path "a/b/c" to an inode with the name "link". Traversing this symlink involves finding a's inode, checking to see if it's a directory, checking if the file b exists in it, finding b's inode, etc. until it finds the file c.

First part of a symlink traversal for 'a/b/c'.
Figure 12. First Part of a Symlink Traversal for 'a/b/c'

Some other examples of symlink paths are:

Miscellaneous Details

chdir(“a/b”) changes the working directory mode. This is the function call version of the shell command $cd a/b.
chroot(“a, “b”) changes the root directory inode. You can’t go back to the original root directory (for the most part). Used to create a "chrooted jail" for sandboxing.

./ and ../ may or may not be inodes. This depends on how the file system is implemented.