CS 111 Winter 2012 February-27, 2012

Lecture 12: File System Implementation

Compiler: Huwenbo Shi

Table of Content

Aside – Bug in grep Program

File System Implementation

Aside – Bug in grep Program

Bug Description and Possible Fix
The bug in the grep program will show up if we pass to the program a particular regular expression and a file that contains a long line.
Consider the command: grep -r '^.*x\(\)\1' file.
When grep is executed, it computes the length of a line in the file using the following code int len = strlen(“...the long line...x”);.
Suppose that in file, there is a line that contains 2^32+2 bytes, then the integer len in the code above will have a negative value due to signed integer overflow.
A malicious user can take advantage of this bug to access the memory that normally shouldn't be accessible and perform an attack on the host computer.
One way to fix this bug is to change the type of len from int to size_t, this will solve the signed integer overflow problem, but a malicious user can still perform a Denial of Service (DOS) attack.
Another Form of Attack – Denial of Service of Attack by Holey File
A DOS attack is an attack that aims at using up the computing resource of the host computer such that the host computer cannot perform normal tasks.
Even though now we no longer have the signed integer overflow problem, a malicious user can still perform a DOS attack by passing a huge file to the grep program.
A malicious user can create a huge file without using a lot of disk space by creating holey files, which have valid inodes, but most of their actual data block pointers are empty.
The following figure illustrates how a holey file is stored on the disk.

The following figure illustrates how holey files look like when viewed as a sequence of data.

The following command generates a holey file that's 2147483650 bytes in size: echo x|dd > holey bs=1MiB seek=2048.
When this command is executed, 2048 blocks, each with size 1MiB, are skipped, and then the actual data is written on the disk, resulting in a file with 2147483650 bytes.
Ways to Get Rid of the Denial of Service Attack
i. grep Is for Text File Only
Since a holey file contains a lot of zero at the beginning of the file, it's not a text file.
The grep program can do whatever it wants with the non-text file.
For instance, the grep program can output an error if the input file is not a text file.
ii. Skip 0's – Treat Them as New Line Characters
Treating 0's as new line characters allows the host computer to use much less memory as it does not need to remember them.
However, this approach still consumes a lot of computing resource, because the grep program has to chew up a lot of these characters.
iii. Check If A File Is A Holey File First
There is no system call that check if a file is a holey file.
But we can use stat(“file”, &st); to get the nominal size and physical size of the file, which are stored in the struct st.
If the nominal size and the physical size of a file don't agree, then the file is a holey file.
And if a holey file is passed to the grep program, the grep program gives up.
Why We Need Holey File
The holey file can be used to create hash tables as we can perform insertion without allocating a lot of disk space.

File System Implementation

Levels of A Unix-Like File System

Symbolic Link	Files that contain file names
File Names	e.g. “a/b”, “foo.c”, “/a///c/d”
File Name Components	Names of directory entries e.g. “foo.c”, “usr”
inodes	ino_t
File System	(See figure below)
Partitions	(See figure below)
Blocks (8192 Bytes) on a disk	Linear array (in your head)
Sectors (512 Bytes) on a disk	Linear array (in your head)

The following figure illustrates the relationship between file system and partitions.

Symlink File Name Length Limit
Questions: Can a symlink has a file name that is a TB long?
Answer: No. There is a limit on the file name length of a symlink.
The following discuss several ways to store a file name in symlink.
i. Use A Whole Block to Store the File Name
The following figure shows how we use a whole block (8192 Bytes) to store the file name of a symlink.
Inside the inode for a symlink, there is a pointer to a data block in which we store the file name.
Since each block has 8192 bytes, the limit on the file name of the symlink is 8192 bytes long.
So if the symlink content is small, a lot of storage will be wasted.

ii. Store the File Name inside the Inode
The following figure shows how we store the content of the symlink inside the inode.
Since the inode size is fixed, the file name of a symlink must be smaller than the size of an inode.

iii. Symlinks Don't Get Inodes; Symlinks Are Merely Directory Entries
Instead of storing the content of a symlink in the inode, we store the content as directory entry as shown in the table below.

Name	Inode #	Flag
mysl	/usr/bin/ls (symlink content)	Symlink
abc	27

This implementation makes the contents look up faster as we don't need to find the inode.
The following figure illustrates the Linux – ext3 v2 directory entry that uses idea above.

There are two reasons for using both the 16-bit directory entry length and the 8-bit name length.
The first reason is that by using both of these two fields, we get a better alignment.
The second reason is that keeping two fields makes unlink much faster.
The following figures explains why it's faster.
Before:

After:

Assume we have two directory entries, directory entry 1 with a entry size of 104 and directory entry 2 with a entry size of 48, and we want to unlink directory entry 2.
The simplest way to unlink directory entry 2 is to increase the directory entry length of directory entry 1 by the directory entry length of directory length 2.

Remove Data from the Disk
i. Time Complexity Analysis
Using the implementation discussed above, the time complexity to execute the command rm -rf directory, is O(N^2), where N is the number of entries, because the entries are represented linearly.
To speed up this command, we can use a hash table to store the directory entries' offset in the directory's inode, which will improve the performance to O(N).
ii. Safety Analysis
Question: After we execute the command rm -rf *, are data of files still accessible?
Answer: Yes, the data blocks, inodes, and meta data are still on the disk.
To wipe the meta data out, we can use the command > filename , but the actual data may still be on the disk.
To wipe the actual data out, we first output something random into the file, and then remove that file.
The sequence of commands dd if=/dev/random of=x and rm x will wipe out the actual data on the disk.
(Note: A faster version of this command is to use /dev/urandom, although this is not as randomized as /dev/random, it's enough for wiping out the data from the disk.)
Question: Why prefer writing random data instead of writing zeros?
Answer: There are devices that can read previous traces of data. If we use zero, it's easier read the previous traces of data. But if we use random value, it will be harder.
iii. Shred A File
The program shred is roughly the same as using urandom, but it uses its own random data generator, and it does this three times.
The shred program works on magnetic disks, but doesn't work on solid-state drives (SSD) because of the wear-level algorithm.
The only way to wipe out the data on an SSD is to throw it in the fire.
iv. shred Program Issues
A) Bad blocks
A disk has a bad block table that keeps where the bad blocks get mapped to.

A malicious user can still get the content of the remapped blocks.
B) shred is too slow
Time to shred a 1TB disk takes a long time.
The solution to this problem is to encrypt the disk.

We keep a key that specifies how to decrypt the data on the disk, then to shred the disk is the same as throwing the key away.
Symlink vs. Hard Link
i. Overview
Consider the command ln -s d/a e/b, which creates a symbolic link e/b that contains the file name “d/a”.
And consider the command ln d/a e/b, which create a hard link e/b that points to the file d/a.
The following figure illustrates the difference of creating a symbolic link and a hard link.

As illustrated, when we create a symbolic link, we create an inode that contains the file name of the file that we want to link to.
When we create a hard link, we set the inode number to be the inode of the file that we want to link to.
This will increase the link count of that file by one.
ii. Difference #1: Symlinks Can Dangle, Hard Links Can't
If we remove the file “d/a” (assume that the link count has dropped to zero), then we arrive at a dangling symlink.
And if we execute the command cat e/b, we will get a e/b not found error.
Hard links can't dangle because before all the hard links are removed, the link count is always greater than 0, and thus the file can not be freed.
iii. Difference #2: Symlink Is Relative to the Containing Directory
The following commands illustrate this point.
ln -s foo d/x
ln d/x e/y
cat d/x
cat e/y
On the first line, we create a symbolic link d/x that links to a file named “foo”.
On the second line, we create a hard link e/y that links to the symbolic link d/x.
On the third line, we try to print out the content of the file the symbolic d/x links to. This is command is equivalent to cat d/foo.
On the fourth line, we try to print out the content that e/y links to, which is the symbolic link d/x.
Since now the containing directory of the file is e, the command cat e/y is equivalent to the command cat e/foo.
iv. Difference #3: Symlinks Can Loop
If we execute the command ln -s / /usr we will create a loop.
The following figure illustrates this loop.

Then the commands ls -l /usr/bin and ls -l /usr/usr/usr/bin etc. will be equivalent.
The following commands also create loops.
ln -s a b; ln -s b a
ln -s x x
When we create the symbolic link, the program cannot detect the loop because it's slow with a time complexity of O(N).
The program can detect a loop when resolving a symbolic link.
If the program follows 16 symbolic links and still cannot resolve the symbolic link, it will give up.
And the function open(“symlink_file”, …...); will fail, and errno will be set to ELOOP.
Question: Why can't hard links loop?
Answer: Since resolving hard links can be slow, we don't allow hard link loops to be created.
Name Lookup
For a file “a/b/c”, we map the name components “a”, “b”, and “c” to inodes.
If the file name “a/b/c” is given, the system will assume that the directory a is in the current working directory, which is stored in the process descriptor.
If the file name “/a/b/c” is given, the file name knows that the directory a is in the root directory.
The function chdir(“a/b”); changes the process's working directory to a/b.
The function chroot(“a/b”); changes the process's root directory to a/b, and will result in “chroot jail”.

Add Me on Facebook