CS111 Lecture 12 Scribe Notes (Fall 2014)

by Dennis Chan

CSS styling based on CS 111 Lecture 7 Scribe Notes (Spring 2014) by Yoav Zimmerman and Noah Hadfield-Manell

Special Directory Entries
Directory Hard Links
Symbolic Links
1. Motivation
2. Implementation
File Types
Layers of a Unix File System

Special Directory Entries

Within every directory, there are two special directory entries '.' and '..'. In order to see them, we need to use the -a option of the ls command. The -a option shows all of the files within a directory, including hidden ones. Specifically, these hidden files are files that begin with a '.', also known as dot files (for example, .gitignore).

$ mkdir tmp; cd tmp
$ ls -al
drwxr-xr-x  2 dennisch csgrad 4096 Nov 23 22:28 .
drwx------  10 dennisch csgrad 4096 Nov 23 22:28 ..

The '.' entry is a self-pointer in that it always points to the directory itself, while the '..' entry points to the parent directory. The exception is that the '..' entry for the root directory points to the root directory itself, since there is no parent.

There are two ways to implement '.' and '..' within a file system:

Each directory on disk contains at least the two entries '.' and '..'
- - requires more disk space
- + does not require additional code to interpret
Omit the entries from disk, but have the namei (name-to-inode) function be aware of them
- + does not require additional disk space

In Unix file systems, the second method is used.

Directory Hard Links

Returning to the topic of hard links, we consider the possible dangers of creating hard links to directories. For example, we might have the following situation:

In this case, the file /usr/bin/oops has been created as a hard link to the root directory /. This creates a loop. While data structures in main memory generally have no problem with loops, file systems cannot tolerate them as well. Programs such as find have trouble functioning properly since there is no definite end to a loop.

There are several options to deal with loops so that find can work:

Option 1: Detect loops by internally storing in RAM and checking an array of inodes. Inodes are pushed when descending through the file system and popped when ascending. A duplicate inode indicates a loop, so it should be skipped.
- cost is O(N) for memory and O(NlogN) for time
- memory cost is proportional to the size of the file system, which can be larger than RAM
Option 2: Detect loops by using the two-iterator method. Whenever the two iterators meet, a loop exists.
- incredibly slow (factor of 2)
Option 3: Give find a directory depth limit. find will stop searching after the limit is reached.
- may be incomplete
- produces insecurity in forensics, since the attacker will just place the offending file one further
Option 4: Forbid loops.
- not every OS does this in practice, but Linux/Unix/POSIX systems do

With Option 4, there needs to be a way to prevent loops from being created. One method might be to implement dynamic cycle detection, similar to that which is used to prevent deadlocks. Unfortunately, this method would be far too slow since the system would need to examine many data structures on disk, and detecting long cycles may produce unacceptable delays for the user. Instead, the preferred method of preventing loops is to prohibit any and all hard links to directories (exceptions: '.' and '..').

Symbolic Links

Let us now take a look at the behavior and implementation of the command git clone.

cd src; git clone a b

This command takes an existing Git repository and creates an identical one with the same directories and files. What is amazing about git clone is that it can clone the entire Emacs repository, which has decades of commits and history, in a matter of 1-2 seconds. It can do this partly because of the preservationist nature of Git - most of the history contained in the Git repository is read-only. As such, git clone only needs to create hard links to the history files, rather than copy them outright.

Motivation

Unfortunately, as discussed previously, hard links cannot be made to directories due to the danger of loops. This means that if there were a directory contained within a/.git named old which contained all of the history, it would not be possible to speed up the cloning process by simply hard-linking to that directory.

The solution is to use symbolic links, also known as symlinks. Symbolic links are special files that act like hard links, but do not actually point to the same file. Instead, they only contain the pathname of the file they are linked to. Unlike hard links, symbolic links to directories do not create loops, because find will not follow a symbolic link to a directory. Thus, the following command will work for git clone:

$ ln -s ../../a/.git/old b/.git

A common observation on Linux systems is that there will often be a /bin directory in addition to a /usr/bin directory. Both contain important commands and files, but they are separated for historical reasons. Because they often have similar contents, it is not unusual to sometimes see them symbolically linked.

$ ln -s /usr/bin /bin
$ ls -l /bin
lrwxrwxrwx.  2 root root 69632 Sep 26 08:22 bin -> /usr/bin

Implementation

In the Berkeley Fast File System, in addition to regular file inodes and directory inodes, there are symlink inodes. A symlink inode points to a data block which contains the contents of the symlink.

It would seem as though there were a good deal of wasted space in a symlink data block, as pathnames can be fairly short. Fortunately, this issue can be resolved in two ways:

Store symlinks differently depending on size
- short symlinks can be stored in the inode itself
- long symlinks can use the data blocks
Use two block sizes
- use short blocks of one-fourth regular data block size for short symlinks

Since symlinks are not regular files nor are they directories, the system needs to treat symlinks differently. For example, we make a call to open:

open("b", O_RDONLY, ...);

with the symlink b. What happens is that open will call the namei function, which will recognize the file as a symlink and substitute b with the name of the file it is linked to. If b happens to be linked to some non-existent file nowhere, then open will fail as if it had been passed nowhere as an argument. There is a special case where a symlink will point to itself - that is, the user issues the command

$ ln -s b b

In this case, attempting to open the file will cause namei to substitute b for b and repeat the substitution until the arbitrary limit of 20 is reached, after which it will stop and set errno to ELOOP. To avoid such a loop, we might try to prevent having symlinks be self-referential. This leads to trouble, however, due to the difficulty in detection. Determining whether the command

$ ln -s /a/b/c/d/e e

is a case of a symlink referring to itself would require an expensive algorithm. Thus, the easier option is to just impose a limit on the number of substitutions for namei.

Other system calls need to handle symlinks differently as well. For example, the ls command inherently knows to print the symlink along with its linked file, rather than follow the symlink. It does this by using the system call lstat rather than stat. The key difference with lstat is that it will not follow the last component of the file name, i.e.

struct stats st;
lstat("a/b", &st);

will follow a if it happens to be a symlink, but not b. The system calls unlink and rename operate in the same fashion. This behavior can be seen in open as well by including the O_NOFOLLOW flag. With the O_NOFOLLOW flag set, open will fail for symlinks since it will not follow them to their linked file.

File Types

The Unix file system contains several other file types which we will cover briefly. These file types are denoted by different symbols in the output of ls:

directory d   character special c
regular   -   block special     b
symlink   l   coniguous file    C
fifo      p

The mkfifo command will make a mkfifo system call to create a named pipe in the form of a fifo file. This pipe is equivalent in function to the nameless pipe created by the pipe system call.

The mknod command creates special files which hook into the kernel driver. One of these files is /dev/null, which can be created with the command

mknod /dev/null c 1 3

Layers of a Unix File System

From highest to lowest level, the layers of a Unix file system are as follows (dashed line separates software from hardware):

Symlinks
Filenames
Filename components
Inodes
File systems and mounts
-----------------------
Partitions
Blocks
Sectors

File System Mounting

Within a Unix file system, partitions are "glued" together with mounts. The kernel contains a mount table which maps locations (inodes) to sub-file systems. The mount point becomes the root file of these file systems.

With mounting, there is the possibility that loops can develop. To avoid this problem, duplicates are not allowed in the mount table. The mount table runs into trouble, however, when there are hard links between file systems. For instance, Inode 29 in one file system may represent a directory, but in a different directory Inode 29 will represent a regular file. The solution to this problem is that entries to the mount table are uniquely defined as device-inode pairs.

File systems can be mounted with the command

mount /dev/disk/03 /usr/bin

where the first argument is the partition that identifies the file system and the second argument is the location in the current system. When a new file system is mounted, the files from the old system become invisible. It is often a tactic of system intruders to hide malicious content underneath mounted file systems. For this reason, the mount command requires root access.

Shredding Files

Suppose we wanted to completely remove a file from the file system. The rm command would not work because rm does not actually remove the file. Instead, rm only decrements the link count.

There is another command shred which is more effective. The shred command overwrites the data with random numbers, and it does this several times to remove as much of the magnetic residue from the file data as possible.

Even shred can fall short, however, since some file systems contain redundant data. On Ubuntu, the preferred method is to shred the entire partition. This may still be insufficient since disks contain a list of bad sectors - any data written on the bad sectors would still be recoverable. Therefore, the definitive solution to completely removing a file is to melt the disk.

Virtual File System

To handle the case of multiple different file systems (EXT4, VFAT, FFS), there is a Virtual File System (VFS) layer in the kernel that uses object-oriented programming to represent the file systems. Each file system has its own set of inode and file operations contained within data structures. The VFS layer switches these operations when moving from one file system to another.

CS111 Lecture 12 Scribe Notes (Fall 2014)

Table of Contents