File Systems

File systems are on-disk data structures that present the abstraction of a collection of files. The abstraction is necessary because a disk is one very large array of sectors, while individual files are comparatively small.

Contiguous Allocation

The simplest approach is contiguous allocation, as used in RT-11 (1970s).

[0][1][...][Free Space][file1][Free Space][file2][Free Space]
[File Allocation Table][Data]

Imagine an arbitrarily large array, divided into sectors of 512 bytes. At the head of the array, you have the file allocation table, and from then on, contiguous sets of data that represent files.

The file allocation table itself contains three pieces of information about each file: its name, its starting sector, and its size. A lookup works like this: to find a file, search the file allocation table for a matching name, look at its starting sector, and note its size. After reading size bytes of data from the starting sector, you have reached end of file.
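The lookup described above can be sketched in a few lines. This is only an illustration: the in-memory `table` dictionary, the file names, and the helper names are hypothetical, and a real implementation would read the table from the head of the disk.

```python
SECTOR_SIZE = 512  # bytes per sector, as in the notes

# Hypothetical in-memory copy of the table: name -> (starting sector, size in bytes).
table = {
    "file1": (4, 700),
    "file2": (9, 512),
}

def sectors_used(size: int) -> int:
    """Number of whole sectors a file of `size` bytes occupies (rounded up)."""
    return (size + SECTOR_SIZE - 1) // SECTOR_SIZE

def read_file(disk: bytes, name: str) -> bytes:
    """Find the name in the table, then read `size` bytes from its first sector."""
    start_sector, size = table[name]
    offset = start_sector * SECTOR_SIZE
    return disk[offset:offset + size]

# Build a toy 16-sector disk and store each file contiguously.
disk = bytearray(16 * SECTOR_SIZE)
disk[4 * SECTOR_SIZE:4 * SECTOR_SIZE + 700] = b"A" * 700
disk[9 * SECTOR_SIZE:9 * SECTOR_SIZE + 512] = b"B" * 512
```

Because the data is contiguous, one table lookup is enough to locate every byte of the file.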

Advantages

+ simple to program
+ the data is sequential and the file allocation table fits into RAM, so reads and seeks are very fast

Disadvantages

- growing files is painful if many files are close together on disk
- deleting, then adding, files creates many holes of free space (if the deleted file was larger than the added file taking its place)
    - because those holes are often too small to hold a new file, the space is wasted; this is known as external fragmentation
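A small sketch of why the holes are a problem, assuming a hypothetical list of hole sizes measured in sectors: a contiguous file needs one hole that is large enough on its own, no matter how much free space exists in total.

```python
def can_allocate(free_holes, sectors_needed):
    """A contiguous file fits only if a single hole is big enough."""
    return any(hole >= sectors_needed for hole in free_holes)

# Hypothetical hole sizes, in sectors, left behind by deletions.
free_holes = [3, 2, 4, 1]
total_free = sum(free_holes)
```

Here `total_free` is 10 sectors, yet `can_allocate(free_holes, 5)` is false because the largest single hole is only 4 sectors.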

FAT

The problem of external fragmentation is too serious for large systems, restricting the use of this file system to simple, small ones. Instead of this system, Bill Gates introduced a new one called FAT (File Allocation Table).

[0][1][...][Free Space][file1][Free Space][file2][Free Space]
[Header][File Allocation Table][Data, divided into many BLOCKS and denoted by BLOCK NUMBER]

Note: This decouples file name from file allocation.

A block is simply a fixed number of 512-byte sectors, usually 16 sectors (8192 bytes).

What is a directory? A directory is simply a file that contains a table of its files' names, sizes in bytes, and starting block numbers.

Files are arranged via the FAT. The FAT is simply an array, where the [block number]-th entry is one of: the next block number if the file continues, 0 for end of file, or -1 to denote that the [block number]-th block is free space.
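Following a file through the FAT can be sketched as below. The table contents and function names are hypothetical; the only convention taken from the notes is that an entry of 0 marks end of file. Only the entries belonging to one example file are shown.

```python
EOF = 0  # the notes use 0 in a FAT entry to mark end of file

def file_blocks(fat, start_block):
    """Return a file's block numbers, in order, by following the chain."""
    blocks = []
    block = start_block
    while block != EOF:
        blocks.append(block)
        block = fat[block]
    return blocks

def block_for_offset(fat, start_block, offset, block_size=8192):
    """Find the block holding byte `offset` by walking the chain from the start."""
    block = start_block
    for _ in range(offset // block_size):
        block = fat[block]
        if block == EOF:
            raise ValueError("offset is past end of file")
    return block

# Hypothetical FAT entries for one file whose directory entry says it starts
# at block 3: the chain is 3 -> 7 -> 5 -> end of file.
fat = {3: 7, 7: 5, 5: EOF}
```

Note that `block_for_offset` has to take one step per block before the target, which is why seeking within a large file costs time proportional to the offset.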

Advantages

+ No external fragmentation, as files are represented as a chain of blocks that can be anywhere, which can fill any holes.

Disadvantages

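- The implementation leads to internal fragmentation, though: space is allocated in whole blocks, so the unused tail of a file's last block is wasted.
- Room must be made for the FAT itself, one entry per block: 1/512 of the disk exactly.
- Because a file's blocks can be scattered across the disk, sequential access of files is slow, and reaching a given offset means following the chain from the file's first block.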

UNIX (traditional version)

[Boot Sector][Superblock][Bitmap][Inode Table][Data]

Superblock: Contains the version number, sizes of other regions (superblock, bitmap, etc.)
Bitmap: One bit is assigned for each block of the disk, and is marked when the block is used for data.
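A free-block bitmap of this kind might look like the sketch below. The class and method names are hypothetical, and the first-fit search in `allocate` is just one simple policy, not something the notes specify.

```python
class BlockBitmap:
    """One bit per disk block: 1 means the block is in use, 0 means free."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)  # 8 blocks per byte

    def is_used(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def mark_used(self, block):
        self.bits[block // 8] |= 1 << (block % 8)

    def mark_free(self, block):
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF

    def allocate(self):
        """Return the lowest-numbered free block and mark it used."""
        for block in range(self.num_blocks):
            if not self.is_used(block):
                self.mark_used(block)
                return block
        raise OSError("no free blocks")
```

The appeal of the bitmap is its density: tracking a block costs a single bit.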

Inodes

An inode (index node) is a fixed-size file descriptor. When a file is in use, its inode is copied into main memory. It records where the file's data is, along with metadata (ownership, permissions, dates, etc.), but it DOESN'T record the file's name or the directory containing the file.

Implementation of the metadata is easy, as all the data can be fixed-size. Implementation of data pointing is hard. The inode is of fixed size (~80 bytes), so it has room for only a few block pointers, and a large file would run out of places to store them. So, after the first few data blocks are pointed to directly, the inode points to an indirect block, which contains an array of pointers to data blocks. If that is not enough, the inode points to a doubly-indirect block, which contains an array of pointers to indirect blocks.

Metadata
First 10 blocks of data
Indirect Block Pointer --> Indirect Block --> Data Blocks 10-2057
Doubly-Indirect Block Pointer --> Doubly-Indirect Blocks --> Indirect Blocks --> Data
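The layout above implies a simple mapping from a file's logical block index to the pointer that reaches it. The sketch below assumes 8192-byte blocks and 4-byte pointers (so 2048 pointers per block), which is consistent with the "Data Blocks 10-2057" range; the function name and return format are hypothetical.

```python
BLOCK_SIZE = 8192
POINTER_SIZE = 4
NUM_DIRECT = 10
PTRS_PER_BLOCK = BLOCK_SIZE // POINTER_SIZE  # pointers that fit in one block

def locate(block_index):
    """Say which pointer path leads to logical block `block_index` of a file."""
    if block_index < NUM_DIRECT:
        return ("direct", block_index)
    block_index -= NUM_DIRECT
    if block_index < PTRS_PER_BLOCK:
        return ("indirect", block_index)
    block_index -= PTRS_PER_BLOCK
    if block_index < PTRS_PER_BLOCK ** 2:
        # (slot in the doubly-indirect block, slot in the chosen indirect block)
        return ("double", block_index // PTRS_PER_BLOCK, block_index % PTRS_PER_BLOCK)
    raise ValueError("file too large for this inode layout")

# Largest file, in blocks, that this layout can address.
MAX_BLOCKS = NUM_DIRECT + PTRS_PER_BLOCK + PTRS_PER_BLOCK ** 2
```

Small files pay no indirection cost at all, while large files need at most two extra block reads to find any data block.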

- Worst-case fragmentation: a 1-byte file still reserves an entire block, so tons of space is wasted. Utilization = (1 + 4)/(8192 + 12 * 4) = .0006
If files are really small, their data can be stored directly in the inode. Doing so marks one bit in the metadata.
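The worst-case ratio can be checked directly. This sketch reads the formula as useful bytes (1 byte of data plus the one 4-byte pointer in use) over reserved bytes (one 8192-byte block plus twelve 4-byte pointers); that reading of the terms is an assumption.

```python
BLOCK_SIZE = 8192
POINTER_SIZE = 4
NUM_POINTERS = 12  # 10 direct + 1 indirect + 1 doubly-indirect

useful = 1 + POINTER_SIZE                             # 5 bytes
reserved = BLOCK_SIZE + NUM_POINTERS * POINTER_SIZE   # 8240 bytes
utilization = useful / reserved
```

The result is 5/8240, about 0.0006, so well under a tenth of a percent of the reserved space holds anything useful.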

Directories (UNIX v.6, 1975)

- In ancient days, directories were simply files containing fixed-size 16-byte entries: a name 14 bytes long and a 2-byte inode number.

- In Linux ext3 v.2, directories are more dynamic:

[Inode #][Directory Length][Name Length][File Type][Name of File][Junk]

File Type: the file is of type directory, regular file, etc.
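Both directory formats can be sketched with Python's struct module. Several details here are assumptions rather than facts from the notes: the field order of the 16-byte entry, the field widths of the variable-length entry (4-byte inode number, 2-byte entry length, 1-byte name length, 1-byte file type), the rounding of each entry to a multiple of 4 bytes, and the file type value used in the example.

```python
import struct

# Fixed-size entry: 2-byte inode number and 14-byte name (order assumed).
V6_ENTRY = struct.Struct("<H14s")

def pack_v6(inode, name):
    raw = name.encode("ascii")
    if len(raw) > 14:
        raise ValueError("name longer than 14 bytes")
    return V6_ENTRY.pack(inode, raw)  # short names are padded with NUL bytes

def unpack_v6(entry):
    inode, raw = V6_ENTRY.unpack(entry)
    return inode, raw.rstrip(b"\0").decode("ascii")

# Variable-length entry header: inode #, entry length, name length, file type.
HEADER = struct.Struct("<IHBB")

def pack_entry(inode, name, file_type):
    raw = name.encode("utf-8")
    entry_len = (HEADER.size + len(raw) + 3) & ~3  # round up; the padding is the "junk"
    padded_name = raw.ljust(entry_len - HEADER.size, b"\0")
    return HEADER.pack(inode, entry_len, len(raw), file_type) + padded_name

def unpack_entries(data):
    """Walk a directory's bytes, using each entry's length to find the next."""
    entries, pos = [], 0
    while pos < len(data):
        inode, entry_len, name_len, file_type = HEADER.unpack_from(data, pos)
        start = pos + HEADER.size
        name = data[start:start + name_len].decode("utf-8")
        entries.append((inode, name, file_type))
        pos += entry_len
    return entries
```

The fixed format caps names at 14 bytes, while the variable format trades a slightly more complex scan for long names and a type field that avoids reading the inode just to learn what kind of file it is.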

Reading/Writing

- When calling open("a", ... ) or open("a", ... , O_WRONLY | O_CREAT) the following process occurs:

- For open("/c/b/a", ... ), we recursively search for c, then b, then a, where c and b must be directories.
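The component-by-component search for "/c/b/a" can be sketched with a toy file system. Everything here is hypothetical: the `inodes` dictionary stands in for the disk, directories are modeled as dictionaries from name to inode number, and inode 1 is arbitrarily chosen as the root.

```python
ROOT = 1  # hypothetical inode number of the root directory

# Toy file system: inode number -> directory (dict) or file contents (bytes).
inodes = {
    1: {"c": 2},
    2: {"b": 3},
    3: {"a": 4},
    4: b"hello",
}

def lookup(path, cwd=ROOT):
    """Resolve a path to an inode number, one component at a time."""
    current = ROOT if path.startswith("/") else cwd
    for component in path.split("/"):
        if not component:
            continue  # skip the empty strings produced by leading or doubled slashes
        directory = inodes[current]
        if not isinstance(directory, dict):
            raise NotADirectoryError(component)
        if component not in directory:
            raise FileNotFoundError(component)
        current = directory[component]
    return current
```

With this data, `lookup("/c/b/a")` visits inodes 1, 2, and 3 as directories before returning 4, and a relative path such as `lookup("a", cwd=3)` starts from the given working directory instead of the root.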

Links

Dirent

[Name][Inode]
["a"][127]
["b"][127]

- A hard link is basically a name corresponding to the same inode as the item being linked to. In this case, the file "b" is a hard link to the file "a".

- However, we can get into trouble with hard links. We must not allow cycles; otherwise a directory traversal can loop forever, for example if "a/b/c/d" is a hard link to "a/b/c". For this reason we disallow hard links to directories.

- We can also get into trouble when deleting files which are hard linked to. For this reason, we keep a link count for files, only allowing space used for a file to be freed when the link count is zero. Otherwise we could have a hard link to unallocated space, which is a bad thing.

- A soft link (symlink) is another type of link, which does not have the same restrictions as hard links. A soft link is implemented by giving it a file type specific to soft links in the dirent, and having its data hold the path of the file being linked to. When reading that soft link file, the system will notice that it is a soft link type, and substitute the path contained in the soft link file.

- Soft links have the following properties:
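To make the link mechanics in this section concrete, here is a toy model of hard links, link counts, and soft links. It is a sketch only: the ToyFS class and its method names are hypothetical, there is a single flat directory, and the soft-link flag is kept in the inode for brevity even though the notes place the file type in the dirent.

```python
class ToyFS:
    def __init__(self):
        self.dirents = {}      # name -> inode number
        self.inodes = {}       # inode number -> {"links", "data", "symlink"}
        self.next_inode = 127  # arbitrary starting inode number

    def create(self, name, data, symlink=False):
        ino = self.next_inode
        self.next_inode += 1
        self.inodes[ino] = {"links": 1, "data": data, "symlink": symlink}
        self.dirents[name] = ino
        return ino

    def link(self, existing, new):
        """Hard link: a second name for the same inode."""
        ino = self.dirents[existing]
        self.inodes[ino]["links"] += 1
        self.dirents[new] = ino

    def symlink(self, target, new):
        """Soft link: a new inode whose data is the target's path."""
        self.create(new, target.encode(), symlink=True)

    def unlink(self, name):
        ino = self.dirents.pop(name)
        self.inodes[ino]["links"] -= 1
        if self.inodes[ino]["links"] == 0:
            del self.inodes[ino]  # storage is freed only when no names remain

    def read(self, name, max_hops=8):
        for _ in range(max_hops):
            inode = self.inodes[self.dirents[name]]
            if not inode["symlink"]:
                return inode["data"]
            name = inode["data"].decode()  # substitute the stored path
        raise OSError("too many levels of symbolic links")
```

In this model, after creating "a", hard-linking "b" to it, soft-linking "s" to "a", and then unlinking "a", reading "b" still works because the link count is 1, while reading "s" fails with a KeyError because the path it stores no longer names anything.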

Multiple Filesystems and Links

- Multiple filesystems are implemented in Linux using a VFS - Virtual File System interface. By using object oriented programming techniques, Linux's VFS layer allows different filesystem drivers which implement filesystem-specific functions such as reading/writing to be plugged in. It is possible to use multiple filesystems by mounting them in different places. (/, /home, /var, /opt for example)

- Hard links are not allowed to cross file system boundaries. Why? Hard links point to inode numbers, and inode numbers are meaningful only within the filesystem they belong to, so a hard link cannot refer to an inode on another filesystem. Soft links, by contrast, can cross file system boundaries: they store a path, and a path can resolve to a location on any mounted filesystem.

- Functionality impacted by multiple filesystems: the rename system call. This is a HARD problem to solve.
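One concrete way multiple filesystems affect rename, offered as an illustration rather than as the specific difficulty the notes have in mind: on POSIX systems rename() fails with the error EXDEV when the source and destination are on different filesystems, so a caller has to fall back to copying. The helper names `move` and `same_filesystem` are hypothetical, and the fallback shown handles regular files only.

```python
import errno
import os
import shutil

def same_filesystem(path_a, path_b):
    """Two paths are on the same filesystem if their device numbers match."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev

def move(src, dst):
    """Rename if possible; otherwise copy the file and remove the original."""
    try:
        os.rename(src, dst)  # within one filesystem this only edits directory entries
    except OSError as err:
        if err.errno != errno.EXDEV:
            raise
        shutil.copy2(src, dst)  # cross-filesystem: the data must actually be copied
        os.unlink(src)
```

The fallback is not atomic: a crash between the copy and the unlink leaves both files in place, which is one reason a cross-filesystem rename is harder than it looks.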