File System Design
By: Rong Fan, Xinya Guo, Shwetha Kotekar
There are three topics we want to cover in File Systems:
Performance - we want IO to be fast
Design/structure
Robustness – we want it to work; don’t want to lose data when disk drive fails
The two constraints are performance and robustness.
BUT the faster you make your file system, the less robust it gets. But before we get into that...
WHAT IS A FILE SYSTEM?
A file system is the data structure designed to support the abstraction of the data blocks as an archive and collection of files. In other words, a file system organizes the data blocks into files, directories, and file information. This data structure is unique because it is stored on secondary storage (usually the disk), which is a very slow device.
User
-------------------------Abstraction------------------------
Data Blocks
An Embarrassingly Simple File System:
RT-11 File System
The following simple file system is very similar to the RT-11 files system (Real Time) (early 1970). In this file system, files are contiguous chunks of data. New files are allocated in any free space that will hold it. The deletion of a file frees up that disk space to be used later.
Directory |
File 1 |
File 2 |
File 3 |
Unsued Space |
sequence of sectors: 0 → n
The directory is a fixed size of 10 sectors.
The directory entries are at the beginning of the file. They contain information about the files in the directory. These entries have three fields:
a null terminated string for the file name
the starting block number
number of blocks allocated.
Directory Entry:
File name \0 |
Start of file (starting block number) |
Length in bytes (number of blocks allocated) |
Advantages with RT-11:
It is very simple
There is no indirection, which means we do not need to store the file pointers
Contiguous access is fast
Allocation and freeing files is fast
Disadvantages with RT-11::
External fragmentation can occur; there could be a lot of free space, but it is scattered throughout the disk in smaller chunks. A file allocation may fail because we cannot find a single block of free space large enough to fit it, even though there is enough free net space on disk.
A fixed size directory may lead to unused space if there are few directory entries or not enough space if there are too many directory entries.
The size of the file must be known in advance
There can be no nested directories
If it is a big directory, where is the free space? How do we access it?
And I/O failure = mean very bad news.
FAT (File Allocation Table) File System
(late 1970)
boot sector |
superblock
|
FAT (File Allocation TAble) |
DATA 1 |
DATA 3 |
DATA 2 |
(Each sector is 512 Bytes)
The boot sector contains the bootstrapping programs to start the system upon a boot, as we have discussed in prior lectures. In a system with multiple disks, each disk has the boot sector.
The superblock contains all the metadata about the file system, such as the version number, the size of the entire system, the root directory, the pointer to the first data block, and more.
The file allocation table (FAT) contains the FAT structure, which is a map of the data region. It is described in more detail below.
The rest of the disk are the data blocks to store files.
SO: 232 x 212 = 244
FAT vs. the Simple File structure
Directories are Files. Unlike the simple file structure from before, FAT has no directory block. Instead, directories are stored like files as an array of directory entries.
= 16miB/file limit
FAT16: % disk in FAT = 2/4096 = 0.5%
For example, we want to find “FOO.pdf”. For this, we need to know where the first block. The FAT directory entry will give you the first block. The directory entry is found using the root in the superblock.
FAT Table
The FAT table is an array of entries where each entry represents the corresponding data block. Each entry either contains:
the number of the next block that is part of the file
0 which means the current block is the end of the file, or
-1, which means the block is free.
For example, suppose a file starts at data block 3. The FAT table entry 3 corresponds to data block 3. In this entry is the number 10, which means if we wanted to continue reading the file, we must go to data block 10. If the FAT table, entry 10 has a 0, data block 10 is the last data block in the file.
Issues with FAT:
If data block 3 pointed to data block 10, but the FAT entry for 10 was -1, this means the file points to a free block, which is an error. The file system is inconsistent.
Two directory entries cannot point to the same block as the first block or the file system is inconsistent. The method does now allow for links.
Originally the array was 16 bits. Now dues to increased memory demands, it uses 32 bits.
Advantages with FAT:
No more external fragmentation
File size is no longer fixed. We do not need to know the size of the file at the beginning
Disadvantages with FAT:
Internal fragmentation
With external fragmentation: free space you know about but cant use it
Slower because files are not continuous
Cannot put a database on top of FAT because lseek is very slow. lseek should be as fast as a real hardware seek. This property is true for RT11. To seek, add address of the beginning to the start of file and go to that location. But it is NOT true in FAT: we need to go through a linked list on disk. How to fix this?
FIX: cache the FAT into RAM.
BSD: A UNIX FILE SYSTEM
The BSD (Berkeley Software Distribution) is a UNIX operating system. The UNIX file system is similar to the FAT file system but replaces the FAT with two of its own unique elements. The disk is portioned into the following components:
Boot Sector: like in FAT, contains information for booting and loading the OS.
Super Block: like in FAT, holds metadata about the file system. This includes data like: version number, size, amount of free space, and location of rootdir.
Bitmap Block: Used to locate free and used blocks. Free blocks = 0 and used blocks = 1. This method handles blocks more efficiently because only 1 bit is needed to indicate available blocks. FAT, on the other hand, allocates data corresponding to an array and fragments over time
Inode Table: table to store all the inodes
Data blocks: Contains the actual data
boot sector |
superblock |
inode table |
DATA (8kiB blocks)
|
Inodes
Inode, or "Index node," is a block of fixed size that holds metadata about directory files or regular files. Each inode has the following information:
Type of the file (regular file or directory)
Read/write permissions
Dates/timestamp of last read, write, or change
Number of hard links to file
Array of pointers to blocks to store information about where the blocks are. Each pointer points to a block
Indirect blocks
Double-indirect block
Advantages with UNIX file system:
lseek is no longer ridiculous
Easy to grow a file - Change a value in the block bitmap and add a pointer to the inode
Safe, cheap file renaming and directory copying (still requires more seeks than RT-11)
No external fragmentation
files can have holes!
Api: open a file, create it
lseek ( ..., 10,000,0000,000)
write("abc", 3)
Disadvantages with UNIX file system:
internal fragmentation still possible
FAT fragmentation also still possible