by Christina Wang
Contents
Very simple file system example
Another file system example : FAT (File Allocation Table)
1. Performance
2. Design/Structure
3. Robustness
But performance and robustness are the 2 main, often-conflicting constraints
A file system can be thought of as an archive of a collection of files, or as a directory, built upon a data structure designed to support this abstraction.
note: following file system graphics are not to scale
A simple, RT-11-like file system (early 70s) could look like
directory |
free space |
File 1 |
File 2 |
free space |
File 4 |
free space |
The directory is of fixed size. One directory entry might look like
file name |
\0 |
\0 |
start |
length |
+ simple
+ no indirection
+ file allocation / deleting files operations are fast
+ fast contiguous access
- fixed size directory
- in big directories, it’s hard to locate free space
- no nested directories
- stores only one copy of things, which is bad in the case of I/O failure
- external fragmentation
- we need to know the file size upon creation
(late 70s)
Boot sector |
Superblock |
FAT |
|
1st |
2nd |
|
3rd |
The superblock stores things like FAT version, size, root directory, and first block start information. The FAT is an array of block numbers. It was originally 16 bit. FAT32 is 32 bit. In FAT file systems, directories are files. For example,
Name |
Size |
1st block number |
FAT entries store the location of the next block. 0 indicates the end of a file, while -1 indicates a free block.
+ no more external fragmentation
- no continuous file access
- It’s possible for the representation of files in your system to be inconsistent (for example, following the next block could lead to a block that reads -1. The block here is marked as free when it’s really in use.
- internal fragmentation
For example, all our blocks could look like this
1 bit in use |
4095 bits wasted |
A: When we’re using applications that often jump into the middle of files, like Databases. Instead, we’d like lseek to be as fast as a real hardware seek. RT-11 supports this, but we’d like our FAT advantages.
One possible fix: We could cache the FAT into RAM, and convert the data as we go into a tree (for example) data structure, but what happens when our disk space is something like 32TB? We could use the…
There are several types. V7 is ~1978. BSD is ~1980.
A BSD like file system might look like
Boot sector |
Superblock |
Block bitmap |
Inode table |
Data (in 8 kiB blocks) |
The block bitmap maps 1 bit per block of data. There is 1 inode per file.
An inode entry stores file information, like so
Size |
||||||
Permissions |
||||||
Timestamps |
||||||
Link count |
||||||
1st data block ptr |
-> |
80 kiB of data |
||||
2nd data block ptr |
-> |
|||||
3rd data block ptr |
-> |
|||||
4th data block ptr |
-> |
|||||
5th data block ptr |
-> |
|||||
6th data block ptr |
-> |
|||||
7th data block ptr |
-> |
|||||
8th data block ptr |
-> |
|||||
9th data block ptr |
-> |
|||||
10th data block ptr |
-> |
|||||
First indirect block |
-> |
(8 kiB/4 bytes) of ptrs |
-> |
16 MiB of data |
||
Double indirect block |
-> |
(8 kiB/4 bytes) of ptrs |
-> |
(8 kiB/4 bytes) of ptrs |
-> |
32 GiB of data |
+ lseek costs are no longer ridiculous
+ files can have holes
Nowadays you can use new system calls
Call fallocate(fd, 100)to reserve the space. Here, the size of the file grows as needed.
Call posix_fallocate(fd, 100) to reserve space. Here, the size of the file does not grow.
Name (14 bytes) |
Inode # (2+) |
We have varying length directory entries here.
when you can have 2 different names for the same file.
The following is okay in UNIX
A |
296 |
C |
296 |
… |
… |
When a hard link is added, the link count, which counts the number of directory entries that point to the file, and is stored in the inode entry increases by one.
1. Possible race condition to create hard links, etc
2. Link count overflow.
3. Open file deletion
What if a file is deleted, but is still open elsewhere?
A: To reclaim storage on disk, the link count must be 0. However, the OS in RAM counts the number of file descriptors open on the file, and that number must also be 0.
What if a power failure or reboot occurs while the file is still open?
A: Orphan storage. To fix this, run a file system checker (fsck) upon reboot. fsck walks through, looking for orphan storage and repairs the file system structure.