CS 111 Winter 2012 Lecture 11 Scribe Note: File System Design
Prepared by Tianyuan Qin, Mengyi Zhu, and Xin Nie.
Table of Contents:
Problems in File System Design
Assumes locality of reference which is not necessarily true, may result in waste of memory
Assumes cache coherence which sometimes does not hold
e.g.: We can have to processes p1 and p2, each has its own cache. Supposed that a file foo is cached out on both processes, if p1 writes to foo, p2’s cache may not be updated, and p2 will not have a correct reading on foo.
Solution: have only one cache, and use locking to avoid race condition.
When the process thinks it writes to disk, it actually writes to the cache
Some ways to get around the cache issue:
Turn off dallying, write to disk immediately, very slow.
sync(): flushes entire cache to disk. But this operation is too global, most processes won't need this, hence slows everything down.
fsync(fd): flush only the blocks of the file with the given file descriptor to disk.
fdatasync(fd): only flushes file's data, not meta-data.
Now suppose we are asked to build a 120 Pera-byte (120,000 TB) file system
Say available on market are 600GB disks that are cheap and fast => we need 200,000 of them for our file system.
The problem and issues that we have to deal with:
Various disks must be consistent with each other
Multiple CPUs each have to handle numerous disks
Need a high-throughput, low latency interconnect
As much parallelism as possible
Robustness, e.g. writes, and implementation of functions such as fdatasync()
Cache coherence
Directory |
file1 |
file2 |
(unused) |
file3 |
file 4 |
|
file 5 |
Directory contains an array of directory entries, each with size of, say, 256 bytes.
Directory entry:
In use |
Time stamp |
Starting sector |
Length |
Name |
One way to speed up the file system is caching the entire Directory block in RAM
The widespread file/operating system RT-11 (RT stands for real-time) in the 1970s uses a similar concept.
Fragmentation:
Internal (trivial) fragmentation: some storage wasted due to the need of sector alignments.
External (big deal): unable to allocation a contiguous chunk of space for a big file if the file system contains small files all over the place.
Preallocation required:
Shrinking a file is easy, but expanding a file can cause trouble
Size of Directory block is fixed (set at file system creation time)
FAT (File Allocation Table) File System (1970s) structure:
Boot Sector |
Super Block |
File allocation table |
Data Blocks |
Boot sector: contains bootstrapping programs which the MBR transfers the CPU to.
Superblock: contains meta-data about the file systems, for example, the size of the file system, which version of FAT and etc.
File allocation table: contains an array of entries which contains:
0 means the current block is the last block of the file
-1 means the block is free
The next block number of the file
Data blocks:
The data of a file may not be contiguous in a FAT file system
Fundamental block size 4 KiB blocks, each sector is generally 512 Bytes
File name and extension (8+3 bytes) |
Meta-data about files (several bytes) |
The location of first block (2 bytes) |
Permission (several bytes) |
In FAT, a directory is a file where contents are an array of directory entries
The Directories are found by using the root which is allocated in the superblock
(+) No more external fragmentation.
(+) We do not need to know the file size at the beginning
(-) Blocks may not be allocated continuously
(-) Problem of inconsistency, for example, a FAT entry may point to a block has value -1 while that block is actually in use.
(-) lseek is O(N) because in order to determine the start point, we will have to go through the FAT as a linked list while in RT 11,we can just directly access the file block by calculating the address.
How can we fix this: cache the FAT in RAM
(-) Rename of a file is not allowed in FAT.
Inode idea (1973): devote an on-disk data structure to each file
Advantages of using inodes:
fix the problem of lseek(); it is now O(1) operation
the use of block bitmap encourages contiguous allocation, which would reduce fragmentations
Each file or directory correspond to a single inode number
Boot sector |
Super block |
Block bitmap |
Inode tables |
Data blocks |
Boot sector Super block Block bitmap Inode tables Data blocks
Boot sector, superblock are the same as FAT
Block bitmap : it maps 1 bit per block of data, indicating whether the corresponding block is free or not
Inode: Unique for each file, containing meta information and block pointers to actually data file.
An inode structure:
Size |
|
|
Permission |
||
Timestamps |
||
Link # |
||
10 directed data block pointers |
>> |
80kiB of data |
First indirect block |
>> |
(8 kiB/4) pointers =16MiB data |
Double indirect block |
>> |
(8kib/4) pointers^2 = 32 GiB |