Lecture 11: File System Design - 5/7/14
UCLA CS 111, Spring 2014 - Professor Eggert
Notes by Kailin Chang and Roger Chen
Table Of Contents
-
-
Strategies For Improving File Systems
Batching
To read data by sectors instead of by bytes.
- Big blocks: good throughput
- Little blocks: good latency
Pre-Fetching (for reads)
To guess what the application will read next, even if the user has not accessed the block on the disk.
In general, does more reading than what the current application asks for.
This process occurs during runtime.
There are 2 main types of guesses:
- Sequential Guess: if the application accesses i th block, it will probably access (i+1)th block
- Random Access Guess: turns off prefetching and clears cache more aggresively
Speculation
Relies on assumptions but improves performance in the long run.
- Spatial Locality: reading i th block means reading (i+1)th or (i-1)th block
- Temporal Locality: look at accesses over time, read i th block at time t means you'll read (i+1)th block at time t + ∂ (small delta)
Dallying (for writes)
Application says write(fd, buf, 1024) and OS returns 1024. Makes if possible for OS to batch or eliminate work entirely. However could cause problem: cache coherence. Cache coherence is demonstrated in the below diagram:
------- -------------------
| CPU | | Disk controller |
------- -------------------
| Cache D4
===================================================
|
------- -------- ---------
| RAM | D3 | DATA | D1 | CACHE | D2
------- -------- ---------
Notice that if D1 ≠ D2, power failure occurs.
System Calls
- sync(): schedule all cached blocks for writing to disk (i.e. "stop dallying")
- fsync(fd): syncs only part of RAM cache that is associated with fd (i.e. ensures fd's blocks are on disk); syncs both data and metadat,so cost is doubled (slow)
- fdatasync(fd): data only, used more often
On a Unix file system:
- Data: keeps track of actual bytes that you need to write
- Metadata: last-modified time
File System Design
What is a file system?
A hybrid data structure:
- One part lives on disk: principal part, important for correctness
- The other part lives in main memory: needed for performance
A Very Simple File System
Specs: 1MB disk, 16 KB RAM
A general diagram of the overall file system:
-----------------------------------------------------------------------------------------
| Array of file names and locations | File data | .......... | File data 2 | .......... |
-----------------------------------------------------------------------------------------
** File data location is specified by the Offset and Size in the array of file names and locations (see below)
A closer look at the array of file names and locations:
- Filename: null terminating
- Block offset: used along with "block size" to look up corresponding file data location
- Block size: used along with "block offset" to look up corresponding file data location
- Note: if a directory entry starts with '\0', it is not used
----------------------------------
|a|b|c| ....junk | Offset | Size |
----------------------------------
Pros:
- Simple
- Data in a file is always contiguous, so sequential I/O is fast
Cons:
- External Fragmentation: Enough free space to hold new file, but not in one block (can be fixed through defragmenters)
- Fixed limit on number of files after file system is created
- Directory must fit into RAM
- Not hierachical
- Creator of file must give an upperbound on file size
** Good for realtime file system design because it is sequential and predictable
FAT File System
FAT: File Allocation Table
A general diagram of the overall file system:
----------------------------------------------------------
| Boot Sector | Superblock | FAT (File Allocation Table) |
----------------------------------------------------------
A closer look at each sector:
- First sector: boot sector
- Second sector: superblock
- Contains meta-information about file system
ie. What version of filesystem? What % of filesystem is in use?
- Stores block # of root directory
- Third sector: FAT (File Allocation Table)
- Tells where all the blocks are for your files
- Array of 16 bit numbers, indexed by block #'s
- Basically a linked list representation of file
- Entries:
- 0: EOF
- n > 0: Next block is block n
- n == -1: unused
- Then 4 KiB blocks (212 bytes per block)
- 216 blocks = 228 bytes total size
The Directory:
- Like a data file, but contents are specially treated by OS
- Its contents consist of a series of directory entries:
- Name (11 bytes)
- First block # (2 bytes)
- Size in bytes (4 bytes)
- File type (1 bit): assume only two types - directories, regular files
Pros:
- Grow a file as needed, no need to pre-allocate
- No external fragmentation
Cons:
- Internal fragmentation
- Seeks are very slow (linked-list following)
- Sequential access can be slow, if on a mature system (defragmenters highly recommended)
UNIX File System
A general diagram of the overall file system:
--------------------------------------------------------------
| Boot Block | Superblock | Inode Table | Data Blocks | .... |
--------------------------------------------------------------
UNIX shares some similarities with the FAT file system, but has a new sector called the Inode Table.
A closer look at the Inode Table:
- 1 entry per file
- 1 inode entry:
- Link count: keeps track of the copies to links
- Owner
- Group
- Permissions
- Timestamp
- Size
- 10 blocks that store where the data blocks are located
- If the file takes up more than 10 data blocks of space, another entry that leads to an array of block #'s (indirect block)
- Indirect block can hold 2048 entries (block #'s = 32 bits)
- If this is still not enough, another entry points to an array of blocks that each hold an array of blocks which hold data blocks
(doubly indirect block): 211 × 211 × 213 = 235 bytes
- If an entry has block 0: it is unused, you can have 0's intermixed with filled entries
Unix Directory:
- Name (14 bytes)
- Inode # (2 bytes): you can have entries match to same inode # (hard link)
Pros:
- Scales faster
- lseek is faster
Cons:
- Fixed limit # of files
- More internal fragmentation