Lecture 11: File System Design - 5/7/14

UCLA CS 111, Spring 2014 - Professor Eggert

Notes by Kailin Chang and Roger Chen

Strategies For Improving File Systems
File System Design

Strategies For Improving File Systems

Batching

To read data by sectors instead of by bytes.

Big blocks: good throughput
Little blocks: good latency

Pre-Fetching (for reads)

To guess what the application will read next, even if the user has not accessed the block on the disk. In general, does more reading than what the current application asks for. This process occurs during runtime. There are 2 main types of guesses:

Sequential Guess: if the application accesses i^th block, it will probably access (i+1)^th block
Random Access Guess: turns off prefetching and clears cache more aggresively

Speculation

Relies on assumptions but improves performance in the long run.

Spatial Locality: reading i^th block means reading (i+1)^th or (i-1)^th block
Temporal Locality: look at accesses over time, read i^th block at time t means you'll read (i+1)^th block at time t + ∂ (small delta)

Dallying (for writes)

Application says write(fd, buf, 1024) and OS returns 1024. Makes if possible for OS to batch or eliminate work entirely. However could cause problem: cache coherence. Cache coherence is demonstrated in the below diagram:

     -------                  -------------------
     | CPU |                  | Disk controller |      
     -------                  -------------------	
        |  Cache D4     													
 ===================================================  								
             |                      
          -------         --------      ---------
          | RAM | D3      | DATA | D1	| CACHE | D2
          -------         --------     	---------

Notice that if D₁ ≠ D₂, power failure occurs.

System Calls

sync(): schedule all cached blocks for writing to disk (i.e. "stop dallying")
fsync(fd): syncs only part of RAM cache that is associated with fd (i.e. ensures fd's blocks are on disk); syncs both data and metadat,so cost is doubled (slow)
fdatasync(fd): data only, used more often

On a Unix file system:

Data: keeps track of actual bytes that you need to write
Metadata: last-modified time

File System Design

What is a file system?

A hybrid data structure:

One part lives on disk: principal part, important for correctness
The other part lives in main memory: needed for performance

A Very Simple File System

Specs: 1MB disk, 16 KB RAM

A general diagram of the overall file system:

 -----------------------------------------------------------------------------------------
 | Array of file names and locations | File data | .......... | File data 2 | .......... | 
 -----------------------------------------------------------------------------------------
 ** File data location is specified by the Offset and Size in the array of file names and locations (see below)

A closer look at the array of file names and locations:

Filename: null terminating
Block offset: used along with "block size" to look up corresponding file data location
Block size: used along with "block offset" to look up corresponding file data location
Note: if a directory entry starts with '\0', it is not used

 ----------------------------------
 |a|b|c| ....junk | Offset | Size | 
 ----------------------------------

Pros:

Simple
Data in a file is always contiguous, so sequential I/O is fast

Cons:

External Fragmentation: Enough free space to hold new file, but not in one block (can be fixed through defragmenters)
Fixed limit on number of files after file system is created
Directory must fit into RAM
Not hierachical
Creator of file must give an upperbound on file size

** Good for realtime file system design because it is sequential and predictable

FAT File System

FAT: File Allocation Table

A general diagram of the overall file system:

 ----------------------------------------------------------
 | Boot Sector | Superblock | FAT (File Allocation Table) | 
 ----------------------------------------------------------

A closer look at each sector:

First sector: boot sector
Second sector: superblock
- Contains meta-information about file system
  ie. What version of filesystem? What % of filesystem is in use?
- Stores block # of root directory
Third sector: FAT (File Allocation Table)
- Tells where all the blocks are for your files
- Array of 16 bit numbers, indexed by block #'s
- Basically a linked list representation of file
- Entries:
  - 0: EOF
  - n > 0: Next block is block n
  - n == -1: unused
Then 4 KiB blocks (2¹² bytes per block)
- 2¹⁶ blocks = 2²⁸ bytes total size

The Directory:

Like a data file, but contents are specially treated by OS
Its contents consist of a series of directory entries:
- Name (11 bytes)
- First block # (2 bytes)
- Size in bytes (4 bytes)
- File type (1 bit): assume only two types - directories, regular files

Pros:

Grow a file as needed, no need to pre-allocate
No external fragmentation

Cons:

Internal fragmentation
Seeks are very slow (linked-list following)
Sequential access can be slow, if on a mature system (defragmenters highly recommended)

UNIX File System

A general diagram of the overall file system:

 --------------------------------------------------------------
 | Boot Block | Superblock | Inode Table | Data Blocks | .... |
 --------------------------------------------------------------

UNIX shares some similarities with the FAT file system, but has a new sector called the Inode Table.

A closer look at the Inode Table:

1 entry per file
1 inode entry:
- Link count: keeps track of the copies to links
- Owner
- Group
- Permissions
- Timestamp
- Size
- 10 blocks that store where the data blocks are located
  - If the file takes up more than 10 data blocks of space, another entry that leads to an array of block #'s (indirect block)
  - Indirect block can hold 2048 entries (block #'s = 32 bits)
  - If this is still not enough, another entry points to an array of blocks that each hold an array of blocks which hold data blocks (doubly indirect block): 2¹¹ × 2¹¹ × 2¹³ = 2³⁵ bytes
- If an entry has block 0: it is unused, you can have 0's intermixed with filled entries

Unix Directory:

Name (14 bytes)
Inode # (2 bytes): you can have entries match to same inode # (hard link)

Pros:

Scales faster
lseek is faster

Cons:

Fixed limit # of files
More internal fragmentation

Lecture 11: File System Design - 5/7/14

UCLA CS 111, Spring 2014 - Professor Eggert

Notes by Kailin Chang and Roger Chen

Table Of Contents

Strategies For Improving File Systems

Batching

Pre-Fetching (for reads)

Speculation

Dallying (for writes)

System Calls

On a Unix file system:

File System Design

What is a file system?

A Very Simple File System

FAT File System

UNIX File System