Lecture 11: File system design

Winter 2015 Week 7 Scribe Notes - 02/18/2015

By: Bryan Hoyt, Kevin Wong

Review: Simple disk I/O

Simple performance model for disk I/O: model the disk as one long line of locations. The cost of a read is dominated by seek time, which we take to be the absolute value of the difference between the current location and the requested location, |a - b|.
Finding the Average Read Cost: (Assumes random reads)
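A quick sketch of that calculation, under the assumption that the disk is modeled as the unit interval [0, 1] and that both the current head position h and the requested location are uniformly random:

    average cost given head at h
        = (integral from 0 to h of (h - x) dx) + (integral from h to 1 of (x - h) dx)
        = h^2/2 + (1 - h)^2/2

    averaging over a random head position h:
        integral from 0 to 1 of [h^2/2 + (1 - h)^2/2] dh = 1/6 + 1/6 = 1/3

So a random read costs about one third of the maximum seek distance, on average.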


Example: set of read requests {20, 97, 512, 109321}, with the head currently at location 96. (A worked comparison of FCFS and SSTF on this request set appears at the end of the scheduling-algorithm discussion below.)
Algorithms for Seek Time

SSTF -- "Shortest Seek Time First": always serve the pending request closest to the current head position. Good throughput, but requests far from the head can starve.

FCFS -- "First Come First Serve": serve requests in arrival order. No starvation, but the head can zig-zag back and forth across the disk.

Hybrid Approach: combine the two, e.g. take pending requests in FCFS batches and run SSTF within each batch, keeping most of SSTF's throughput without starving anyone.

Elevator Algorithm: like SSTF, but the head keeps moving in its current direction, serving requests along the way, and reverses only when no requests remain ahead of it (like an elevator).

Circular Elevator: the head sweeps in one direction only, then seeks back to the start and sweeps again. Slightly slower on average, but fairer to requests near the ends of the disk.


Anticipatory Scheduling: after finishing a request, the scheduler may deliberately dally (wait a moment) instead of seeking away, betting that the process it just served will immediately issue another nearby request.


Multiprogramming: these scheduling policies only matter when many requests are pending at once, which is the normal case when several processes do I/O concurrently; with a single process issuing one request at a time, every policy degenerates to FCFS.
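To make the FCFS/SSTF comparison concrete on the example request set above (head at 96, requests {20, 97, 512, 109321}), here is a small C sketch that just totals the head movement under each policy. It is purely illustrative (the function names are made up; a real scheduler works on a live queue, not a fixed array):

    #include <stdio.h>
    #include <stdlib.h>

    /* Total head movement if requests are served in arrival order. */
    static long total_fcfs(long head, const long *req, int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += labs(req[i] - head);
            head = req[i];
        }
        return total;
    }

    /* Total head movement if the closest pending request is always served next.
     * Served requests are marked with -1. */
    static long total_sstf(long head, long *req, int n) {
        long total = 0;
        for (int served = 0; served < n; served++) {
            int best = -1;
            for (int i = 0; i < n; i++)
                if (req[i] >= 0 &&
                    (best < 0 || labs(req[i] - head) < labs(req[best] - head)))
                    best = i;
            total += labs(req[best] - head);
            head = req[best];
            req[best] = -1;
        }
        return total;
    }

    int main(void) {
        long a[] = {20, 97, 512, 109321};
        long b[] = {20, 97, 512, 109321};
        printf("FCFS total head movement: %ld\n", total_fcfs(96, a, 4));
        printf("SSTF total head movement: %ld\n", total_sstf(96, b, 4));
        return 0;
    }

Note that SSTF is greedy: it minimizes each individual seek, not necessarily the total movement, and it can starve far-away requests.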



File System Design

A file system is a hybrid data structure that lives partly on secondary storage (flash/disk) and partly in RAM. You want the capacity and permanence of secondary storage combined with the performance of RAM.
Professor Eggert's 1974 undergraduate research project was to duplicate UNIX on a Lockheed machine. To write a text editor, he needed to write a file system, so he created a very simple one (essentially the same design as the RT-11 file system).

The disk is treated as an array of 512-byte sectors. Every file begins at the start of a sector and all of its data is contiguous. If a file's size is not a multiple of 512, it gets an extra sector and the leftover bytes are wasted.

To keep track of files, the first few sectors (say 2048 bytes) hold a table that tells us where the files are: a directory. Each entry is an array of three things: file name, start sector, and file size in bytes. The start sector and file size are 16 bits each (4 bytes total), and the name is limited to 12 bytes, padded with null bytes to mark the end of the name.
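As a sketch, one directory entry in this design could be written as the C struct below. The field names are made up for illustration, but the sizes come straight from the description above (12-byte name, 16-bit start sector, 16-bit size), giving 16 bytes per entry, so a 2048-byte directory holds at most 128 files.

    #include <stdint.h>

    /* One 16-byte directory entry, per the description above.
     * Field names are illustrative, not taken from a real RT-11 header. */
    struct dir_entry {
        char     name[12];      /* file name, padded with '\0' bytes     */
        uint16_t start_sector;  /* first sector of the (contiguous) file */
        uint16_t size;          /* file size in bytes                    */
    };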


Advantages:

-Very simple to implement.

-Sequential reads and writes are fast, since each file's data is contiguous on disk.


Downsides:

-External fragmentation: free space gets broken into pieces too small to hold a new contiguous file.

-The directory has a fixed size, so there is a hard limit on the number of files.

-There are no user-created directories, just the one table.

-Files cannot grow in place (a problem for programs that keep appending output), since the sectors after a file may already be in use.


Other file systems of the era (the "competition") were trying to avoid exactly these problems: external fragmentation, the directory limit, the lack of user-created directories, and files that cannot grow as applications write output. This leads to:

FAT File System (late 1970’s) [File Allocation Table = FAT]



Source: http://www.cs.ucla.edu/classes/spring12/cs111/scribe/12a/

The disk is modeled as an array of blocks; each block is 4 KiB, or 8 sectors of 512 bytes.

Files are treated as linked lists of blocks. The block that follows on disk is not necessarily the next block of your file.

The "next" pointers live in the FAT (File Allocation Table): an array of 16-bit block numbers, where entry i holds the number of the block that follows block i in its file.

A file can now be identified by a single 16-bit number: the block number at the head of its linked list.

A file's blocks can end up scattered almost randomly across the disk, so sequential access is slow; but this removes the problem of external fragmentation.
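As a sketch of what "a file is a linked list whose next pointers live in the FAT" means, assuming the FAT has been loaded into an in-memory array and using a simplified end-of-chain marker (real FAT16 reserves several special values):

    #include <stdint.h>
    #include <stdio.h>

    #define FAT_EOF 0xFFFFu   /* simplified end-of-chain marker (assumption) */

    /* Print every block of one file by following the "next" pointers
     * stored in the FAT, starting from the file's first block number. */
    void print_chain(const uint16_t *fat, uint16_t first_block)
    {
        for (uint16_t b = first_block; b != FAT_EOF; b = fat[b])
            printf("block %u\n", b);
    }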


In FAT, a directory is just a file whose contents are directory entries.

File names have two parts: 8 bytes of name and 3 bytes of extension (e.g. foo.c is stored as "foo" padded to 8 bytes plus "c" padded to 3 bytes).

Each entry also holds two bytes giving the block number of the file's first block, and a byte for the type of the file (regular file vs. directory).

At the very beginning of the file system is a small area, the "superblock", which contains file-system-wide information: version, size, and the location of the root directory.
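Putting the pieces together, a directory entry holds roughly the fields just described. The struct below is a simplified sketch (real FAT uses 32-byte entries with more metadata, and the size field here is an assumption):

    #include <stdint.h>

    /* Simplified FAT directory entry, per the fields described above. */
    struct fat_dirent {
        char     name[8];      /* "foo" padded to 8 bytes                   */
        char     ext[3];       /* "c" padded to 3 bytes                     */
        uint8_t  type;         /* regular file vs. directory                */
        uint16_t first_block;  /* head of the file's block chain in the FAT */
        uint32_t size;         /* file size in bytes (assumed field)        */
    };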


Downsides:

-Still imposes a limit on file and file system size (although not really a problem at the time): blocks are 2^12 bytes and block numbers are 16 bits, so there are at most 2^16 blocks.

-The biggest possible file is therefore 2^16 * 2^12 = 2^28 bytes, roughly a quarter gigabyte. Modern FAT is FAT32, where the 16-bit fields become 32 bits.

-Keeping track of free space is awkward. (Free blocks are marked with a reserved value in their FAT entries, so finding free space means scanning the FAT.)


FAT fixed most of the RT-11 problems, but introduced new ones:

-Sequential I/O can be expensive: a file's blocks may be scattered, so reading the file in order requires lots of seeking.

The general fix is a defragmentation procedure: put the fragments of each file together so they are contiguous, which makes sequential I/O fast again. But defragmentation is very expensive.

-It is possible to corrupt the file system (e.g., losing power in the middle of defragmentation).

-Potential problems when renaming files. Renaming foo.c to bar.c is easy: only one directory entry changes, and if the power goes out, the worst case is that the rename simply didn't take effect (not so bad). Renaming a/foo.c to b/foo.c is much worse, because two directories must be updated.

If directory a is written first and the power goes out, the file is lost completely; if directory b is written first, a crash leaves two directories claiming the same blocks, and later operations through one entry can leave the other pointing at a bad location. As a result, moving files between directories is simply not allowed.


Unix file system (~1977)



Source: https://sites.google.com/site/drewcsci325spring2013/projects/01---file-system/unix-version-6
Similar to FAT, but it attempts to avoid the defragmentation problem and the rename problem.

The fix was the inode: a small, fixed-size structure holding the metadata about one file (everything except its name). Inodes live on disk and are cached in RAM while the file is in use.
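A rough sketch of what an inode contains is below; the field widths and the direct/indirect block-pointer layout are simplified, not the exact Version 6 on-disk format:

    #include <stdint.h>

    /* Simplified inode: a fixed-size structure with everything about one
     * file except its name (directories map names to inode numbers). */
    struct inode {
        uint16_t mode;            /* file type and permission bits          */
        uint16_t nlink;           /* link count: how many names refer to it */
        uint32_t size;            /* file size in bytes                     */
        uint32_t direct[10];      /* block numbers of the first data blocks */
        uint32_t indirect;        /* block full of more block numbers       */
        uint32_t double_indirect; /* block of blocks of block numbers       */
    };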

New file system:

Example.

git clone doesn't really copy all of the files; it can just point at the existing files with (hard) links, and linking is fast.

The link system call leaves the old name in place and adds a new name for the same file. Originally there was no rename system call; instead, you renamed by doing link and then unlink.

link("a", "b")
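So the original way to rename a to b, in the spirit of the notes (error handling omitted), is just:

    #include <unistd.h>

    int main(void)
    {
        /* Rename "a" to "b" the pre-rename() way. Both names refer to the
         * same inode in between, so the data is never lost; a crash between
         * the two calls just leaves the file with two names. */
        link("a", "b");
        unlink("a");
        return 0;
    }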


Potential problem: circular links. A directory could point to itself or to one of its ancestors, creating a cycle. Unix prohibits such cycles by not allowing hard links to directories at all; hard links may only point to files.
Suppose you do: (rm foo; cat) < foo. foo begins with link count 1; at rm foo the link count drops to 0, so naively its storage would be freed even though cat still has it open. Fix: a file's storage is reclaimed only when its link count is 0 and no one has the file open.
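The same situation in a minimal C program (this assumes a file named foo exists in the current directory; error handling omitted):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[64];
        int fd = open("foo", O_RDONLY);  /* link count 1, one open descriptor  */
        unlink("foo");                   /* link count drops to 0 ...          */
        ssize_t n = read(fd, buf, sizeof buf);  /* ... but reading still works */
        printf("read %zd bytes after unlink\n", n);
        close(fd);                       /* only now can the storage be freed  */
        return 0;
    }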


The Unix file system still hasn't fixed the defragmentation problem, but lseek is faster: lseek(fd, N, 0) (i.e., seek to absolute offset N) takes O(N) work in FAT, since the block chain must be followed, but O(1) in Unix, since the inode's block pointers can be indexed directly.
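For example, a random-access read at some byte offset N looks like the sketch below (the file name and offset are placeholders); on FAT the file system has to chase roughly N/blocksize next pointers to find the right block, while with an inode it can index into the block pointers directly:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[512];
        int fd = open("bigfile", O_RDONLY);  /* "bigfile" is a placeholder       */
        off_t N = 1000000;                   /* arbitrary offset, for example    */
        lseek(fd, N, SEEK_SET);              /* cheap with inodes, O(N) with FAT */
        ssize_t n = read(fd, buf, sizeof buf);
        printf("read %zd bytes at offset %lld\n", n, (long long)N);
        close(fd);
        return 0;
    }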


Berkeley Fast File System variant: at the start of the file system there is a bitmap of free blocks, with one bit per data block. The bitmap is tiny relative to the data (roughly a 1:65,000 ratio), so it fits in RAM and free blocks can be found quickly. This helps with contiguous allocation and partially addresses the defragmentation problem.
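A sketch of allocation from such a bitmap is below; the interface is invented for illustration (a real FFS also tries to pick a free block near the file's existing blocks, which is what helps contiguous allocation):

    #include <stdint.h>

    /* Find and claim a free block in a free-block bitmap where bit i is 1
     * if block i is free. Returns the block number, or -1 if none is free. */
    long alloc_block(uint8_t *bitmap, long nblocks)
    {
        for (long i = 0; i < nblocks; i++) {
            if (bitmap[i / 8] & (1u << (i % 8))) {          /* block i is free */
                bitmap[i / 8] &= (uint8_t)~(1u << (i % 8)); /* mark it used    */
                return i;
            }
        }
        return -1;   /* disk is full */
    }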