Lecture 11: Disk Scheduling and File Systems

Thursday, May 9, 2013

By: Brandyn Jutovsky and Joshua Dykstra

Disk Scheduling Methods

First Come First Served (FCFS)

Complete requests in the order they arrive. Calculation of the distance traveled by the head (seek time) depends on position of the head before access.

If the head resets each time before accessing a block:

|h-i|

Gives the distance between the location of the disk head and the location of the ith block we are trying to access
Assuming random access where head can be in any location:

0| h i |1

The distance to be traveled depends on random location of head and next block to be accessed. The average seek time is calculated as follows:

Shortest Seek Time First (SSTF)

Complete requests depending upon which requests have the shortest seek time.

Maximizes throughput because jobs requiring least amount of travel for the head are completed first - minimizing latency.
Starvation can occur, however, because requests that require the head to be moved to a track a long distance away to access a block will sit in the queue.

Elevator Algorithm

Let us suppose there is an elevator that only goes in one direction, up or down, and stops only to pick individuals up or let them off while maintain the same direction. Apply this to disk scheduling by only handling requests for blocks that are in same direction of travel as head. Like SSTF, but only in the head's current direction of travel

Still not as fair as FCFS, but all tasks will be handled when disk head travels to one end of the disk and then back (no starvation).
Best performance is limited to blocks located on tracks close to the center of the platter, because they are passed by the head much more often (just like middle floors of building with an elevator).

Circular Elevator

Scheduling system where head only handles requests while traveling in one direction. (like an elevator that only goes up).

Improves fairness of elevator algorithm as blocks on all tracks are serviced equally.
Slower because head must be moved back to start each time (rewind cost).

Anticipatory Scheduling

Applying "dallying" (waiting for multiple similar requests before acting) to disk arm movement. Motivating Example:

Given two processes A and B which request the following blocks:

A: 10, 11, 12, 13, 14
B: 500, 501, 502, 503, 504

Request	Head Position	Head Travel Distance
	0	0
A: 10	10	10
B: 500	500	490
A: 11	11	489
B: 501	501	490
...	...	...

Without dallying, the head has a high amount of seek time because of the travel required in servicing requests from A and B in alternating order. Using dallying, we receive a request, but then dally ~2ms waiting for another request from process A for a block that is near the first request. This way we can service the two requests from A first, before sending the head off to process B's request; limiting travel time and latency of I/Os.

Flash Memory (Solid State Drives) in Disk Scheduling

More expensive (10x - 20x depending on capacity of drive.
Faster - No seek delay.
Flash memory wears out after repeated writing to same location. This is countered by "wear-leveling" performed by controller - drive moves a block of memory around, so that repeated writes occur in different locations and wearing is equal across drive.
Must erase things first before writing. This is a bulk operation so write sequentially.

In short - Our file systems will probably be adjusted to work on flash based disks in the next 5 years.

File Systems

An on disk (or SSD) data structure that provides an abstraction of a set of files.

"Paul Eggert File System" (1974)

Idea: Imagine disk as a giant array:

File Allocation Table (FAT) System (1970)

Boot Sector

Super Block

File Allocation Table (FAT)

Data

Super block contains metadata about file system (eg. size, version, root directory, etc.)
Data is stored in 4 KiB blocks

What's in the FAT

Array of integers where each entry corresponds to a block number on the disk. Entries store the following integers:

Value	Meaning
-1	Free Block
0	EOF has been reached
N	Index of next block in file

How it works:

We use the FAT to find where successive parts of the file are, because each entry points to the next one. So we chain to get all the blocks of a file until we reach a block that is EOF. The FAT is cached in memory and is only 1/2000 of the file system's size.

Handling Sub-directories and File Names

Directories are data and are stored in blocks as an array. Directory entries consist of:

File name (11 bits)

File Size in bytes (32 bits)

File's first block location (16 bits)

FAT Pros and Cons
Pros

Subdirectories are now allowed
Size of file is free to grow
No limit in the number of files

Cons

Data is no longer contiguous
Performance issues - fragmentation means a file is spread all over the disk. Defragmenter program can fix this to make file blocks as contiguous as possible (ie FAT[i] = i+1). This is slow and occupies drive completely.
Moving a file between directories wasn't allowed

Moving a file from one directory from another would require:

Erase entry in FAT
Add new entry in FAT

If system crash occurred between steps 1 and 2, file would be lost. What if we did these steps in opposite order:

Add new entry in FAT
Delete old entry

System crash between the two steps would now lead to a duplicate in the FAT table. So, designers decided moving files would NOT be supported.

Berkeley Fast File System (FFS) (~1980)

Inode idea: data structure on disk that describes a file independent of any directory

Owner

Date

Block 0

Block 1

Block 2

Block 3

...

Block 9 (indirect block)

Blocks 0-8 point to blocks of file data
Block 9 points indirect block where an array of block pointers to file data is stored

FFS used 8 KiB blocks and 32bit file pointers. Calculating the size of the largest file permitted:

10 + 2048 + 2048 * 8 KiB = ~ 2x10e24 bytes = 16 MB file size limit using 9 direct blocks and one indirect block.

Solution to file size:
Use indirection further by adding an additional block pointer entry to the inode which points to a tree of indirect blocks.
- This allows for 2048^2 *8KiB = 2x10e32 bytes = 32 GB file sizes.

FFS Pros and Cons
Cons

Files can have holes if table has a NULL where a pointer data/another inode should be.

Pros

Files can be moved between directories

Moving Files between directories

Original Unix - No way to rename files, instead we have we have:
link(old, new) //writes to destination directory from old directory
unlink(old) //removes from old directory

How does OS know when to delete when unlink is called? A reference counter in inode tracks the number of directories pointing to a file. Once this counter is zero, the file can be deleted from the disk.

Unix and BSD: fsck function is run after reboot to mark files with reference count of 0 as free.