File Systems

By Zach Bordofsky, Kevin Wu, and Chris Zhu

File Systems: IBM General Parallel File System (GPFS)

The GPFS is made up of 120 Petabytes of information stored across 200,000 hard drives.

Small drives are used to reduce cost and increase parallelism.

Multiple CPU's will be needed to control this, so that there is not a bottleneck for speed around the CPU. Instead the work is spread across multiple CPUs with access the drives.

CPU and bus setup

Techniques to make this work:

File System Definition:

Abstract organization of data on secondary storage (usually slow, big, persistent).

How abstract it is is quite controversial, as while there is agreement about some commands like read, write, delete, and allocate space, but disagreement about including search.

Paul Eggert's File System

In 1974, Eggert's professor told him, "Paul, I want you to build a file system for my UNIX competitor." At the time he knew nothing about operating systems because there were no OS classes.

Limitations on hardware during the old days:

First he divides the disk into two regions. The first part of the file system would keep track of where all the files were which is essentially a table. Each entry of the table has 3 parts which would be kept track of in three columns.

  1. Name
  2. Starting sector
  3. Size of the file in bytes

Notice that there are different units for the last two parts, sector count and byte count. There is a 2 byte sector count which contains the value of the sector number of the first sector in the file and there is a two byte byte count so we know how many leftover bytes we have.

Now he needs to think how big he wants to make the table. Eggert accomplishes this by scratching his head and randomly picking two sectors to be the size.

Next he makes the name length 8 bytes. If you have shorter than 8 bytes, that’s cool. Just use 0 fills (essentially add a bunch of nullbytes until you reach 8 bytes and then put it into the table). If you have longer than 8 bytes, you lose.

Now we have 12 bytes originally. At the time, Eggert also used 4 secret bytes (which he may have used as a timestamp or for some other reason) so he now has a total of 16 bytes. This results in 2^5 directory entries per sector which means the file system can store 2^6 total files.

~Other Details~

How do I represent a directory entry for a file that's not there? Eggert used all 0's to mean that it’s not in use and the user can use it if they want to make a file. This means a file name has to be at least one byte long. Empty file names are not allowed because the directory entry would be all 0's.

Downsides:

Internal fragmentation occurs because each file is stored into one sector regardless of its size so then there is a lot of wasted space if you store small files. External fragmentation is a major flaw with Eggert's file system which caused him to also tell his professor to throw away his garbage code. Essentially what happens is that pretend you're using the file system for a while. You're allocating files... reading, writing, deleting. The file system grows and shrinks. At some point, you'll have spaces that aren't being used in between files (essentially holes). When you have a lot of these holes, you're wasting a lot of space which can't be used by a bigger file than the holes. Essentially there is no way to "combine the holes" in order to save memory and allocate space for a file. As Eggert said, "External fragmentation is free space you can use but you can't use it for a particular request."

Pros

Real time people care about predictability a lot and sequential access performance is predictable and can thus trump all flaws. They can place an upper bound on the amount of time it took to read the next sector of a file because they know the longest wait if the disk arm is about the right spot.

FAT File System

File System Layout

Representation of a disk layout in a FAT file system

File Allocation Table (F.A.T) - This part of the sector is split into 16 bit blocks. Each block will then point to the next FAT block in the file system. There will be a number in the FAT block that will represent something.

FAT Legend:

IMPORTANT things to note about FAT: Since it is using pointers, the file does not have to be in contiguous blocks of memory. However, this is also inconvenient, because how FAT works is that it needs a pointer to the next block which is 4 bytes. This creates the data block size to be 512 bytes - 4 bytes which is no longer a power of two which makes data alignment problems. FAT still has internal fragmentation - a process does not use all of the data in a given sector and it cannot be used by other processes (waste of data).

How does FAT represent directories?

FAT represents directories as files, albeit distinguished and special files. A directory is an array of directory entries which can be represented like this:

File System Directories

Pros of the FAT file system

Cons of the FAT file system