Amber Won
120 PB, 200,000 drives (each ~600 GB) - Chosen to have many smaller drives instead of fewer big drives because it gives better performance (more drives can work in parallel)
Striping means splitting a file into pieces stored on different disks, then using parallelism to access different parts of the file at the same time. This leads to higher throughput.
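As a small sketch (the disk count and stripe size here are made-up illustrative values), striping can be modeled as round-robin assignment of fixed-size pieces to disks:

```python
# Toy striping: split a file into fixed-size stripes and distribute them
# round-robin across disks. Stripe size and disk count are illustrative.
STRIPE_SIZE = 4   # bytes per stripe (tiny, for demonstration)
NUM_DISKS = 3

def stripe(data: bytes):
    """Return a list of per-disk buffers holding the file's stripes."""
    disks = [bytearray() for _ in range(NUM_DISKS)]
    for i in range(0, len(data), STRIPE_SIZE):
        disks[(i // STRIPE_SIZE) % NUM_DISKS] += data[i:i + STRIPE_SIZE]
    return disks

def unstripe(disks, length: int) -> bytes:
    """Reassemble the file; a real system would read the disks in parallel."""
    out = bytearray()
    offsets = [0] * NUM_DISKS
    d = 0
    while len(out) < length:
        out += disks[d][offsets[d]:offsets[d] + STRIPE_SIZE]
        offsets[d] += STRIPE_SIZE
        d = (d + 1) % NUM_DISKS
    return bytes(out)

data = b"The quick brown fox jumps over the lazy dog"
assert unstripe(stripe(data), len(data)) == data
```

Because consecutive stripes land on different disks, a large read can pull from all the disks at once instead of waiting on one.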
Metadata - contains info about file, but not the actual file contents (e.g. directories, timestamps).
Keeping all metadata on one central CPU can lead to bottlenecks, as different processes wait their turn to access it. GPFS removes this bottleneck by distributing the metadata across nodes.
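One simple way to distribute metadata (an illustrative assumption, not necessarily how GPFS actually does it) is to hash each path to one of several metadata servers, so no single node serializes every lookup:

```python
# Toy metadata distribution: hash a path to pick which server owns its
# metadata. Server count is a made-up example value.
import hashlib

SERVERS = 4

def metadata_server(path: str) -> int:
    """Deterministically map a path to one of SERVERS metadata servers."""
    digest = hashlib.sha256(path.encode()).digest()
    return digest[0] % SERVERS

owner = metadata_server("/home/amber/notes.txt")
assert 0 <= owner < SERVERS
```

Every client computes the same owner for the same path, so lookups spread across servers without any central coordinator.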
Network partition
The dotted line acts as a partition. In the image above, the right side of the partition represents a part of the network that is down. A client on the left side of the partition wants to continue operating on his or her files, which are maintained on the left side. With partition awareness, the user can still make progress under the assumption that all needed data is on the client's side of the partition.
A simplified example of an algorithm deciding whether a user can still access files when part of the network is down: if you are on the side with a majority of the nodes (the "big side"), you can access; if you are on the minority side, you cannot. Since there can be at most one majority, this guarantees the two sides never both make progress and diverge.
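A toy version of that majority rule, assuming each node knows the total cluster size:

```python
# Majority-quorum check: only the partition holding a strict majority of
# nodes may keep operating. Ties must lose, or both sides could write.
def may_proceed(nodes_reachable: int, cluster_size: int) -> bool:
    return nodes_reachable > cluster_size // 2

assert may_proceed(3, 5)        # big side of a 3/2 split keeps going
assert not may_proceed(2, 5)    # small side must stop
assert not may_proceed(2, 4)    # an even split: neither side proceeds
```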
Locking is needed to prevent different processes from writing over each other's data. The processes need to be able to see if a file is locked or unlocked in a timely manner. The locking information should be distributed so each process can check quickly.
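A minimal single-machine stand-in for such a lock table (in a real distributed file system, this table itself would have to be distributed, as the notes say):

```python
# Per-file lock table a process consults before writing. threading.Lock
# here stands in for whatever distributed locking a real system uses.
import threading

locks = {}  # filename -> threading.Lock

def acquire(name: str) -> bool:
    """Try to lock a file; return False immediately if someone holds it."""
    lock = locks.setdefault(name, threading.Lock())
    return lock.acquire(blocking=False)

def release(name: str):
    locks[name].release()

assert acquire("data.txt")       # first writer gets the lock
assert not acquire("data.txt")   # second writer sees it is taken
release("data.txt")
```

The non-blocking `acquire` matches the "timely manner" requirement: a process finds out instantly that a file is locked rather than waiting.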
Say you have many files in a directory. We still want to be able to find/access a file efficiently. Indexing is very important so that we do not have to look through every file to find the one we are looking for.
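A small sketch contrasting a linear directory scan with an index (the dict below stands in for whatever on-disk index structure the file system uses):

```python
# Directory with many entries: a linear scan touches every entry, while a
# hash index finds a name directly.
names = [f"file{i}" for i in range(1000)]

def linear_lookup(name):
    # O(n): compare against every entry until we hit the right one
    for i, n in enumerate(names):
        if n == name:
            return i
    return None

index = {n: i for i, n in enumerate(names)}  # O(1) average per lookup

assert linear_lookup("file999") == index["file999"] == 999
```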
Users still want to be able to access their files even during maintenance.
On a machine with 16 KiB RAM and a 700 GB disk
Eggert's file system contains a table at the start of the disk; the rest of the space is used to store data. The table keeps track of each file with a 12-byte entry that includes the file name, a pointer to where the file's contents are stored, and the size of the file.
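A toy model of this table-based layout, with `(name, start, size)` tuples standing in for the 12-byte entries, and first-fit as an assumed allocation policy (the notes don't specify one):

```python
# Contiguous-allocation sketch: a table of (name, start, size) entries,
# with each file's contents stored contiguously on a byte-array "disk".
DISK_SIZE = 64
disk = bytearray(DISK_SIZE)
table = []  # list of (name, start, size) entries

def create(name: str, data: bytes):
    """First-fit: find the first contiguous hole big enough for the file."""
    used = sorted((s, s + sz) for _, s, sz in table)
    start = 0
    for lo, hi in used:
        if lo - start >= len(data):
            break
        start = hi
    if DISK_SIZE - start < len(data):
        raise OSError("no contiguous hole large enough")
    disk[start:start + len(data)] = data
    table.append((name, start, len(data)))

def read(name: str) -> bytes:
    for n, s, sz in table:
        if n == name:
            return bytes(disk[s:s + sz])
    raise FileNotFoundError(name)

create("a", b"hello")
create("b", b"world!")
assert read("a") == b"hello" and read("b") == b"world!"
```

Reads are simple and fast (one table entry gives the whole extent), which previews the pros; the cons show up once files are deleted and holes appear.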
Pros:
Cons:
The biggest flaw is external fragmentation: free space ends up scattered across the disk in holes too small to use
This is the file system after numerous read, write, allocate, and remove operations. The black areas represent storage currently in use. After many such operations, the file system has many holes where data used to be. Now if a user wants to create a file bigger than any single hole, he or she can't, even though there is enough total free space for the file.
A potential solution is to compact the data once in a while so that the file system's data is contiguous and the holes disappear, making all of the available storage usable. However, this is expensive: it can mean copying much of the disk.
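A sketch of that compaction step, reusing the `(name, start, size)` table idea from above: slide each file left, in order of its start offset, so all free space merges into one hole at the end.

```python
# Compaction: move every file's contents left so used space is contiguous.
def compact(disk: bytearray, table):
    """Return a new table; afterward all free space is one hole at the end."""
    new_table = []
    next_free = 0
    for name, start, size in sorted(table, key=lambda e: e[1]):
        disk[next_free:next_free + size] = disk[start:start + size]
        new_table.append((name, next_free, size))
        next_free += size
    return new_table

disk = bytearray(b"..AA...BBB")          # two files with holes around them
table = [("A", 2, 2), ("B", 7, 3)]
table = compact(disk, table)
assert disk[:5] == b"AABBB"
assert table == [("A", 0, 2), ("B", 2, 3)]
```

The cost is visible even in the toy: every byte of every file may be copied, which on a large, mostly-full disk is a lot of I/O.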
FAT file system (1970s)
The FAT file system consists of a boot sector, a superblock, the next fields, and the data blocks. In the FAT file system, the available data storage is separated into fixed-size blocks. This removes the problem of external fragmentation. Each block has a next field, which is stored separately from the data blocks. The next field is used when a file needs more than one block to store its data: the file system chains the blocks together, following each next field's pointer to the next block until it reaches an EOF. If the next field is 0, that indicates an EOF. If it is 2^16-1, the block is free for use.
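A sketch of following such a chain, using the conventions from the notes (0 marks EOF, 2^16-1 marks a free block):

```python
# Walk a FAT chain to read a file whose blocks are scattered on disk.
FREE = 2**16 - 1   # next-field value meaning "this block is free"

def read_file(first_block: int, fat, blocks) -> bytes:
    """Concatenate a file's blocks by following next pointers until EOF."""
    data = b""
    b = first_block
    while True:
        data += blocks[b]
        if fat[b] == 0:      # 0 => end of file
            break
        b = fat[b]           # follow the next pointer
    return data

# Example: a file occupying blocks 3 -> 7 -> 5
fat = [FREE] * 10
fat[3], fat[7], fat[5] = 7, 5, 0
blocks = {3: b"he", 7: b"ll", 5: b"o!"}
assert read_file(3, fat, blocks) == b"hello!"
```

Note the trade-off this illustrates: any block can hold any part of any file (no external fragmentation), but reaching block N of a file requires N pointer hops.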
Pros:
Cons:
$ mv foo.c bar.c is easy. However, $ mv a/foo.c b/foo.c can give us potential problems because we are writing to two different directory blocks. If there is a crash somewhere in the middle (because someone pulled the plug, a power outage, etc.), there is the possibility that a/foo.c and b/foo.c both exist. In that case, the program was interrupted before a/foo.c could be deleted, and now we have two links to the same file when we wanted only one.
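The reason two links is the preferred failure mode (rather than zero links, which would lose the file) comes from ordering the two writes carefully; a sketch of that ordering, with directories modeled as dicts mapping names to file numbers:

```python
# Crash-safe ordering for a cross-directory mv: create the new directory
# entry FIRST, then remove the old one. A crash between the two steps
# leaves two links (recoverable), never zero links (data loss).
def mv(dirs, src_dir: str, dst_dir: str, name: str):
    inode = dirs[src_dir][name]
    dirs[dst_dir][name] = inode   # step 1: crash here => two links remain
    del dirs[src_dir][name]       # step 2: crash here => mv is complete

dirs = {"a": {"foo.c": 42}, "b": {}}
mv(dirs, "a", "b", "foo.c")
assert dirs == {"a": {}, "b": {"foo.c": 42}}
```

Doing the steps in the opposite order would risk a crash window where neither a/foo.c nor b/foo.c exists, which is strictly worse than the duplicate-link case described above.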