CS111 Lecture 12: File System Implementation Issues (Spring 2014)

Aside: Why not overwrite with zeroes?

In a classic hard drive, the disk arm attempts to write to the center of each track but varies slightly depending on the data written. If a file is overwritten with zeroes, it is possible to read the "ghost" of the previous writes to the track by examining these variations. However, if we overwrite the file with random data, the track will be much harder to read and obtain useful information from.

Analogy: To cover up graffiti, it's better to cover it up with more graffiti than with white paint. With white paint, you'll still be able to see traces of the graffiti under it. With new graffiti, it will be much harder to identify or find traces of the old graffiti.

Aside: Where does the random data come from?

/dev/random: system-wide random buffer, entropy pool, from "random" external events (type a keystroke, move mouse, etc)

Downside: limited resource (waits for random user input if the buffer runs out) so shred will be slow if /dev/random is used

/dev/urandom: magic random entropy pool that never runs dry = LIMITLESS!

Uses a psuedo-random number generator (hopefully you can't tell)

RDRAND: newer Intel chips have this instruction that uses thermal energy/entropy/noise to fill registers with random bits.

...and if you're super paranoid, you can xor the result of RDRAND with /dev/random

Note: shred has its own psuedo-random generator because /dev/urandom is still too slow

Data Deletion Standards

Things that can go wrong

Levels of a Unix File System

More on the Unix File System

Computing Free Space in a Unix File System

Solution 1:
One way to compute this is to compute the number of blocks in use and calculate the difference of {All_Blocks} - {Blocks_In_Use}. However, it is very expensive to compute the number of blocks in use. We need a more efficient way.

Solution 2:
Use a free space table! With a FAT file system, this is easy. We just need to check if an entry in the table is -1. If it is, then the block is free. How is this done for the Unix file system? Use a free block bitmap! (refer to the figure on the right)

A free block bitmap is a storage-efficient way to manage blocks of storage on a disk as it only takes up 1/2^16 of disk space. Each bit in the bitmap corresponds to a block of data. Depending on how the bitmap is implemented, the bit's value will indicate whether the corresponding block is free or in use which makes it very cheap to determine the availability of a block. With the bitmap, it is also very fast to allocate or free storage as it is as simple as flipping the value of the bit. One disadvantage of using a bitmap is that I/O is required to access the free block. For a typical disk, the average seek time is around 10 milliseconds and rotational latency is around 8 milliseconds. This means that a seek takes roughly 18 milliseconds which is quite slow! The best way to improve on this is to reduce fragmentation.

Inodes

What is in an inode?

An inode stores metadata information about a file.

This includes the size of the file, the type of the file (directory, regular, or symbolic link), the number of hard links to the file, the timestamp of the file, any permissions granted to the file, and the block numbers of the data blocks corresponding to the file.

The block numbers are stored in an array inside the inode structure. Typically, an inode can directly store 10 different block numbers (10 slots of the array) which is sufficient for small files. However, for larger files that use up a lot of blocks, more space is required to store their block numbers.

To do this, an additional disk block can be allocated for the sole purpose of holding block numbers. In the inode, the 11th slot of the array will point to the first slot in the newly allocated block. This is called a doubly-indirect block. For even larger files, a triply indirect block can be used.

What is not in an inode?

The fileid is not stored in the inode. Why?

A file with multiple hard links can mean that there are multiple ids to the file. Every file corresponds to exactly one inode, so it wouldn't make sense to store the file id inside the inode.

Instead, the fileid is stored in a directory entry which also stores the inode number of the corresponding file. The parent directory is also not stored in the inode for the same reasoning. Lastly, a record of which processes have the file open is also not stored in an inode. This is because it will lead to bad performance since a process will have to write to disk each time a file is opened, and a reboot will clear all the processes anyway which makes the procedure impractical. Having a record of which processes are using the file also makes the file less secure.

CS111 Lecture 12: File System Implementation Issues

by Gagik Movsisyan, Simon Liang, and Ming Wei Lim | Spring 2014 UCLA

Table of Contents

Securely Deleting a File

Data Deletion Standards

Things that can go wrong

Levels of a Unix File System

More on the Unix File System

Computing Free Space in a Unix File System

Inodes

Hard Links

Arguments against hard links

Why do hard links exist in the first place?

Fun stuff with hard links