Week 7 (Monday) Scribe Notes
Andrew Kuang


File System Implementation Issues:

    Levels of a file system (via application)

    EXAMPLE: Paranoid professor wants to reliably remove data from a system (so that no one else will be able to read it)

    PROBLEM: Although "rm file" removes the name, other people and/or programs that hold an open file descriptor for that deleted file can still read it

> Try:   $rm file
         $reboot

NOTE: This is still not safe. Rebooting closes any open file descriptors, but "rm" only removes the directory entry; the file's data blocks are still sitting on the disk, where anyone who can read the raw device can recover them

    Reality: depending on the type of file system being used, there are possible ways to completely remove the file, but they are complicated (explained below)

    On Linux, the shred command can be used ("$shred file")

        What does 'shred' do?

            Shred overwrites the content of the file with random data. Random data is used instead of just 0's (the cheaper alternative) because a classic disk head aims for the center
            of the track when it writes; however, this is not guaranteed. The rewrite might land slightly off-center (a bit above or below the center), leaving a faint remnant of the
            pre-existing data known as its "ghost". The ghost of the old track is much easier to read if the new track contains only 0's, since it is then obvious where the old data was.
            The analogy Eggert gave in class: it is easier to read graffiti (the old data) if it is painted over with plain white paint (0's) than if more graffiti (random data) is
            sprayed over it. (A sketch of this overwrite idea follows.)
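            A minimal sketch of the overwrite idea in C is below. It is NOT the real coreutils shred (which handles many more cases); the pass count, buffer size, and error handling here are simplified for illustration.

                /* Sketch only: overwrite a file's bytes in place with random
                 * data read from /dev/urandom, then flush to the device. */
                #include <fcntl.h>
                #include <stdio.h>
                #include <sys/stat.h>
                #include <unistd.h>

                static int overwrite_once(const char *path)
                {
                    int fd = open(path, O_WRONLY);
                    if (fd < 0) return -1;

                    struct stat st;
                    if (fstat(fd, &st) < 0) { close(fd); return -1; }

                    int rnd = open("/dev/urandom", O_RDONLY);
                    if (rnd < 0) { close(fd); return -1; }

                    char buf[4096];
                    off_t left = st.st_size;
                    while (left > 0) {
                        size_t want = left < (off_t) sizeof buf ? (size_t) left : sizeof buf;
                        ssize_t n = read(rnd, buf, want);
                        if (n <= 0 || write(fd, buf, n) != n) break;
                        left -= n;
                    }

                    close(rnd);
                    fsync(fd);                              /* push the overwrite out to the device */
                    close(fd);
                    return left == 0 ? 0 : -1;
                }

                int main(int argc, char **argv)
                {
                    if (argc != 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }
                    for (int pass = 0; pass < 3; pass++)    /* 3 passes, like shred's default */
                        if (overwrite_once(argv[1]) != 0) return 1;
                    return 0;
                }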

        Any problems with shred?

            One of the major problems with shred lies in obtaining the random data for the overwrite. The random data must come from somewhere; suppose it is drawn from the kernel's
            /dev/random pool. If a large shred task is requested, or many shred tasks are requested within a small timeframe, the /dev/random pool may be sucked dry, and shred will
            proceed at a very slow rate (writing only once new random data has entered the pool). As such, we cannot simply rely on this technique in our paranoid professor system.

        Any solutions?

            One solution proposed in class was to take advantage of the /dev/urandom pool. Like /dev/random, /dev/urandom supplies random data; unlike /dev/random, however, it never
            blocks when the kernel's entropy estimate runs low (the extra 'u' is often read as "unlimited"). The characteristic that makes /dev/urandom more desirable is that its pool never runs dry.

            HOW?

                /dev/urandom uses a pseudo-random number generator ("fake" random numbers, reseeded from whatever entropy the kernel has gathered) to create the data it hands out. One hardware source of such randomness is described below:

                RDRAND

                    In recent Intel chips (starting with the Ivy Bridge generation),
                    a new RDRAND instruction was added that fills a register with random bits derived from thermal noise inside the CPU. Unlike the kernel's /dev/random pool, which has to
                    wait for events such as mouse movements or other interrupts to gather fresh entropy, the on-chip noise source reseeds itself continuously, so it effectively never runs low.

                    It is important to note that RDRAND can fail. Even at the hardware level, there may momentarily not be enough accumulated noise to produce random bits. If it fails, however,
                    the caller simply retries the instruction until it succeeds (a sketch of such a retry loop follows).
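                    A sketch of such a retry loop, using the _rdrand64_step intrinsic from <immintrin.h> (compile with -mrdrnd on gcc/clang). The helper name and retry limit are made up for illustration.

                    /* Sketch: call RDRAND and retry if the carry flag reports failure. */
                    #include <immintrin.h>
                    #include <stdio.h>

                    /* Returns 1 on success, 0 if the hardware kept reporting failure. */
                    static int rdrand64_retry(unsigned long long *out)
                    {
                        for (int tries = 0; tries < 10; tries++)
                            if (_rdrand64_step(out))
                                return 1;       /* success: *out now holds 64 random bits */
                        return 0;               /* extremely rare: noise source not ready */
                    }

                    int main(void)
                    {
                        unsigned long long r;
                        if (rdrand64_retry(&r))
                            printf("random 64 bits: %016llx\n", r);
                        else
                            fprintf(stderr, "RDRAND failed repeatedly\n");
                        return 0;
                    }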

                TRUST ISSUES WITH RDRAND

                    Some might argue that RDRAND's output might not be "random" enough. One concern mentioned in class involved the NSA and the possibility that they actually know what numbers will
                    be generated by RDRAND (although there is no proof of this). As such, a sufficiently paranoid user can take further measures. For instance, you could
                    take RDRAND's output and XOR it with a buffer of truly random bytes (perhaps from /dev/random); the result is at least as unpredictable as the more random of the two inputs.
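                    A sketch of that XOR-mixing idea (function name and buffer size are made up): the caller fills a buffer from a hardware source such as RDRAND, and we XOR in the same number of bytes read from /dev/random.

                    /* Sketch: XOR a buffer of hardware-RNG output with bytes from
                     * /dev/random.  The caller is assumed to have filled hw_bytes
                     * (e.g. with RDRAND output as in the sketch above). */
                    #include <fcntl.h>
                    #include <stddef.h>
                    #include <unistd.h>

                    int mix_with_dev_random(unsigned char *hw_bytes, size_t len)
                    {
                        int fd = open("/dev/random", O_RDONLY);
                        if (fd < 0)
                            return -1;

                        unsigned char tmp[256];
                        size_t done = 0;
                        while (done < len) {
                            size_t want = len - done;
                            if (want > sizeof tmp)
                                want = sizeof tmp;
                            ssize_t n = read(fd, tmp, want);   /* may block while entropy gathers */
                            if (n <= 0) { close(fd); return -1; }
                            for (ssize_t i = 0; i < n; i++)
                                hw_bytes[done + i] ^= tmp[i];  /* mix the two sources */
                            done += (size_t) n;
                        }
                        close(fd);
                        return 0;
                    }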

        EVEN /dev/urandom ISN'T ENOUGH?

            After all that, it turns out /dev/urandom still isn't fast enough for all "shred" workloads. As such, "shred" brings its own, higher-performance pseudo-RNG.
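            (GNU shred reportedly seeds an internal generator, historically the ISAAC PRNG, from a small amount of kernel randomness. The snippet below is NOT shred's algorithm; it is just a generic illustration of how a small, fast userspace PRNG such as xorshift64*, seeded once, can churn out overwrite data without draining the kernel's pool.)

            /* Illustration only: a tiny, fast userspace PRNG (xorshift64*), seeded
             * once from /dev/urandom.  Not shred's actual generator. */
            #include <fcntl.h>
            #include <stdint.h>
            #include <stdio.h>
            #include <unistd.h>

            static uint64_t prng_state;

            static uint64_t xorshift64star(void)
            {
                uint64_t x = prng_state;
                x ^= x >> 12;
                x ^= x << 25;
                x ^= x >> 27;
                prng_state = x;
                return x * 0x2545F4914F6CDD1DULL;
            }

            int main(void)
            {
                /* One small read for the seed; after that, no more kernel randomness
                 * is needed no matter how much data we generate. */
                int fd = open("/dev/urandom", O_RDONLY);
                if (fd < 0 || read(fd, &prng_state, sizeof prng_state) != (ssize_t) sizeof prng_state)
                    return 1;
                close(fd);

                for (int i = 0; i < 4; i++)
                    printf("%016llx\n", (unsigned long long) xorshift64star());
                return 0;
            }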

    Now that we know how to overwrite the data, we should look at when overwriting is actually used as opposed to other methods. The US Federal Information Processing Standards (FIPS) include guidelines on
    which actions should be taken to delete data; the recommended action depends on how sensitive the data is.

    Exceptions to "shred"?

        In the case that an overwrite is necessary, shred by default makes 3 overwrite passes in the hope that all the desired data ends up covered with random bits. Even if that works for the situation
        presented above, "shred" is still not 100% reliable across all systems.

        ASSUME: a log-based file system, where write requests are appended to a log instead of modifying the data in place

        These systems are desirable for their performance: for example, one might not want to write something to disk if it is going to change again before it is used. In this case, shredding the data
        would overwrite the copy on disk; however, the important data might still be sitting in the log (not yet written to its final location on disk).

        What do we do now? Even in cases like these, "shred" can still be used; however, modifications are necessary. Perhaps instead of shredding just the desired file, we can shred the entire file system. While this may
        seem excessive just to get rid of some data, by shredding the whole disk, even a log-based file system cannot recover the data.

        CATCH: disks are flaky

            Disks can have "bad blocks" in them. These blocks can no longer reliably store anything, so when something needs to be stored in one, the write is redirected to another block. In order to keep track
            of these "bad blocks," disk drive manufacturers reserve some extra area on the disk to recover from them: the drive keeps a table that re-routes each bad block to a good spare block. This way, the data
            that was originally destined for the "bad block" still ends up in a good block, and the mapping is recorded in the table.

                NOTE: This bad-block table grows over time. As disks get older, more blocks go "bad" and more mappings need to be stored, so the table grows to accommodate the increase in records.

            Even after shredding, someone who can read the drive at a low level might use the bad-block table to find blocks that still hold copies of the old data (blocks that were remapped and never overwritten).
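            To make the idea concrete, here is a sketch of what such a remap table might look like. All names and sizes are made up; real drive firmware keeps this internally and does not expose it like this.

            /* Sketch of a bad-block remap table: bad block numbers mapped to spare
             * blocks reserved elsewhere on the disk.  Purely illustrative. */
            #include <stdint.h>

            #define MAX_REMAPS 1024

            struct remap_entry {
                uint64_t bad_block;     /* block that went bad */
                uint64_t spare_block;   /* reserved block now holding its data */
            };

            struct remap_table {
                unsigned count;
                struct remap_entry entries[MAX_REMAPS];
            };

            /* Translate a requested block number through the remap table.
             * Note the consequence for "shred": overwrites addressed to a remapped
             * block land in the spare, while whatever was left in the original bad
             * block is never touched. */
            uint64_t resolve_block(const struct remap_table *t, uint64_t block)
            {
                for (unsigned i = 0; i < t->count; i++)
                    if (t->entries[i].bad_block == block)
                        return t->entries[i].spare_block;
                return block;           /* not remapped: use the block as-is */
            }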


Building a File System

    When building a file system, you need to know ALL the levels (down to the machine level) in order to guarantee properties to your user. You don't want to simply assume that your file system will work in all cases because, if it
    doesn't, you may lose important files/data.

    In UNIX, the levels of the file system are roughly:

 

symbolic links
file names* (aka pathnames)
file name components** (all parts of a file name other than the /'s)
inodes (representations of a particular file)
file systems
BOUNDARY***
partition
blocks
sectors

* File name example: "usr/bin/grep"
** File name components example: "usr" "bin" "grep"
*** Everything listed below the boundary line is common for all file systems (not just UNIX)

    Partitions are formed by taking a physical drive and dividing it into several pieces. Each piece can be represented as an array of blocks. 

    ** While it is important to know what exists within your file system, it is also important to know what doesn't exist on your file system (aka "free space")

    To do this, you can look at the inodes to find the blocks being pointed to, and know that all other blocks are unused. This amounts to the set subtraction below (a sketch of the scan follows the formula):

            { All Blocks } - { Used Blocks } = { Free Blocks }                //This is EXPENSIVE to compute; NOT PRACTICAL
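    A sketch of that scan, with a deliberately simplified inode holding only direct block pointers (the layout is made up for illustration):

            /* Sketch of the set subtraction: walk every inode, mark each block it
             * references as used, and treat everything unmarked as free. */
            #include <stdbool.h>
            #include <stddef.h>
            #include <stdint.h>

            #define NDIRECT 12

            struct simple_inode {
                uint32_t nblocks;               /* how many direct entries are in use */
                uint32_t direct[NDIRECT];       /* block numbers this file occupies */
            };

            /* used[] must have one entry per block on the partition. */
            void mark_used_blocks(const struct simple_inode *inodes, size_t ninodes,
                                  bool *used, size_t nblocks_total)
            {
                for (size_t b = 0; b < nblocks_total; b++)
                    used[b] = false;                        /* start with "all free" */

                for (size_t i = 0; i < ninodes; i++)        /* every inode ... */
                    for (uint32_t j = 0; j < inodes[i].nblocks; j++)
                        used[inodes[i].direct[j]] = true;   /* ... claims its blocks */

                /* Any b with used[b] == false is free.  Recomputing this on every
                 * allocation means re-reading every inode, which is why real file
                 * systems keep a separate free-space structure instead. */
            }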


Free Space Tables

    Solution? Use a Free Space Table.

        With a FAT file system, this is easy: a block is recorded as free by marking its spot in the table with a -1. To check whether a block is free, simply check its FAT entry.
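        A sketch of that check, using the lecture's convention that -1 marks a free entry (real FAT variants use different sentinel values):

            /* Sketch of FAT-style free-space handling.  fat[b] holds the next block
             * in b's chain, or FAT_FREE if block b is free (lecture convention). */
            #include <stddef.h>
            #include <stdint.h>

            #define FAT_FREE (-1)

            int block_is_free(const int32_t *fat, size_t b)
            {
                return fat[b] == FAT_FREE;
            }

            /* Find and claim any free block; returns its index, or -1 if the disk is
             * full.  The end-of-chain marker (0 here) is an assumption for the sketch. */
            long allocate_block(int32_t *fat, size_t nblocks)
            {
                for (size_t b = 0; b < nblocks; b++)
                    if (fat[b] == FAT_FREE) {
                        fat[b] = 0;             /* newly allocated block ends its chain */
                        return (long) b;
                    }
                return -1;
            }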

        With a UNIX file system, it is not as easy. You need to keep track of free/used blocks somehow; however, the method in which this is done is entirely up to the implementer.

        EXAMPLE:

            Some Berkeley guys came up with a modified UNIX file system called the BSD file system. In this file system, a new block was added to keep track of the free blocks.

                [ BOOT | SUPER | FREE-BLOCK BITMAP | INODE TABLES | DATA ]

            This method made it fast to allocate storage (simply scan the bitmap for enough free blocks); however, it also required I/O every time a block was allocated or freed, since the free-block bitmap on disk must be consulted and updated.

            A possible solution is to keep the free-block bitmap cached in memory; however, this introduces consistency problems between the cache and the disk. In order to guarantee 100% reliability, you would need to constantly
            check that the cached bitmap matched the copy on disk, to make sure you weren't accidentally overwriting used blocks or freeing already-free blocks. This constant checking turns out to be too expensive.
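            As a sketch, the bitmap bookkeeping itself is simple bit manipulation (names are made up; this in-memory version ignores the cache-versus-disk consistency problem just described):

            /* Sketch of BSD-style free-block bitmap bookkeeping: one bit per data
             * block, set to 1 when the block is free. */
            #include <stddef.h>
            #include <stdint.h>

            static int bitmap_test(const uint8_t *bm, size_t b)
            {
                return (bm[b / 8] >> (b % 8)) & 1;
            }

            static void bitmap_set(uint8_t *bm, size_t b, int free_bit)
            {
                if (free_bit)
                    bm[b / 8] |= (uint8_t) (1u << (b % 8));
                else
                    bm[b / 8] &= (uint8_t) ~(1u << (b % 8));
            }

            /* Allocation: scan for a set bit, clear it, return the block number.
             * In the real file system, the containing bitmap block must also be
             * written back to disk; that is the extra I/O mentioned above. */
            long bitmap_alloc(uint8_t *bm, size_t nblocks)
            {
                for (size_t b = 0; b < nblocks; b++)
                    if (bitmap_test(bm, b)) {
                        bitmap_set(bm, b, 0);
                        return (long) b;
                    }
                return -1;                      /* no free blocks */
            }

            void bitmap_free(uint8_t *bm, size_t b)
            {
                bitmap_set(bm, b, 1);           /* again, the on-disk copy needs an I/O too */
            }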


What's in an inode?

    There are several things in an inode (a simplified struct sketch follows the list):

        1. File size
        2. File type (e.g. directory, regular file, etc)
        3. Number of links
        4. Dates
        5. Permissions
        6. Addresses of data blocks
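        As a rough picture, the fields above might be laid out like this (field names and sizes are illustrative, not any real file system's on-disk format):

            /* Simplified on-disk inode holding the fields listed above.
             * Illustrative only; real layouts differ. */
            #include <stdint.h>

            #define NDIRECT 10

            struct disk_inode {
                uint64_t size;                  /* 1. file size in bytes */
                uint16_t type;                  /* 2. file type (regular, directory, ...) */
                uint16_t nlinks;                /* 3. number of hard links to this inode */
                uint32_t atime, mtime, ctime;   /* 4. dates (access / modify / change) */
                uint16_t mode;                  /* 5. permission bits */
                uint32_t direct[NDIRECT];       /* 6. addresses of data blocks */
                uint32_t indirect;              /*    plus an indirect block for larger files */
            };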

    While it may seem trivial, it is also important to know what's NOT in an inode:

        1. The file's name (this is because of hard links -> a file may have more than one name)
        2. The parent directory which contains the file (this is also because of hard links -> two links to the same file may live in different directories)
        3. Which processes have the file open
                > Why is this not in an inode?
                    1. Every time a file is opened, you would need to write to the disk to update the inode (slow performance)
                    2. Security issues (other processes should not be able to see what else is open because they might take advantage of this information)

    From above, it seems that problems 1 and 2 could be solved by simply removing hard links. In fact, there are several arguments against hard links:

        1. Hardlinks complicate the user model - This is because all applications must know about the hard links to a given file so as to not mess up any pre-existing links
        2. Without hard links, it is easy to compute the parent directory and file name from a file. This information might be useful when looking up files or traversing through directories.
        3. There is a potential problem with hard links in that they could cause loops. In UNIX, however, this isn't the case because hard links are prohibited from linking to directories.

    With the reasons shown above, one begins to question why we choose to implement hard links at all. The main reasoning behind this is that renaming is hard. As such, we want to have programs that can do useful
    work without having to rename. Instead of renaming, we can create a hard link to work with and then unlink the old name after we are finished. For instance, a sample workaround is as follows (a code sketch appears after the example below):

        rename("a", "b");    =    link("a", "b");
                                          unlink("a");

        ^ The advantage of this is that each system call modifies a single directory. This way, it is less vulnerable to crashes that could ruin data if they happen midway: each directory is either fully updated or untouched, never left half-modified.

        EXAMPLE:

        link("d/a", "e/b");            only modifies "e"
        unlink("d/a");                only modifies "d"

        rename("d/a", "e/b");        modifies both "d" and "e" => if a crash were to happen after "d" was modified, but not "e" we wouldn't know whether or not "e" was actually modified