CS 111 Lecture 12
Scribe Notes for 5/12/2014
by Andrew Foglia and Anthony Rivera
Today's Lecture:
- File System Implementation
- Levels of a File System
- inodes
- Hard Links
File System Implementation
Let's revisit the paranoid professor. He wants to reliably remove data from disk but is afraid that, depending on the file system, people may still be able to read it. The professor can try multiple approaches ("file" is the file we are attempting to remove):
$ rm file
- A naive attempt to simply remove the file
- Nobody can see the file anymore, but that does not mean this method works!
- What if there is an open file descriptor? If so, another process can still read the file's contents, as the sketch below demonstrates
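Here is a minimal sketch in C of that failure mode, assuming a pre-existing file named "file" (the name is illustrative):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fd = open("file", O_RDONLY);        /* open before the removal */
    if (fd < 0) { perror("open"); return 1; }

    unlink("file");                         /* this is what "rm file" does */

    char buf[64];
    ssize_t n = read(fd, buf, sizeof buf);  /* still succeeds! */
    if (n >= 0)
        printf("read %zd bytes after unlink\n", n);
    close(fd);                              /* only now can the blocks be freed */
    return 0;
}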
$ rm file
$ reboot
- The professor may attempt to remove the file and then reboot; rebooting at least closes any lingering file descriptors
- This is a drastic solution, but the data is still not safely removed! It will still sit in the file system
- The professor also wants remove to be fast, and reboot does not offer that
- The effects of these commands depend on the professor's file system...
Where is this remove function implemented?
FAT file system
- Take the file's directory entry and throw it away
- Walk the file's linked list of FAT entries and store -1 in each one to free its blocks (see the sketch below)
- Note: The data itself has not been erased, because we want to maintain some speed
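A toy sketch of that freeing walk. This is not real FAT code: the table is an in-memory array, and the end-of-chain marker FAT_EOF is invented for illustration (following the lecture's convention that -1 marks a free entry):

#define FAT_FREE (-1)
#define FAT_EOF  (-2)            /* hypothetical end-of-chain marker */

void fat_free_chain(int fat[], int first_block) {
    int b = first_block;
    while (b != FAT_EOF) {
        int next = fat[b];       /* remember where the chain goes next */
        fat[b] = FAT_FREE;       /* mark the block free; its data is NOT erased */
        b = next;
    }
}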
UNIX file system
- Similar to FAT, except free blocks are kept on a free list
- When the file is removed its blocks are no longer in the file's list, but their contents still exist on disk
$ shred file
- Attempt to shred the file and delete its contents permanently
- The shred command overwrites the content of the file (3 times) with random data, so the old bytes are tough to retrieve
Why random data rather than zeroes?
- Depending on the media device, there could be traces of the old data. This makes the professor more paranoid!
- When data is written onto the hard drive, the disk head writes over what it thinks is the center of the track
- Special devices exist to read the "ghosts" of the old track
- If you know what the new data is (e.g. zeroes) then it makes it easier to focus on and find the old data
- If new track is random, it will be harder to see what the old data was
- Analogy: Think of graffiti. It is much more difficult to read graffiti that has been painted over with other graffiti than graffiti that has been painted over with white paint (traces may still show through)
Where does the random data come from?
/dev/random
- "Window into the kernel"
- Kernel maintains an entropy pool
- The entropy pool is a system-wide buffer of random bits
- The pool is filled from "random" external events (e.g. low-order bits of timestamps from typed characters, network packets, etc.) but is a limited resource (it can run out)
- Cons: The entropy pool empties quickly, which makes shred run slowly
/dev/urandom
- Similar to /dev/random, but its entropy pool never runs out!
- Uses a pseudo-random number generator, which creates "fake" random numbers
- Recent Intel chips have created a new instruction called RDRAND
- RDRAND fills a register with random bits obtained using very low-level physical sources (e.g. CPU thermal energy and "noise"), but the instruction can still fail!
- Shred needs lots of random data
- Cons: /dev/urandom is fast, but still too slow for shred
- Solution: Shred has its own high-performance pseudo-random number generator that it seeds with /dev/urandom! A sketch of this approach follows.
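The sketch below shows the shape of that design, not shred's actual source: read one seed from /dev/urandom, then let a fast private generator (here xorshift64, a simple stand-in for shred's real one) produce the overwrite data:

#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

static uint64_t prng_state;              /* seeded once from /dev/urandom */

/* xorshift64: simple, fast; a real tool would reject a zero seed */
static uint64_t prng_next(void) {
    prng_state ^= prng_state << 13;
    prng_state ^= prng_state >> 7;
    prng_state ^= prng_state << 17;
    return prng_state;
}

int overwrite(const char *path, int passes) {
    int rfd = open("/dev/urandom", O_RDONLY);
    if (rfd < 0) return -1;
    if (read(rfd, &prng_state, sizeof prng_state) != sizeof prng_state) {
        close(rfd);                      /* one cheap read is all we need */
        return -1;
    }
    close(rfd);

    int fd = open(path, O_WRONLY);
    if (fd < 0) return -1;
    off_t size = lseek(fd, 0, SEEK_END);

    uint64_t buf[512];                   /* 4 KiB of pseudo-random data at a time */
    for (int p = 0; p < passes; p++) {
        lseek(fd, 0, SEEK_SET);
        for (off_t done = 0; done < size; ) {
            for (int i = 0; i < 512; i++)
                buf[i] = prng_next();    /* fill buffer from the private PRNG */
            size_t want = sizeof buf;
            if ((off_t) want > size - done)
                want = (size_t)(size - done);
            if (write(fd, buf, want) != (ssize_t) want) { close(fd); return -1; }
            done += want;
        }
        fsync(fd);                       /* push each pass out to the device */
    }
    close(fd);
    return 0;
}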
US Federal Information Processing Standards (FIPS) on Deleting Data
- Melt the drive (highest level of security)
- Physically shred the drive
- Degauss the drive
- Overwrite the drive with random data (3 times)
- Options 1-3 destroy the drive entirely, 4 does not
- NOTE: These standards are old. For today's technology, it is not clear if this is enough
- Assume we have written enough random data to overwrite the file - does shred work?
- - NO! Data can still be recovered in a log-based file system
- When you do a write, the actual data is not modified in place
- - The write is posted in a log somewhere, and the actual data is not updated until later
- When you do a read, first check the log to see if there are recent changes
- - If so, grab the data from the log
- With this system, shred's overwrites may land somewhere else in the log and NOT on the original data (see the sketch below)
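A toy model of why this happens; every structure here is invented for illustration:

#include <string.h>

#define LOG_CAP 128

struct log_rec { int block; char data[512]; };

static struct log_rec log_[LOG_CAP];     /* pending writes, oldest first */
static int log_len;                      /* no bounds checking, for brevity */
static char disk[1024][512];             /* the "real" data blocks */

void fs_write(int block, const char *data) {
    /* append to the log; the block on disk is untouched for now */
    log_[log_len].block = block;
    memcpy(log_[log_len].data, data, 512);
    log_len++;
}

const char *fs_read(int block) {
    /* the newest log entry wins; fall back to the data block */
    for (int i = log_len - 1; i >= 0; i--)
        if (log_[i].block == block)
            return log_[i].data;
    return disk[block];
}

A shred pass that goes through fs_write lands in the log; until the log is replayed, the original bytes in disk[block] survive.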
So how do we remove a file and satisfy our paranoid professor?
- You can still use shred, but instead of erasing one file at a time, erase the whole file system:
- $ su
- $ shred /dev/rst/0
- But there is a catch!
- - Disks are flaky
- - Blocks can go bad on occasion
- - The disk drive manufacturer reserves extra area to recover from a bad block (using a bad-block map)
- - For each bad block, the map tells the drive where the good replacement copy lives
- - At the low level, data can still be recovered by reading those remapped bad blocks
Summary
- You must be careful about the levels of a file system
- It is important to know how it works all the way down to the device level in order to guarantee performance, security and other properties
- You should know how your file system works at the machine level
Levels of a File System (UNIX)
This is the typical layout of a UNIX file system (from highest to lowest level):
Symbolic Links
File Names
File Name Components
inodes
---- FILE SYSTEMS BOUNDARY ----
Partitions
Blocks (8192 bytes)
Sectors (512 bytes)
What is all this stuff?
Sectors
- Traditionally 512 bytes long
- A unit bigger than one byte, for efficiency reasons
- If sectors were too big, bus time would be wasted pulling unneeded data off the disk
Blocks
- Most file systems pick a bigger block size than sector size
- Bigger block size gives extra throughput
- Typical value is 8 KiB (8192 bytes)
Partitions
- Take physical drives and divide them into pieces (e.g. an array of blocks)
- Each partition can be treated independently
Above the File System Boundary
- inodes
- - Representation of a file
- - Describes the file itself, no matter which name you used to reach it
- File Name Components
- - All parts of the file name except the '/'
- - e.g. usr, bin, and grep in /usr/bin/grep
- File Names
- - Also known as "pathnames"
- Symbolic Links
- - Operate above file names
- - More abstract: a symbolic link's contents are themselves a file name
inodes
inodes are an integral part of any file system. What is in an inode and what isn't?
Free Space in UNIX File System
- Look at all the inodes
- - If a block is not being pointed at by any inode then it is free
- - Computing {all blocks} - {blocks in use} can be expensive
- - Not practical
- Need an efficient way to detect free space and where it is
- - FAT: If an entry is -1 then it is free space
- - UNIX: How do we tell...
- A solution for UNIX: the BSD file system, which keeps a bitmap with one bit per block recording whether the block is free (see the sketch after this list)
- + Fast to allocate storage
- - I/O is needed to allocate or free anything
- - The cache in memory may not match what is on disk (constant checks are expensive)
- - Want to minimize seek time (average seek time is ~10 ms + rotational latency of ~8 ms)
- Note: When growing a file we want contiguous data which speeds sequential I/O if no other activity
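A minimal sketch of such a free-block bitmap, assuming one bit per block with 1 meaning free (the polarity, sizes, and names are choices for illustration, not the real BSD layout):

#include <stdint.h>

#define NBLOCKS 8192

static uint8_t bitmap[NBLOCKS / 8];      /* one bit per block */

static int  is_free(int b)   { return bitmap[b / 8] >> (b % 8) & 1; }
static void mark_free(int b) { bitmap[b / 8] |=   1 << (b % 8); }
static void mark_used(int b) { bitmap[b / 8] &= ~(1 << (b % 8)); }

/* First-fit allocation; a real BSD file system would search near the
   file's existing blocks to keep data contiguous and seeks short. */
int alloc_block(void) {
    for (int b = 0; b < NBLOCKS; b++)
        if (is_free(b)) { mark_used(b); return b; }
    return -1;                           /* no free space */
}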
What is NOT in an inode?
- The file name
- The parent directory
- Which processes have the file open
- Why leave the open-file information out of the inode?
- - Bad performance: the kernel would have to go to disk whenever a file is opened
- - The kernel keeps track of which processes have a file open in main memory instead
- - A reboot clears that in-memory state anyway (on-disk records would have to be cleaned up later)
- - Security issues
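To make the contrast concrete, here is a sketch of a classic UNIX-style inode. Field names and sizes are illustrative, not a real on-disk format; the point is what is absent (no file name, no parent directory, no open-process list):

#include <stdint.h>

struct inode {
    uint16_t mode;                 /* file type + permission bits */
    uint16_t uid, gid;             /* owner and group */
    uint16_t nlink;                /* hard-link count */
    uint32_t size;                 /* length in bytes */
    uint32_t atime, mtime, ctime;  /* access/modify/change timestamps */
    uint32_t direct[10];           /* block numbers of the first 10 data blocks */
    uint32_t indirect;             /* block holding more block numbers */
    uint32_t dbl_indirect;         /* block holding indirect block numbers */
};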
Hard Links
Many arguments are made against the use of hard links. Hard links complicate the user model. Without them, every file has exactly one parent directory, so computing the parent directory's name from a file is easy. Hard links could also create loops in the directory structure, which would break programs that walk the tree (except that this is not possible in UNIX).
Question: How do we avoid hard-link loops?
Answer: We prohibit hard links to directories. Since a hard link can only point to a non-directory, no cycle can be created.
Why bother with hard links?
- Renaming is difficult to implement safely, so we want to be able to skip it
- We want programs that can do useful work without doing any renaming
- UNIX supports hard links, but FAT does not
In the Original UNIX File System
Instead of:
rename("a", "b")
Use:
link("a", "b")
unlink("a")
To see an advantage of the second method, consider the following code:
link("d/a","e/b")
unlink("d/a")
- In line 1, e is modified but does NOT change d
- In line 2, d is modified
- Each system call modifies just one file
- The problem with the "rename" operation is that it modifies two different directories at once
- - This causes trouble because a single system call must write to two places on disk
- - The hardware won't do that atomically; it picks one write to do first and the other to do second
- - So rename is vulnerable to crashes in between the two writes
- The two system calls are less vulnerable, as long as writing a single sector is atomic (even across a crash)
- This is why we use hard links!
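Written out with error checks, the idiom looks like this (safe_rename is a hypothetical helper name, not a standard call):

#include <stdio.h>
#include <unistd.h>

int safe_rename(const char *oldpath, const char *newpath) {
    if (link(oldpath, newpath) != 0) {   /* step 1: the new name appears */
        perror("link");
        return -1;
    }
    if (unlink(oldpath) != 0) {          /* step 2: the old name disappears */
        perror("unlink");
        return -1;                       /* worst case: the file keeps both names */
    }
    return 0;
}

After a crash between the two calls, the file is merely reachable under two names, which is easy to clean up; a half-finished rename could lose the file entirely.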
Fun Stuff with Hard Links
$ git clone glibc glibc-copy
- Makes a copy of the entire history of every change made
- Fast! Possible because the copy is made with hard links
- NOTE: This only works because the linked files are treated as read-only; modifying them in place would change both copies
A Typical Directory Entry (ext2)
Say we want to remove a directory entry...
- This can be thought of as "unlinking" a file
- Think of a directory as an array of variable-length directory entries
- Each entry records its own length, so we know how big it is
- Unlink by going to the previous directory entry and bumping up its recorded length, so that it covers more "junk" than it used to
- Create an entry by scanning the directory for a stretch of junk big enough to hold it
- Creation and deletion are relatively fast, and each can be done by writing a single block to the disk! (The struct sketch below shows the fields involved.)
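For reference, this is roughly the shape of an ext2 directory entry (simplified from the kernel's ext2_dir_entry_2):

#include <stdint.h>

struct ext2_dirent {
    uint32_t inode;       /* inode number; 0 can mark an unused entry */
    uint16_t rec_len;     /* bytes to the next entry; may cover trailing junk */
    uint8_t  name_len;    /* length of the name actually in use */
    uint8_t  file_type;   /* directory, regular file, symlink, device, ... */
    char     name[];      /* the name itself, NOT null-terminated on disk */
};

rec_len is the key to the trick above: deleting an entry just grows the previous entry's rec_len to swallow it.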
Example Filetypes
- Directory
- Regular file
- Symbolic link
- Device
Looking at the directory entry will give you some information about the file; the inode also holds this information (and it is cached). Be careful about this cache coherence, which Linux addresses by decreeing that a file cannot change its type. A sketch of asking the inode for a file's type follows.
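A small sketch covering the filetypes listed above, using lstat so symbolic links are reported as links rather than followed:

#include <stdio.h>
#include <sys/stat.h>

void print_type(const char *path) {
    struct stat st;                      /* filled in from the file's inode */
    if (lstat(path, &st) != 0) { perror("lstat"); return; }
    if (S_ISDIR(st.st_mode))             printf("%s: directory\n", path);
    else if (S_ISREG(st.st_mode))        printf("%s: regular file\n", path);
    else if (S_ISLNK(st.st_mode))        printf("%s: symbolic link\n", path);
    else if (S_ISBLK(st.st_mode) || S_ISCHR(st.st_mode))
                                         printf("%s: device\n", path);
}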
The lecture was interrupted by a fire alarm, so these notes are shorter than usual.