CS 111 Winter 2012 February-27, 2012
Lecture 12: File System Implementation
Compiler: Huwenbo Shi
Table of Content
Bug Description
and Possible Fix
The
bug in the grep
program
will show up if we pass to the program a particular regular
expression and a file that contains a long line.
Consider the
command: grep -r
'^.*x\(\)\1' file.
When
grep
is
executed, it computes the length of a line in the file using the
following code int len =
strlen(“...the long line...x”);.
Suppose
that in file,
there is a line that contains 2^32+2 bytes, then the integer len
in
the code above will have a negative value due to signed integer
overflow.
A malicious user can take advantage of this bug to
access the memory that normally shouldn't be accessible and perform
an attack on the host computer.
One way to fix this bug is to
change the type of len
from
int to
size_t,
this will solve the signed integer overflow problem, but a malicious
user can still perform a Denial of Service (DOS) attack.
Another Form of
Attack – Denial of Service of Attack by Holey File
A
DOS attack is an attack that aims at using up the computing resource
of the host computer such that the host computer cannot perform
normal tasks.
Even
though now we no longer have the signed integer overflow problem, a
malicious user can still perform a DOS attack by passing a huge file
to the grep
program.
A
malicious user can create a huge file without using a lot of disk
space by creating holey files, which have valid inodes, but most of
their actual data block pointers are empty.
The following figure
illustrates how a holey file is stored on the disk.
The
following figure illustrates how holey files look like when viewed
as a sequence of data.
The
following command generates a holey file that's 2147483650 bytes in
size: echo x|dd > holey
bs=1MiB seek=2048.
When
this command is executed, 2048 blocks, each with size 1MiB, are
skipped, and then the actual data is written on the disk, resulting
in a file with 2147483650 bytes.
Ways to Get Rid
of the Denial of Service Attack
i. grep
Is for Text File Only
Since
a holey file contains a lot of zero at the beginning of the file,
it's not a text file.
The grep
program can do
whatever it wants with the non-text file.
For instance, the grep
program can output an
error if the input file is not a text file.
ii.
Skip 0's – Treat Them as New Line Characters
Treating
0's as new line characters allows the host computer to use much less
memory as it does not need to remember them.
However, this
approach still consumes a lot of computing resource, because the
grep
program has to chew
up a lot of these characters.
iii.
Check If A File Is A Holey File First
There
is no system call that check if a file is a holey file.
But we
can use stat(“file”, &st);
to
get the nominal size and physical size of the file, which are stored
in the struct st.
If
the nominal size and the physical size of a file don't agree, then
the file is a holey file.
And if a holey file is passed to the
grep
program,
the grep
program
gives up.
Why We Need
Holey File
The
holey file can be used to create hash tables as we can perform
insertion without allocating a lot of disk space.
Levels of A Unix-Like File System
Symbolic Link |
Files that contain file names |
File Names |
e.g. “a/b”, “foo.c”, “/a///c/d” |
File Name Components |
Names of directory entries e.g. “foo.c”, “usr” |
inodes |
ino_t |
File System |
(See figure below) |
Partitions |
(See figure below) |
Blocks (8192 Bytes) on a disk |
Linear array (in your head) |
Sectors (512 Bytes) on a disk |
Linear array (in your head) |
The
following figure illustrates the relationship between file system
and partitions.
Symlink
File Name Length Limit
Questions:
Can a symlink has a file name that is a TB long?
Answer: No.
There is a limit on the file name length of a symlink.
The
following discuss several ways to store a file name in symlink.
i.
Use A Whole Block to Store the File Name
The
following figure shows how we use a whole block (8192 Bytes) to
store the file name of a symlink.
Inside the inode for a symlink,
there is a pointer to a data block in which we store the file
name.
Since each block has 8192 bytes, the limit on the file name
of the symlink is 8192 bytes long.
So if the symlink content is
small, a lot of storage will be wasted.
ii.
Store the File Name inside the Inode
The
following figure shows how we store the content of the symlink
inside the inode.
Since the inode size is fixed, the file name of
a symlink must be smaller than the size of an inode.
iii.
Symlinks Don't Get Inodes; Symlinks Are Merely Directory
Entries
Instead
of storing the content of a symlink in the inode, we store the
content as directory entry as shown in the table below.
Name |
Inode # |
Flag |
mysl |
/usr/bin/ls (symlink content) |
Symlink |
abc |
27 |
|
This
implementation makes the contents look up faster as we don't need to
find the inode.
The following figure illustrates the Linux –
ext3 v2 directory entry that uses idea above.
There
are two reasons for using both the 16-bit directory entry length and
the 8-bit name length.
The first reason is that by using both of
these two fields, we get a better alignment.
The second reason is
that keeping two fields makes unlink much faster.
The following
figures explains why it's faster.
Before:
After:
Assume
we have two directory entries, directory entry 1 with a entry size
of 104 and directory entry 2 with a entry size of 48, and we want to
unlink directory entry 2.
The simplest way to unlink directory
entry 2 is to increase the directory entry length of directory entry
1 by the directory entry length of directory length 2.
Remove
Data from the Disk
i. Time Complexity Analysis
Using
the implementation discussed above, the time complexity to execute
the command rm -rf directory,
is O(N^2), where N is the number of entries, because the entries are
represented linearly.
To speed up this command, we can use a hash
table to store the directory entries' offset in the directory's
inode, which will improve the performance to O(N).
ii.
Safety Analysis
Question:
After we execute the command rm -rf
*, are
data of files still accessible?
Answer: Yes, the data blocks,
inodes, and meta data are still on the disk.
To wipe the meta
data out, we can use the command >
filename
, but the actual data may still be on the disk.
To wipe the
actual data out, we first output something random into the file, and
then remove that file.
The sequence of commands dd
if=/dev/random of=x
and
rm x
will wipe out the actual data on the disk.
(Note: A faster
version of this command is to use /dev/urandom, although this is not
as randomized as /dev/random, it's enough for wiping out the data
from the disk.)
Question: Why prefer writing random data instead
of writing zeros?
Answer: There are devices that can read
previous traces of data. If we use zero, it's easier read the
previous traces of data. But if we use random value, it will be
harder.
iii. Shred A File
The
program shred
is roughly the same as using urandom, but it uses its own random
data generator, and it does this three times.
The shred
program works on magnetic disks, but doesn't work on solid-state
drives (SSD) because of the wear-level algorithm.
The only way to
wipe out the data on an SSD is to throw it in the fire.
iv.
shred Program
Issues
A)
Bad blocks
A disk has a bad block table that keeps where the bad
blocks get mapped to.
A
malicious user can still get the content of the remapped blocks.
B)
shred
is too slow
Time to shred a 1TB disk takes a long time.
The
solution to this problem is to encrypt the disk.
We
keep a key that specifies how to decrypt the data on the disk, then
to shred the disk is the same as throwing the key away.
Symlink
vs. Hard Link
i. Overview
Consider
the command ln -s d/a e/b,
which creates a symbolic link e/b that contains the file name
“d/a”.
And consider the command ln
d/a e/b,
which create a hard link e/b that points to the file d/a.
The
following figure illustrates the difference of creating a symbolic
link and a hard link.
As
illustrated, when we create a symbolic link, we create an inode
that contains the file name of the file that we want to link
to.
When we create a hard link, we set the inode number to be the
inode of the file that we want to link to.
This will increase the
link count of that file by one.
ii.
Difference #1: Symlinks Can Dangle, Hard Links Can't
If
we remove the file “d/a” (assume that the link count has dropped
to zero), then we arrive at a dangling symlink.
And if we execute
the command cat e/b,
we will get a e/b not found error.
Hard links can't dangle
because before all the hard links are removed, the link count is
always greater than 0, and thus the file can not be freed.
iii.
Difference #2: Symlink Is Relative to the Containing Directory
The
following commands illustrate this point.
ln
-s foo d/x
ln d/x e/y
cat d/x
cat e/y
On
the first line, we create a symbolic link d/x that links to a file
named “foo”.
On the second line, we create a hard link e/y
that links to the symbolic link d/x.
On the third line, we try to
print out the content of the file the symbolic d/x links to. This is
command is equivalent to cat
d/foo.
On
the fourth line, we try to print out the content that e/y links to,
which is the symbolic link d/x.
Since now the containing
directory of the file is e, the command cat
e/y is
equivalent to the command cat
e/foo.
iv.
Difference #3: Symlinks Can Loop
If
we execute the command ln
-s / /usr
we will create a loop.
The following figure illustrates this
loop.
Then
the commands ls
-l /usr/bin
and ls
-l /usr/usr/usr/bin
etc. will be equivalent.
The following commands also create
loops.
ln
-s a b; ln -s b a
ln -s x x
When
we create the symbolic link, the program cannot detect the loop
because it's slow with a time complexity of O(N).
The program can
detect a loop when resolving a symbolic link.
If the program
follows 16 symbolic links and still cannot resolve the symbolic
link, it will give up.
And the function open(“symlink_file”,
…...);
will fail, and errno will be set to ELOOP.
Question: Why can't
hard links loop?
Answer: Since resolving hard links can be slow,
we don't allow hard link loops to be created.
Name
Lookup
For a
file “a/b/c”, we map the name components “a”, “b”, and
“c” to inodes.
If the file name “a/b/c” is given, the
system will assume that the directory a is in the current working
directory, which is stored in the process descriptor.
If the file
name “/a/b/c” is given, the file name knows that the directory a
is in the root directory.
The function chdir(“a/b”);
changes the process's working directory to a/b.
The function
chroot(“a/b”);
changes the process's root directory to a/b, and will result in
“chroot jail”.