CS 111 Scribe Notes

Lecture 12: File System Implementation - Feb 27 2012

By Zhiyang Wang

A Bug of Grep

Problem:
Consider a bug of grep that might cause problem:
grep ".**\(\)\1" file
Using this command, if the file contains a very long line, to say, more than 2^31+2 bytes, then in the code of grep, when executing to the code:
int len = strlen(getline(file));
the integer will be overflowed. Using this overflow bug, hacher could gain the root permission by letting the user in root permission use grep for a file containing very long line.

Solution:
Assume we use size_t and do careful check on the line length. But we still want to attack! So, let's lauch the DOS attack using the inode structure.

Holey files

A file that has no pointer in direct and indirect blocks but one valid block on the end of the indirect2 pointer. (Although its indeed small, but to file system is very very large) How to do that?
$ echo x | dd > holey bs = 1MB seek = 2048
Solution:
1. grep is for txt, it can do whatever it wants in non txt (scan the first several part of the file). For example, skip blank lines (don't put them in the line buffer)
2. detect whether a fiel is holey (by checking zero pointer).but thereտs no syscall for that, how to do it in user mode?
 Stat ("holey", & st)
Using this command, st get the physical size of the file

Tips:
In real life, holey file is used for hashmap.

Level of a Unix-like file system

Implementation of symlink

Option A: Put the symlink information in the data block
Then besides of the link itself, other space in the block is wasted
Option B: Put the symlink information in the direct pointer
That is, putting the contents in the inode which is used to be the data pointer for regular file. Now the link could have a space of 8*12 bytes.
Option C: Symlink doesn't have inode (merely stored in the directory entries)
Pros: could read the content of the symlink directory faster Cons: need add an additional flag That is, putting the contents in the inode which is used to be the data pointer for regular file. Now the link could have a space of 8*12 bytes.
Name Flag (I’m symlink) Contents of the symlink
Name Flag (I’m normal directory) Indoe# (8 bytes)

Dir Entry in Linux ext3 v2

inode 32 bit
directory entry length 16 bit
name length 8 bit
file type 8 bit
file name N bits (indiecated by directory entry length)

Question:
Why there are 2 length field? (directory length & name length)

Hint:
Unlink

Answer:
To unlink a directory entry, the cheapest way is to add the directory entries length to the entry in front of it. (Do this because the length is flexible, therefore hard to manage)

Problem:
  1. If the user keep unlinking, that will waste much space
  2. Removing a directory will be O(N^2)

    Solution:
    Use tree / hashtable to organize directory (rather than linear storage)
  3. Are data of files still accesible after unlink?

  4. Sure, they are all there. To truly delete a file:
    dd if = /dev/urandom of = X
    rm X
    Now the content of X is some random data.
    dd if = /dev/zero of = X
    rm X
    However, rumor that the traces of the data is still easy to read if you overwrite it with simple pattern.

Symbolic links vs Hard links

Symbolic:
ln –s d/a e/b (symlink(“a”,”b”))
a special inode for symbolic link (Why we need that?) as a real pointer

Hard Link:
ln d/a e/b link(“a”,”b”)
Just change the directory entry that point to the same inode (and change the link count)

Difference

  1. Symlinks can dangle. Hard links can’t
  2. Symlink is relative to containing dir
  3. ln –s foo d/x
    ln d/x e/y
    cat d/x = cat d/foo
    cat e/y = cat e/foo
  4. Symlink could loop

  5. Solution:
    open(“a”,); if fails (errno == ELOOP)

Walking through the directory

a/b/c
  1. Assuming we have the inode of “a” (working directory recorded in the process) (chdir(“a/b”) a system call that set the working directory)
  2. Continuing mapping directory name into inode
  3. Issues
    1. a/b/c/ will check whether c is a directorys
    2. a//b/c just ignore the extra slash
    3. /a/b some os set the root as the first inode (by default)