CS 111 Scribe Notes
Lecture 12: File System Implementation - Feb 27 2012
By Zhiyang WangA Bug of Grep
Problem:Consider a bug of grep that might cause problem:
grep ".**\(\)\1" fileUsing this command, if the file contains a very long line, to say, more than 2^31+2 bytes, then in the code of grep, when executing to the code:
int len = strlen(getline(file));the integer will be overflowed. Using this overflow bug, hacher could gain the root permission by letting the user in root permission use grep for a file containing very long line.
Solution:
Assume we use size_t and do careful check on the line length. But we still want to attack! So, let's lauch the DOS attack using the inode structure.
Holey files
A file that has no pointer in direct and indirect blocks but one valid block on the end of the indirect2 pointer. (Although its indeed small, but to file system is very very large) How to do that?$ echo x | dd > holey bs = 1MB seek = 2048Solution:
1. grep is for txt, it can do whatever it wants in non txt (scan the first several part of the file). For example, skip blank lines (don't put them in the line buffer)
2. detect whether a fiel is holey (by checking zero pointer).but thereտs no syscall for that, how to do it in user mode?
Stat ("holey", & st)Using this command, st get the physical size of the file
Tips:
In real life, holey file is used for hashmap.
Level of a Unix-like file system
Implementation of symlink
Option A: Put the symlink information in the data blockThen besides of the link itself, other space in the block is wasted
Option B: Put the symlink information in the direct pointer
That is, putting the contents in the inode which is used to be the data pointer for regular file. Now the link could have a space of 8*12 bytes.
Option C: Symlink doesn't have inode (merely stored in the directory entries)
Pros: could read the content of the symlink directory faster Cons: need add an additional flag That is, putting the contents in the inode which is used to be the data pointer for regular file. Now the link could have a space of 8*12 bytes.
Name | Flag (I’m symlink) | Contents of the symlink |
Name | Flag (I’m normal directory) | Indoe# (8 bytes) |
Dir Entry in Linux ext3 v2
inode | 32 bit |
directory entry length | 16 bit |
name length | 8 bit |
file type | 8 bit |
file name | N bits (indiecated by directory entry length) |
Question:
Why there are 2 length field? (directory length & name length)
Hint:
Unlink
Answer:
To unlink a directory entry, the cheapest way is to add the directory entries length to the entry in front of it. (Do this because the length is flexible, therefore hard to manage)
Problem:
- If the user keep unlinking, that will waste much space
- Removing a directory will be O(N^2)
Solution:
Use tree / hashtable to organize directory (rather than linear storage) - Are data of files still accesible after unlink?
Sure, they are all there. To truly delete a file:
dd if = /dev/urandom of = XNow the content of X is some random data.
rm X
dd if = /dev/zero of = XHowever, rumor that the traces of the data is still easy to read if you overwrite it with simple pattern.
rm X
Symbolic links vs Hard links
Symbolic:ln –s d/a e/b (symlink(“a”,”b”))a special inode for symbolic link (Why we need that?) as a real pointer
Hard Link:
ln d/a e/b link(“a”,”b”)Just change the directory entry that point to the same inode (and change the link count)
Difference
- Symlinks can dangle. Hard links can’t
- Symlink is relative to containing dir
- Symlink could loop
ln –s foo d/xcat e/y = cat e/foo
ln d/x e/y
cat d/x = cat d/foo
Solution:
open(“a”,); if fails (errno == ELOOP)
Walking through the directory
a/b/c
- Assuming we have the inode of “a” (working directory recorded in the process) (chdir(“a/b”) a system call that set the working directory)
- Continuing mapping directory name into inode
- Issues
- a/b/c/ will check whether c is a directorys
- a//b/c just ignore the extra slash
- /a/b some os set the root as the first inode (by default)