Hongyi Wang (404191739), Yuanjie Li (804126110)

Directory & Inode

Implementing FS with relational DB

Oracle offers a file system built atop Oracle DB, which implements every file operation on top of the relational DB. But a relational DB is too slow compared with talking to the disk directly.
Result: NoSQL.
A file system is an important special case of NoSQL.

Go back to inodes

External Representation (to CPU): the data structure that represents a file living on disk.
Internal Representation: the in-memory data structure that lives in RAM.
Simplest form: a cache. We'll dally before writing.
More interesting questions: Who should handle endianness issues (and who knows the endianness)? How do we work with 32-bit vs. 64-bit kernels and apps?

Go back to directories

[Figure: dir_entry] Linux, ext3v2. The inode number is 32 bits, so each directory entry starts with the inode# (4 bytes).
The directory entry length includes the junk length, but the name length doesn't include the junk length.
Why do we have junk? Suppose another entry follows this entry, and now we want to unlink that entry. All we need to do is increase this entry's directory entry length so that it covers the unlinked entry, whose bytes become junk. It is really cheap!
The directory entry length keeps track of our free space, so junk can be reused by later entries, which limits internal fragmentation.
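
The cheap unlink described above can be sketched as follows. This is a minimal illustration, not the real on-disk format: the byte offsets, the 2-byte entry length, the 1-byte name length, and the helper names (put_entry, unlink_next, lookup) are all assumptions made for the example, and fields are stored in host byte order.

```c
#include <stdint.h>
#include <string.h>

/* Assumed entry layout (illustrative only): bytes 0-3 inode#,
   bytes 4-5 entry length, byte 6 name length, bytes 7.. name then junk. */
enum { OFF_INODE = 0, OFF_ENTRY_LEN = 4, OFF_NAME_LEN = 6, OFF_NAME = 7 };

static uint16_t entry_len(const uint8_t *e) {
    uint16_t v;
    memcpy(&v, e + OFF_ENTRY_LEN, sizeof v);
    return v;
}

static void set_entry_len(uint8_t *e, uint16_t v) {
    memcpy(e + OFF_ENTRY_LEN, &v, sizeof v);
}

/* Write one entry at dir + off; total_len includes any trailing junk.
   Returns the offset of the next entry. */
static size_t put_entry(uint8_t *dir, size_t off, uint32_t inode,
                        const char *name, uint16_t total_len) {
    uint8_t *e = dir + off;
    uint8_t n = (uint8_t)strlen(name);
    memcpy(e + OFF_INODE, &inode, sizeof inode);
    set_entry_len(e, total_len);
    e[OFF_NAME_LEN] = n;
    memcpy(e + OFF_NAME, name, n);
    return off + total_len;
}

/* Unlink the entry that follows the one at prev_off: grow the previous
   entry's length so a scan skips the removed bytes, which become junk. */
static void unlink_next(uint8_t *dir, size_t prev_off) {
    uint8_t *prev = dir + prev_off;
    uint8_t *next = prev + entry_len(prev);
    set_entry_len(prev, (uint16_t)(entry_len(prev) + entry_len(next)));
}

/* Linear scan: return the inode# stored for name, or 0 if it is absent. */
static uint32_t lookup(const uint8_t *dir, size_t dir_size, const char *name) {
    size_t n = strlen(name);
    for (size_t off = 0; off < dir_size; off += entry_len(dir + off)) {
        const uint8_t *e = dir + off;
        if (e[OFF_NAME_LEN] == n && memcmp(e + OFF_NAME, name, n) == 0) {
            uint32_t inode;
            memcpy(&inode, e + OFF_INODE, sizeof inode);
            return inode;
        }
    }
    return 0;
}
```

Note that unlink_next touches only one length field; the removed entry's bytes are left in place. The sketch does not handle unlinking the first entry of a directory, which has no predecessor to absorb it.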

Syscalls for directories

fd = open(".", O_RDONLY); // writing to a directory is NOT ALLOWED
read(); // would return -1 for directories
readdir();
unlink();
link();
rename();
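
As a rough illustration of the calls listed above, the sketch below reads a directory through the opendir()/readdir() library interface and checks what a plain read() does on a directory descriptor. The helper names count_entries and read_on_directory_fails are made up for this example, and the read() behavior shown is what Linux does; other systems may differ.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Count the entries in a directory other than "." and "..".
   Returns -1 if the directory cannot be opened. */
static int count_entries(const char *path) {
    DIR *d = opendir(path);
    if (d == NULL)
        return -1;
    int n = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") != 0 && strcmp(e->d_name, "..") != 0)
            n++;
    }
    closedir(d);
    return n;
}

/* Return 1 if read() on a directory descriptor fails, 0 if it succeeds,
   and -1 if the directory cannot be opened at all. */
static int read_on_directory_fails(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    char buf[64];
    ssize_t r = read(fd, buf, sizeof buf);
    close(fd);
    return r == -1;
}
```

Opening the directory read-only succeeds, but its contents are reachable only through readdir(), so applications never see or modify the raw entry bytes.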

Don't create a large directory with a huge number of entries: rename and unlink then need an O(N) scan of the directory.

Levels of Traditional File System

[Figure: fs_level]

Another way to look at the file system: the levels of a traditional file system
sector -> block -> partition -> inode -> directory (a special file that cannot be written arbitrarily) -> symbolic link

Mount Table

How should a user program deal with different file systems?

Option 1: make FS visible
E.g., DOS C:\A\B

Option 2 (simpler): FS is normally invisible to applications
How to implement it? With a mount table that maps an inode number to a file system

Each directory has an inode
To mount a file system on a directory, map that directory's inode# to the file system
This approach works recursively: a mounted file system can contain further mount points

What if inode# in one file system collides with another file system’s inode#?
Solution 1: Always guarantee globally unique inode#
Solution 2: The mount table includes device#, inode# and filesystem. This way, inode# becomes device dependent

device #    inode #    file system
Driver 1    inode 1    Ext3
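
A mount table keyed by the <device#, inode#> pair (Solution 2) might look like the sketch below. The struct fields, the mounted_dev column, and the function name mount_lookup are assumptions for illustration, not the kernel's actual data structure.

```c
#include <stddef.h>
#include <stdint.h>

/* One row of a hypothetical mount table, keyed by <device#, inode#>. */
struct mount_entry {
    uint32_t dev;         /* device# holding the mount-point directory */
    uint32_t inode;       /* inode# of the mount-point directory on that device */
    const char *fs_type;  /* file system mounted there, e.g. "ext3" */
    uint32_t mounted_dev; /* device# of the mounted file system */
};

/* Return the entry mounted on <dev, inode>, or NULL if nothing is mounted there. */
static const struct mount_entry *
mount_lookup(const struct mount_entry *table, size_t n,
             uint32_t dev, uint32_t inode) {
    for (size_t i = 0; i < n; i++) {
        if (table[i].dev == dev && table[i].inode == inode)
            return &table[i];
    }
    return NULL;
}
```

Because the key includes the device#, the same inode# on two different devices names two different mount points, so inode numbers need to be unique only within one file system.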

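Hard links work only within a single file system: a directory entry stores an inode#, which is meaningful only on its own device
Mounting could create a loop, e.g., by mounting a file system on a directory that lies inside a file system already mounted beneath it. To avoid this, we should trace back to the root before mounting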

What can go wrong?
mount /dev/hdc /home/eggert
cd /home/eggert/bin
umount /home/eggert # what would happen? Not allowed
Unmounting a busy file system is not allowed! (Here the shell's working directory is still inside it.)

Suppose we execute
open("a/b/c", O_RDONLY)

File name resolution: the kernel starts from the <device, inode> pair of the working directory, stored in the current process's entry, and looks up each path component in turn.
At each step we also need to consult the mount table.
With open("/a/b/c", O_RDONLY), we start from the root directory rather than the current working directory.
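
The resolution steps above can be sketched with toy in-memory tables. Everything here is hypothetical: the struct names, the device and inode numbers, and the assumption that device 2 is mounted on /a/b while the working directory is /home. The sketch ignores ".", "..", symbolic links, and permissions.

```c
#include <stddef.h>
#include <string.h>

struct loc { unsigned dev, ino; };  /* a <device, inode> pair */

struct dir_row   { struct loc dir; const char *name; struct loc child; };
struct mount_row { struct loc point; struct loc root; };

/* Toy contents: device 2 is mounted on /a/b; /home is the working directory. */
static const struct dir_row dirs[] = {
    {{1, 2},  "a",    {1, 10}},
    {{1, 10}, "b",    {1, 11}},
    {{2, 2},  "c",    {2, 20}},
    {{1, 2},  "home", {1, 30}},
    {{1, 30}, "a",    {1, 31}},
    {{1, 31}, "b",    {1, 32}},
    {{1, 32}, "c",    {1, 33}},
};
static const struct mount_row mounts[] = { {{1, 11}, {2, 2}} };

static int same(struct loc x, struct loc y) {
    return x.dev == y.dev && x.ino == y.ino;
}

/* Look up one name in one directory; returns 1 and fills *out on success. */
static int step(struct loc dir, const char *name, struct loc *out) {
    for (size_t i = 0; i < sizeof dirs / sizeof dirs[0]; i++) {
        if (same(dirs[i].dir, dir) && strcmp(dirs[i].name, name) == 0) {
            *out = dirs[i].child;
            return 1;
        }
    }
    return 0;
}

/* If a file system is mounted on *cur, continue from that file system's root. */
static void cross_mount(struct loc *cur) {
    for (size_t i = 0; i < sizeof mounts / sizeof mounts[0]; i++) {
        if (same(mounts[i].point, *cur)) {
            *cur = mounts[i].root;
            return;
        }
    }
}

/* Resolve path component by component; returns 0 on success, -1 on failure. */
static int resolve(const char *path, struct loc root, struct loc cwd,
                   struct loc *out) {
    struct loc cur = (path[0] == '/') ? root : cwd;  /* absolute vs. relative */
    char buf[256];
    if (strlen(path) >= sizeof buf)
        return -1;
    strcpy(buf, path);
    for (char *name = strtok(buf, "/"); name != NULL; name = strtok(NULL, "/")) {
        if (!step(cur, name, &cur))
            return -1;
        cross_mount(&cur);  /* consult the mount table after every step */
    }
    *out = cur;
    return 0;
}
```

With these tables, "/a/b/c" crosses the mount point and ends on device 2, while "a/b/c" resolved from /home stays on device 1; the only difference between the two calls is the starting <device, inode> pair.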

Chroot Jail

The chroot syscall changes the root directory.
chroot("/home/eggert/junk/"); // change the root directory to /home/eggert/junk/

Chroot has a security issue. On login, two files are checked to verify the user's password: /etc/passwd and /etc/shadow. Suppose an attacker creates two spoofed files, /home/eggert/junk/etc/passwd and /home/eggert/junk/etc/shadow, and then calls chroot("/home/eggert/junk/") to change the root directory.
Then the attacker can leverage the spoofed files to execute commands as root through sudo (sudo looks in the spoofed files instead of the real /etc/passwd).

Solution: chroot is a privileged system call, so only root may call it!

Symbolic Link

A symbolic link is a special file that cannot be written arbitrarily; it maps to a name, NOT to an inode#
During name resolution, the name stored in the link is substituted for the link's own name.

Syscall and command to create a symbolic link:
symlink("/etc/passwd", "foo");
ln -s /etc/passwd foo

Issues:

Dangling symbolic link: one that maps to a nonexistent file

Circular symbolic link
ln -s a b
ln -s b a # we have trouble!
Solution: if name resolution follows more than 20 symbolic links, the system call fails
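
The circular case can be reproduced with a short sketch. It assumes a writable /tmp, and the function name open_circular_symlink is made up for this example. The exact limit on followed links varies by system, but exceeding it is reported as ELOOP.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Build a -> b and b -> a in a fresh temporary directory, try to open "a",
   and return the errno from open(). Returns 0 if open() unexpectedly
   succeeds and -1 if the setup itself fails. */
static int open_circular_symlink(void) {
    char dir[] = "/tmp/symloopXXXXXX";
    char a[64], b[64];
    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(a, sizeof a, "%s/a", dir);
    snprintf(b, sizeof b, "%s/b", dir);

    int result = -1;
    /* Same effect as: ln -s a b; ln -s b a */
    if (symlink("a", b) == 0 && symlink("b", a) == 0) {
        int fd = open(a, O_RDONLY);
        if (fd >= 0) {
            close(fd);
            result = 0;
        } else {
            result = errno;
        }
    }
    /* Clean up; unlink() removes the links themselves, not their targets. */
    unlink(a);
    unlink(b);
    rmdir(dir);
    return result;
}
```

The kernel does not detect the cycle as such; it simply gives up after following too many links, which also bounds the cost of resolving long but legitimate chains.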

Why not just use a hard link (which maps a name to an inode#)?
With a symbolic link, if we remove the target file and recreate it, the symbolic link still works.
A symbolic link can also be an alias for a directory.

Symlink-aware/unaware syscalls:
stat(); // follows symbolic links
lstat(); // does not follow symbolic links
unlink()/rename()/link(); // symlink-aware: they act on the symbolic link itself
So we can create a hard link to a symbolic link
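
The difference between stat() and lstat(), together with the dangling and recreate cases from the issues above, can be checked with a sketch like this one. It assumes a writable /tmp; struct probe and probe_symlink are names invented for the example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

struct probe {
    int lstat_sees_link;  /* lstat() reports the symbolic link itself */
    int stat_sees_file;   /* stat() follows the link to the regular file */
    int dangling_errno;   /* errno from stat() once the target is removed */
    int works_again;      /* stat() succeeds after the target is recreated */
};

/* Create target and link -> target in a temporary directory and record
   what stat() and lstat() report. Returns 0 on success, -1 on setup failure. */
static int probe_symlink(struct probe *p) {
    char dir[] = "/tmp/symprobeXXXXXX";
    char target[64], linkp[64];
    struct stat st;
    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(target, sizeof target, "%s/target", dir);
    snprintf(linkp, sizeof linkp, "%s/link", dir);

    int fd = open(target, O_CREAT | O_WRONLY, 0600);
    if (fd >= 0)
        close(fd);
    int ok = fd >= 0 && symlink("target", linkp) == 0;
    if (ok) {
        p->lstat_sees_link = lstat(linkp, &st) == 0 && S_ISLNK(st.st_mode);
        p->stat_sees_file = stat(linkp, &st) == 0 && S_ISREG(st.st_mode);

        unlink(target);  /* the link now dangles */
        p->dangling_errno = (stat(linkp, &st) == -1) ? errno : 0;

        fd = open(target, O_CREAT | O_WRONLY, 0600);  /* recreate the target */
        if (fd >= 0)
            close(fd);
        p->works_again = stat(linkp, &st) == 0;
    }
    unlink(linkp);
    unlink(target);
    rmdir(dir);
    return ok ? 0 : -1;
}
```

Because the link stores a name rather than an inode#, it keeps working once a new file appears under that name, even though the new file may have a different inode#.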
