CS 111, W13
Prof. Eggert
Lecture 12
Week 8, Monday
Angela Navarro - 203817385
Bryan Yasukawa - 403775761
Daynan Lai - 003766382
Kevin Fongson - 603785957
Problem with Symbolic Links
Example:
- In emacs, let's say you're editing /etc/passwd
 - While the emacs buffer contents != file contents, emacs creates an extra file, the symbolic link /etc/.#passwd, that points to eggert@penguin.941 [user@host.pid]
 - This is a locking mechanism: the symlink's contents (which needn't name a real file) record which user, host, and emacs process hold the lock
 
What can go wrong?
- Non-emacs editors don't honor the lock
 - emacs runs stat to check whether the file's timestamp has changed before rewriting it
- Emacs crashes and exits before the lock can be deleted, preventing a new emacs from editing the file
 - Workaround: issue the kill() system call with the lock holder's process ID, kill(941, 0); signal 0 sends nothing, but the call fails with ESRCH if no such process exists
 - This workaround results in another issue: PIDs are reused, so by the time you call kill(), that PID could belong to another process
- Emacs loops forever while holding the lock
 - Workaround: another emacs can steal the lock. It blows away the symbolic link and creates its own
- Suppose /etc/.#passwd already exists for some other reason
 - Workaround: if a regular file exists by this name, skip all locking and charge ahead. We hope that this is a rare occurrence
- Another app removes the lock file (or changes what it points to)
 - This messes up emacs!
- You haven't changed the buffer yet and someone else locks the file
 - When you start editing, emacs stats the file again and checks the timestamp (the same solution as for non-emacs editors)
- The file name's last component (after the last slash) is >= 254 bytes, so adding the ".#" prefix exceeds the 255-byte name limit
 - Solution: as with a pre-existing /etc/.#passwd, just skip locking, since the lock file can't be created
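- Different emacs instances on different hosts must interoperate
 - e.g., lnxsrv01.seas.ucla.edu vs. lnxsrv03.seas.ucla.edu sharing a file system: a lock made on lnxsrv01 names a PID that kill() on lnxsrv03 can't check
- Windows doesn't like symbolic links
 - You can't create symbolic links unless you have a special "Create Symbolic Links" privilege
 - Workaround: use a regular file instead, i.e.

cat /etc/.#passwd
eggert@penguin.941

This workaround breaks the solution for a pre-existing /etc/.#passwd: a real emacs lock is now also a regular file, so another emacs would mistake it for an unrelated file and skip locking.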
Alternatives to using symbolic links for lock files
- system call: fcntl(fd, F_SETLK, ...)
 - POSIX only, won't work on Windows
 - postdates emacs
 - doesn't work with network file systems until NFSv4
 - machine boundaries: a process on one machine can't kill (or check on) a process running on another machine
 
- just using regular files instead of symlinks
 
- performance:
 - with symlinks, instead of keeping the data in a separate block, you can put it directly into the inode
 - this optimization is possible if the symlink's contents are < 48 bytes, and it avoids an extra seek
 
- getting contents:
 - regular file: fd = open(...) + read(...) + close(...)
 - symbolic link: readlink(".#file", buf, size)
 - symbolic links use 2 fewer syscalls and are atomic: a reader always gets the whole contents of one version of the lock, never a partially written one
 
Example: Here symlinks are treated as regular files


$ ln -s 'eggert@27' foo
$ ln foo bar
- This gives two hard links, foo and bar, to the same symlink
 - Two different names for the same file
 - Symlinks are read-only: you cannot change the contents of a symlink
 - So you can't change foo to change bar; replacing foo creates a new symlink, and bar still names the old one
 - Suppose someone wants to open("/foo/bar")
 - this has to look at 4 disk locations
 - which slows down file name resolution
 
Example: Here symlinks are treated as directory entries

Symbolic links are a different type of directory entry:
- directory entries take a varying amount of space
+ fewer disk accesses
- no hard links to symlinks
Consider the following exploit:
Attacker (eggert): 
I know someone will put data into /tmp/foo so...
$ ln -s ~/eggert/data /tmp/foo
Victim:
umask 077
sort -o /tmp/foo
uniq /tmp/foo
rm /tmp/foo
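The victim's sort follows the symlink and creates ~/eggert/data owned by the victim with mode 600 (from umask 077); rm /tmp/foo removes only the symlink. So now the attacker has a copy of the file in his own directory but can't read it. So the attacker creates the file first: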
Attacker (eggert): 
I know someone will put data into /tmp/foo so...
$ ln -s ~/eggert/data /tmp/foo
$ touch ~/eggert/data
$ chmod 777 ~/eggert/data
Victim:
umask 077
sort -o /tmp/foo
uniq /tmp/foo
rm /tmp/foo
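Now the attacker can look at the file! Since ~/eggert/data already exists, sort just truncates and rewrites it: the file keeps its owner (eggert) and its mode 777, which lets the victim write it and everyone read it. The victim's umask applies only when a file is created.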
File Name Resolution:
open("a/b/c/foo", O_RDONLY)
Steps:
- Get the process's working directory D from its entry in the process table (3961 here):

[ ~~~~~~~~ | 3961 (working dir) | ~~~~~~~~~~~~~~~]
- Get the 1st file name component C
 - Look up C in D's data
 - if none, fail with errno = ENOENT
 - if it's a symbolic link, splice the symlink's contents into the path in place of C and keep resolving
- Now we have inode I
 - set D = I, take the next component, and repeat the lookup until no components remain

The system call chdir resolves its argument with the algorithm above, then records the resulting directory in the process table as the process's new working directory.
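Problems:
- Suppose there exists a symlink a/b -> x/y
 - then a/b/c/foo resolves as a/x/y/c/foo: relative symlink contents are interpreted relative to the directory containing the link
- If the symlink contents start with a slash, discard the path resolved so far and restart from the root directory
- What if it's a symbolic link loop?
 - keep a counter of the number of symlinks traversed; if it exceeds the limit (20), fail with errno = ELOOP
 - this solution is a heuristic: it can reject a long chain that isn't actually a loop, but it's much cheaper than detecting loops exactly
- If the path starts with a slash, resolution starts at the process's root directory instead of its working directory; the working directory itself is changed by a system call such as chdir("foo")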
 
Sidenote:
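#include <stdio.h>
#include <unistd.h>
int main(int argc, char **argv) {
  if (argc != 2) { fprintf(stderr, "usage: mycd DIR\n"); return 2; }
  if (chdir(argv[1]) != 0) { perror(argv[1]); return 1; }
  return 0;  /* only this process's working directory changed */
}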
--
$ gcc main.c -o mycd
$ ./mycd /tmp
$ cat foo
This won't work: the shell runs mycd as a child process, and chdir (like chroot) changes only the calling process's working directory (or root), not the parent shell's, so cat foo still looks in the shell's original directory.
Link counts and hard links:

Problems:
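- removed a link but forgot to decrement the link count (the file is never freed: a storage leak)
 - didn't remove a link but decremented the link count (the file can be freed while a directory entry still refers to it)
 - link count hits its max and overflows, so link() must refuse (errno = EMLINK) instead
 - loops of hard links: not allowed, so no hard links to directories! See diagram below.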
 

Brief look at other problems in FS:
Example: GPFS (a file system for big machines)
120 PB on 200,000 hard drives, ~600 GB each
Some features:
- Striping: a file's blocks are spread over multiple disks

- Parallel I/O
 - Distributed metadata: directories live in the file system itself
 - e.g., /usr/bin: a big, widely used directory
 - Several copies of common directories
- Efficient directory indexing: lookups faster than O(N) (say, a B-tree structure)
 - Distributed locking
 - File system stays live during maintenance
 
e.g., back up a clone of the file system while /gpfs stays live:
magic-gpfs-clone /gpfs /gpfs-feb-25
cd /gpfs-feb-25
tar -cf /dev/tape .