CS 111, W13
Prof. Eggert
Lecture 12
Week 8, Monday
Angela Navarro - 203817385
Bryan Yasukawa - 403775761
Daynan Lai - 003766382
Kevin Fongson - 603785957
Problem with Symbolic Links
Example:
- In emacs, let's say you're editing /etc/passwd
- While the emacs buffer contents != file contents, emacs creates an extra file (a symbolic link) /etc/.#passwd that points to eggert@penguin.941 [user@host.pid]
- This is a locking mechanism, and the symbolic link contains info about emacs
What can go wrong?
1. Non-emacs editors ignore the lock
- Workaround: before rewriting the file, emacs runs stat to check whether the file's timestamp has changed
2. Emacs crashes and exits before the lock can be deleted, preventing a new emacs from editing the file
- Workaround: issue the kill() system call with the PID from the lock and signal 0, kill(941, 0), which only checks whether that process still exists
- This workaround creates another issue: PIDs are reused, so by the time you call kill(), that PID could belong to another process
3. Emacs loops while holding the lock
- Workaround: another emacs can steal the lock. This blows away the symbolic link and creates its own
4. Suppose /etc/.#passwd already exists for some other reason
- Workaround: if a regular file exists by this name, then we skip all locking and charge ahead. We hope that this is a rare occurrence
5. Another app removes the lock file (or changes what it points to)
- This messes up emacs!
6. You haven't changed the buffer yet and someone else locks the file
- When you start editing the file, emacs stats the file again and checks the timestamp (same solution as #1)
7. File name base after the last slash >= 254 bytes (the lock name, with its 2-byte ".#" prefix, would go over the 255-byte limit)
- Solution: same as #4, you just ignore locking since you can't create the lock file
8. Different emacs instances on different hosts have to interoperate
- lnxsrv01.seas.ucla.edu vs lnxsrv03.seas.ucla.edu
9. Windows: doesn't like symbolic links
- You can't create symbolic links unless you have a special "Create Symbolic Links" privilege
- Workaround: use a regular file instead, i.e.
$ cat /etc/.#passwd
eggert@penguin.941
- This workaround breaks the solution in #4, which assumes that a regular file by this name is not a lock
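The locking scheme above can be sketched in C. This is a minimal illustration, not emacs's actual code: the function names (take_lock, lock_owner_pid, owner_seems_alive) are hypothetical, and the only assumption about the lock format is the user@host.pid layout shown in the notes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Create the lock as a symlink whose contents are "user@host.pid".
   symlink() fails with EEXIST if the name is already taken, so taking
   the lock is a single atomic step. Returns 0 on success, -1 on failure. */
int take_lock(const char *lockname, const char *user, const char *host)
{
    char info[256];
    assert(lockname != NULL && user != NULL && host != NULL);
    snprintf(info, sizeof info, "%s@%s.%ld", user, host, (long) getpid());
    return symlink(info, lockname);
}

/* Read the lock's contents and extract the PID after the last '.'.
   Returns the PID, or -1 if the lock is missing or malformed. */
long lock_owner_pid(const char *lockname)
{
    char buf[256];
    long pid;
    ssize_t n = readlink(lockname, buf, sizeof buf - 1);
    if (n < 0)
        return -1;
    buf[n] = '\0';                 /* readlink does not NUL-terminate */
    char *dot = strrchr(buf, '.');
    if (dot == NULL || sscanf(dot + 1, "%ld", &pid) != 1)
        return -1;
    return pid;
}

/* Signal 0 delivers nothing; kill() only reports whether the PID exists.
   EPERM also means "exists, but owned by someone else".
   Caveat from the notes: PIDs are reused, so a live PID may belong
   to an unrelated process, and the check says nothing about other hosts. */
int owner_seems_alive(long pid)
{
    if (pid <= 0)
        return 0;
    return kill((pid_t) pid, 0) == 0 || errno == EPERM;
}
```

A second take_lock on the same name fails with EEXIST; the caller would then use lock_owner_pid and owner_seems_alive to decide whether the lock looks stale enough to steal.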
Alternatives to using symbolic links for lock files
- system call: fcntl(fd, F_SETLK, ...)
- POSIX only, won't work on Windows
- postdates emacs
- doesn't work with network file systems until NFSv4
- machine boundaries do not let outside machines kill a machine's own process
- just using regular files instead of symlinks
- performance:
- with symlinks, instead of having data in a separate block you can put data into the inode's contents
- this optimization is possible if the symlink length < 48 bytes, which avoids an extra seek
- getting contents:
- regular file: fd = open(...) + read(...) + close(...)
- symbolic link: readlink(".#file", buf, size)
- reading a symbolic link takes 2 fewer syscalls and is atomic: readlink always returns the whole contents of one version of the link
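For comparison, the fcntl alternative listed above might look like the following sketch. The helper names try_write_lock and release_lock are hypothetical; the struct flock fields and the F_SETLK command are the standard POSIX ones.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Try to take an exclusive advisory lock on the whole file without blocking.
   Returns 0 on success; -1 if another process holds a conflicting lock
   (errno is EACCES or EAGAIN) or on any other error. */
int try_write_lock(int fd)
{
    struct flock fl = {0};
    assert(fd >= 0);
    fl.l_type = F_WRLCK;      /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;             /* 0 means "through end of file" */
    return fcntl(fd, F_SETLK, &fl);
}

/* Release whatever lock this process holds on the file. */
int release_lock(int fd)
{
    struct flock fl = {0};
    assert(fd >= 0);
    fl.l_type = F_UNLCK;
    fl.l_whence = SEEK_SET;
    return fcntl(fd, F_SETLK, &fl);
}
```

One design difference from lock files: the kernel drops these locks when the owning process exits, so a crashed editor does not leave a stale lock behind. The lock is advisory, though, so it only helps among programs that all call fcntl.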
Example: Here symlinks are treated as regular files
$ ln -s 'eggert@27' foo
$ ln foo bar
- Above has hard links to the same symlink
- Two different names for same file
- Symlinks are read-only: you cannot change the contents of a symlink
- You can't change foo in order to change bar
- Suppose someone wants to open("/foo/bar")
- have to look at 4 disk locations
- slows down file name resolution
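The ln -s / ln example above can be reproduced from C. This is a sketch under two assumptions: the function name is hypothetical, and the file system allows hard links to symlinks (Linux file systems generally do; others may refuse). linkat with flags 0 is used because it links the symlink itself instead of following it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>      /* AT_FDCWD */
#include <sys/stat.h>
#include <unistd.h>

/* Recreate "ln -s 'eggert@27' foo; ln foo bar" for the given names.
   Returns 1 if both names refer to one symlink inode with link count 2,
   0 if they do not, and -1 on error (for example, if the file system
   refuses hard links to symlinks). */
int hard_link_to_symlink(const char *foo, const char *bar)
{
    struct stat a, b;
    assert(foo != NULL && bar != NULL);
    if (symlink("eggert@27", foo) != 0)
        return -1;
    if (linkat(AT_FDCWD, foo, AT_FDCWD, bar, 0) != 0)   /* 0: do not follow foo */
        return -1;
    if (lstat(foo, &a) != 0 || lstat(bar, &b) != 0)     /* lstat: inspect the link */
        return -1;
    return S_ISLNK(a.st_mode) && a.st_ino == b.st_ino && a.st_nlink == 2;
}
```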
Example: Here symlinks are treated as directory entries
Symbolic links are a different type of directory entry
- con: entries take a varying amount of space
- pro: fewer disk accesses
- con: no hard links to symlinks
Consider the following exploit:
Attacker (eggert):
I know someone will put data into /tmp/foo so...
$ ln -s ~/eggert/data /tmp/foo
Victim:
$ umask 077
$ sort -o /tmp/foo
$ uniq /tmp/foo
$ rm /tmp/foo
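So now the attacker has a copy of the file but can't read it: the victim's umask 077 means the file was created with mode 600 and belongs to the victim. A refined attack creates the target first, with open permissions: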
Attacker (eggert):
I know someone will put data into /tmp/foo so...
$ ln -s ~/eggert/data /tmp/foo
$ touch ~/eggert/data
$ chmod 777 ~/eggert/data
Victim:
$ umask 077
$ sort -o /tmp/foo
$ uniq /tmp/foo
$ rm /tmp/foo
Now the attacker can look at the file!
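The reason the exploit works is that an ordinary open() follows the planted symlink. The sketch below shows that, next to a more careful open. The careful variant is an assumption added for illustration and was not covered in the lecture; both function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write the way an unsuspecting program would: open() follows a symlink
   planted at `path`, so the data lands in the link's target. */
int naive_write(const char *path, const char *data)
{
    assert(path != NULL && data != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, data, strlen(data));
    close(fd);
    return n < 0 ? -1 : 0;
}

/* More careful variant (an assumption, not from the lecture): with
   O_CREAT | O_EXCL, open() fails with EEXIST if the name already exists,
   even when it is a symlink, so a planted link is never followed. */
int careful_write(const char *path, const char *data)
{
    assert(path != NULL && data != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL | O_NOFOLLOW, 0600);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, data, strlen(data));
    close(fd);
    return n < 0 ? -1 : 0;
}
```

With a symlink planted at the victim's path, naive_write succeeds and creates the attacker's target file, while careful_write fails and leaves the target untouched.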
File Name Resolution:
open("a/b/c/foo", O_RDONLY)
Steps:
1. Get the process's working directory D from its entry in the process table:
[ ~~~~~~~~ | 3961 (working dir) | ~~~~~~~~~~~~~~~]
2. Get the next file name component C (initially the first)
3. Look up C in D's data
- if there is none, fail with errno = ENOENT
- if it's a symbolic link, substitute the symlink's contents for C in the path
4. Now we have inode I
5. Set D = I and loop back to step 2 until no components remain
The system call chdir uses the algorithm above to set the process's current working directory to D.
Problems:
- Suppose there exists a symlink a/b -> x/y
- The contents are interpreted relative to the directory holding the link, so a/b/c/foo becomes a/x/y/c/foo
- If the symlink's contents start with a slash, we have to discard the part of the path resolved so far and restart from the root to get the correct path.
- What if it's a symbolic link loop?
- keep a counter of the number of symlinks traversed. The limit is 20; past that, the call fails with errno = ELOOP
- this solution is a heuristic designed to improve speed
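- If the path starts with a slash, resolution starts at the process's root directory instead of its working directory. The system call chroot("foo") changes the root directory, just as chdir("foo") changes the working directory.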
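The resolution steps and the symlink problems above can be put together in a toy sketch. Everything here is hypothetical: the in-memory struct inode, the demo_fs table, and the resolve function are stand-ins for what a kernel does on disk, and "." and ".." are not handled. Only the limit of 20 and the errno values come from the notes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define MAX_SYMLINKS 20   /* the limit quoted in the notes */

enum kind { DIR_NODE, FILE_NODE, LINK_NODE };
struct entry { const char *name; int ino; };
struct inode {
    enum kind kind;
    const char *target;        /* symlink contents (LINK_NODE only) */
    struct entry entries[4];   /* directory data (DIR_NODE only) */
    int nentries;
};

/* Toy file system: a/b is a symlink to x/y; loop1 and loop2 point at each other. */
static const struct inode demo_fs[] = {
    [0] = { .kind = DIR_NODE, .nentries = 3,                       /* root */
            .entries = { {"a", 1}, {"loop1", 6}, {"loop2", 7} } },
    [1] = { .kind = DIR_NODE, .nentries = 2,                       /* a */
            .entries = { {"b", 2}, {"x", 3} } },
    [2] = { .kind = LINK_NODE, .target = "x/y" },                  /* a/b */
    [3] = { .kind = DIR_NODE, .nentries = 1, .entries = { {"y", 4} } },
    [4] = { .kind = DIR_NODE, .nentries = 1, .entries = { {"c", 5} } },
    [5] = { .kind = DIR_NODE, .nentries = 1, .entries = { {"foo", 8} } },
    [6] = { .kind = LINK_NODE, .target = "loop2" },
    [7] = { .kind = LINK_NODE, .target = "loop1" },
    [8] = { .kind = FILE_NODE },                                   /* foo */
};

/* Look up a component of length len in directory dir; -1 if absent. */
static int lookup(const struct inode *fs, int dir, const char *name, size_t len)
{
    for (int i = 0; i < fs[dir].nentries; i++) {
        const char *n = fs[dir].entries[i].name;
        if (strlen(n) == len && memcmp(n, name, len) == 0)
            return fs[dir].entries[i].ino;
    }
    return -1;
}

/* Resolve path to an inode number, or return -1 with errno set. */
int resolve(const struct inode *fs, int root, int cwd, const char *path)
{
    char buf[256], tmp[256];
    int links = 0;
    assert(fs != NULL && path != NULL);
    if (path[0] == '\0') { errno = ENOENT; return -1; }
    if (strlen(path) >= sizeof buf) { errno = ENAMETOOLONG; return -1; }
    strcpy(buf, path);

    int d = (buf[0] == '/') ? root : cwd;     /* starting directory D */
    const char *p = buf;
    for (;;) {
        while (*p == '/') p++;                /* skip separators */
        if (*p == '\0') return d;             /* no components left */
        size_t len = strcspn(p, "/");         /* next component C */
        if (fs[d].kind != DIR_NODE) { errno = ENOTDIR; return -1; }
        int i = lookup(fs, d, p, len);        /* look up C in D's data */
        if (i < 0) { errno = ENOENT; return -1; }
        p += len;
        if (fs[i].kind == LINK_NODE) {
            if (++links > MAX_SYMLINKS) { errno = ELOOP; return -1; }
            /* substitute the symlink contents for C, keeping the rest */
            int n = snprintf(tmp, sizeof tmp, "%s%s", fs[i].target, p);
            if (n < 0 || n >= (int) sizeof tmp) { errno = ENAMETOOLONG; return -1; }
            strcpy(buf, tmp);
            p = buf;
            if (buf[0] == '/') d = root;      /* absolute contents: restart at root */
            continue;                         /* relative contents: stay in D */
        }
        d = i;                                /* D = I, then loop */
    }
}
```

With this table, resolve(demo_fs, 0, 0, "a/b/c/foo") walks into a, substitutes x/y for b, and ends at inode 8 (foo), while resolve(demo_fs, 0, 0, "loop1") gives up with ELOOP once the counter passes 20.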
Sidenote:
#include <unistd.h>
int main (int argc, char** argv) {
  if (argc < 2 || chdir(argv[1]) != 0)
    return 1;  /* no argument, or chdir failed */
  return 0;
}
--
$ gcc main.c -o mycd
$ ./mycd /tmp
$ cat foo
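This won't work: chdir (like chroot) changes the working directory (or root) of the calling process only. Here that is the child process running ./mycd, not the parent shell that started it, so cat foo still runs in the shell's old directory. This is why cd has to be a shell builtin.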
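The same effect can be checked from inside one program. In this sketch (the function name is hypothetical), a forked child plays the role of ./mycd and the parent plays the role of the shell.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that calls chdir(dir), like running ./mycd from a shell.
   Returns 1 if the child's chdir succeeded and the parent's working
   directory is still the same afterwards, 0 if it changed, -1 on error. */
int parent_cwd_survives_child_chdir(const char *dir)
{
    char before[4096], after[4096];
    int status;
    assert(dir != NULL);
    if (getcwd(before, sizeof before) == NULL)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(chdir(dir) == 0 ? 0 : 1);   /* child: the change stays in here */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;                        /* the child's chdir failed */
    if (getcwd(after, sizeof after) == NULL)
        return -1;
    return strcmp(before, after) == 0;
}
```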
Link counts and hard links:
Problems:
- removed a link but forgot to decrement the link count
- didn't remove a link but decremented the link count
- link count hit max and overflows
- loops of hard links: not allowed, no hard links to directories! See diagram below.
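When the bookkeeping is done correctly, the link count can be watched through st_nlink. A small sketch, with a hypothetical function name and caller-chosen file names:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create file a, add hard link b, then remove b, recording the link
   count after each step in counts[0..2]. Removes a before returning.
   Returns 0 on success, -1 on error. */
int link_count_demo(const char *a, const char *b, long counts[3])
{
    struct stat st;
    assert(a != NULL && b != NULL && counts != NULL);
    int fd = open(a, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    close(fd);
    if (stat(a, &st) != 0)
        return -1;
    counts[0] = (long) st.st_nlink;        /* one name */
    if (link(a, b) != 0 || stat(a, &st) != 0)
        return -1;
    counts[1] = (long) st.st_nlink;        /* two names, same inode */
    if (unlink(b) != 0 || stat(a, &st) != 0)
        return -1;
    counts[2] = (long) st.st_nlink;        /* back to one name */
    return unlink(a);
}
```

On a typical local file system the recorded counts are 1, 2, 1. Calling link() on a directory fails on Linux, which is how the "no hard links to directories" rule shows up at the system call level.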
Brief look at other problems in FS:
Example: GPFS (a big machine file system)
120 PB over 200,000 hard drives, ~600 GB each
Some features:
- Stripes: blocks of data over multiple disks
- Parallel I/O
- Distributed metadata - directory lives in file system
- ex. /usr/bin: big, widely used disk drive
- Several copies of common directories
- Efficient directory indexing - faster than O(N) (say, B-Tree structure)
- Distributed locking
- File system stays live during maintenance
$ magic-gpfs-clone /gpfs /gpfs-feb-25
$ cd /gpfs-feb-25
$ tar -cf /dev/tape .
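As a rough illustration of the striping feature listed above, here is one way blocks could be spread over disks. The round-robin placement is an assumption chosen for simplicity; the notes do not say how GPFS actually places blocks, and the names are hypothetical.

```c
#include <assert.h>

/* Where one block of a striped file lives. */
struct stripe_loc {
    unsigned long disk;                 /* which disk holds the block */
    unsigned long long block_on_disk;   /* index of the block on that disk */
};

/* Round-robin striping (an assumed policy): consecutive file blocks go to
   consecutive disks, so a large sequential read keeps every disk busy. */
struct stripe_loc stripe_locate(unsigned long long file_block, unsigned long ndisks)
{
    struct stripe_loc loc;
    assert(ndisks > 0);
    loc.disk = (unsigned long) (file_block % ndisks);
    loc.block_on_disk = file_block / ndisks;
    return loc;
}
```

With 4 disks, file blocks 0, 1, 2, 3 land on disks 0, 1, 2, 3, and block 5 is the second block on disk 1, so neighbouring blocks can be read in parallel.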