CS 111 Lecture 12 (5/14) - File System Implementation

Scribe Notes by Kevin Nguyen, Kenneth Tu, Steven Ly, and Jason Zhang

Levels of a Traditional UNIX File System
File Systems

LEVELS OF A TRADITIONAL UNIX FILE SYSTEM

Sectors

In computer disk storage, a subdivision of a track on a magnetic disk is called a sector. Sectors are the units in which the disk performs reads and writes. Back in 1956, sectors started off at 512 bytes long. It is anticipated that hard drives will move to the "advanced format" that uses sectors of 4096 bytes. (Writes must be at least one sector large.)

Blocks

A group of sectors is known as a block. Blocks are traditionally 8192 bytes, or 16 sectors, long. Larger block sizes increase internal fragmentation and cause small files to waste more space. Some file systems allow small files to share a single block, which helps reduce fragmentation. Smaller blocks offer the benefit of flexibility. For applications that issue many sequential I/O requests, such as scientific calculations, larger block sizes increase efficiency.
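To make the space trade-off concrete, here is a minimal sketch that computes how many blocks a file occupies and how many bytes go unused in its last block. The function names are made up for illustration, and the model assumes each file gets whole blocks of its own (no block sharing).

```c
#include <stdint.h>

/* Number of fixed-size blocks needed to hold file_size bytes (rounds up). */
uint64_t blocks_needed(uint64_t file_size, uint64_t block_size) {
    return (file_size + block_size - 1) / block_size;
}

/* Bytes allocated to the file but left unused in its last block. */
uint64_t wasted_bytes(uint64_t file_size, uint64_t block_size) {
    return blocks_needed(file_size, block_size) * block_size - file_size;
}
```

Under these assumptions, a 100-byte file wastes 8092 bytes with 8192-byte blocks but only 412 bytes with 512-byte blocks.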

Partitions

Partitions create smaller "virtual" disks out of one larger physical disk. It is possible to use a different file system in each partition. It is also possible to create one large "virtual" disk out of several smaller physical disks.

FILE SYSTEMS

How do we use Multiple File Systems at the Same Time?

Say /usr and /home use different file systems.

$cp /usr/bin/sh /home/eggert/junk/sh

//cp calls open
int ifd = open("/usr/bin/sh", O_RDONLY);
int ofd = open("/home/eggert/junk/sh", O_WRONLY|O_CREAT, 0666);
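The rest of a copy can be sketched with read and write on the two descriptors. This is a minimal illustration, not the real cp: the name copy_file, the 8192-byte buffer, and the added O_TRUNC flag are assumptions made for the example.

```c
#include <fcntl.h>
#include <unistd.h>

/* Copy src to dst using only open, read, write, and close.
   Returns 0 on success, -1 on failure. */
int copy_file(const char *src, const char *dst) {
    int ifd = open(src, O_RDONLY);
    if (ifd < 0)
        return -1;
    /* O_TRUNC is an assumption: it empties dst if it already exists. */
    int ofd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (ofd < 0) {
        close(ifd);
        return -1;
    }
    char buf[8192];
    ssize_t n;
    int status = 0;
    while (status == 0 && (n = read(ifd, buf, sizeof buf)) > 0) {
        ssize_t done = 0;
        while (done < n) {               /* write may be partial */
            ssize_t w = write(ofd, buf + done, (size_t)(n - done));
            if (w < 0) {
                status = -1;
                break;
            }
            done += w;
        }
    }
    if (status == 0 && n < 0)
        status = -1;                     /* read failed */
    close(ifd);
    if (close(ofd) != 0)
        status = -1;
    return status;
}
```

Nothing in this code mentions a file system type: the kernel decides which file system handles each descriptor when open resolves the path.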

One traditional approach designates the file system in front of the file name, as with drive letters. Given that there are only 26 letters in the alphabet, the system can keep track of at most 26 different file systems at a time, in a table stored in kernel memory.

$cp A:/usr/bin/sh B:/home/eggert/junk/sh

Directories

/usr/bin is stored in an inode, say inode number 3762. sh is stored as a directory entry in the directory whose inode is 3762. That directory entry points to the inode that stores the file, say inode number 263. Inode number 1 is reserved for the root directory, designated by "/".

If there are multiple slashes in a path name, the extra slashes are ignored.

$cd /usr/bin///sh
#This is interpreted as
$cd /usr/bin/sh

There is an exception when there are exactly two leading slashes at the start of the path: how such a path is interpreted is platform dependent. If a directory entry is not found, the open function returns -1 and sets errno to ENOENT.

If the path does not start with a slash, the working directory is used as the start of the path. The kernel stores the inode number of the current working directory in the process table. The chdir system call changes the working directory by updating the inode number stored for the current process.

chdir("/bin");
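The lookup rules above (repeated slashes, ENOENT for a missing entry, relative paths starting at the working directory) can be observed from user code. The helper open_error below is a hypothetical name used only for this sketch.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Try to open path read-only. Returns 0 if it opens, otherwise the errno
   value reported by open (for example ENOENT for a missing entry). */
int open_error(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}
```

For example, open_error("/usr/bin///sh") and open_error("/usr/bin/sh") should give the same result on a system that has that file, and a path through a nonexistent directory should yield ENOENT.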

1972 Chdir Bug

One of the first bugs found in UNIX was found in chdir in 1972.

//chdir.c
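#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv) {
    if (argc != 2) {
        fprintf(stderr, "usage: chdir directory\n");
        return 1;
    }
    //Changes the working directory of this process only
    if (chdir(argv[1]) != 0) {
        perror(argv[1]);
        return 1;
    }
    return 0;
}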

There were no bugs in the actual implementation of the chdir system call: the correct inode number was written into the process table. The bug was that chdir ran as a separate program. The shell forked a child to run it, the child changed its own working directory and exited, and the shell's working directory never changed. The fix was to have the shell recognize the change directory command and call chdir itself instead of running a separate program.
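The effect is easy to reproduce. The following sketch forks a child that calls chdir, as a separate chdir program would, and then checks the parent's working directory. The function name and the fixed 4096-byte buffers are assumptions for the example.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that calls chdir(dir), then compare the parent's working
   directory before and after. Returns 1 if the parent's directory is
   unchanged, 0 if it changed, -1 on error. */
int parent_cwd_unchanged(const char *dir) {
    char before[4096], after[4096];
    if (getcwd(before, sizeof before) == NULL)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: this updates only the child's process-table entry. */
        _exit(chdir(dir) == 0 ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (getcwd(after, sizeof after) == NULL)
        return -1;
    return strcmp(before, after) == 0;
}
```

Calling parent_cwd_unchanged("/") is expected to return 1, which is exactly the behavior the 1972 shell exhibited.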

Chroot

chroot changes the root directory of the current process.

//Change root directory to /home/eggert/junk
chroot("/home/eggert/junk/");

The process then cannot access files above the new root directory.

//This would execute /home/eggert/junk/usr/bin/sh
execvp("/usr/bin/sh", (char *[]){ "sh", NULL });

chroot presents a possible security issue, allowing users to pose as root. On login, two files are checked to verify the user's password: /etc/passwd and /etc/shadow. A user could spoof these files by creating the alternate files /home/eggert/junk/etc/passwd and /home/eggert/junk/etc/shadow and then calling chroot.

chroot("/home/eggert/junk");

The user could then execute a command as root using sudo, because sudo would check /home/eggert/junk/etc/passwd rather than /etc/passwd. To prevent this, chroot is a privileged system call that can only be executed by root.

chrooted jails are subset images of the root directory created by a superuser. They are often used by web hosts to create multiple virtual hosts on a single server. One virtual server can have an Apache web server and all relevant libraries stored in its chrooted jail. That virtual server then cannot modify any files outside of its jail, which keeps the servers from interfering with one another. A chrooted jail is created as follows:

//This creates an Apache server that can only modify its own files.
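//apache_uid is a placeholder for the numeric user ID of the "apache" user.
if (fork() == 0) {
    chroot("/a/b");
    chdir("/");
    setuid(apache_uid);
    execlp("/usr/bin/apache", "apache", (char *) NULL);
}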

chrooted jails cannot be escaped by using "..". If the working directory is already the root directory, ".." is treated as ".".
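The ".." rule can be illustrated with a toy, purely lexical path normalizer. The kernel applies the rule to inodes during lookup rather than to strings, and this sketch ignores symbolic links, so treat it only as a model of the rule; normalize_path is a name invented here.

```c
#include <string.h>

/* Lexically normalize an absolute path into out (capacity cap).
   "." and empty components are dropped; ".." removes the previous
   component, except at the root, where it behaves like ".".
   Returns 0 on success, -1 if path is not absolute or out is too small. */
int normalize_path(const char *path, char *out, size_t cap) {
    if (path[0] != '/' || cap < 2)
        return -1;
    size_t len = 1;
    out[0] = '/';
    const char *p = path;
    while (*p != '\0') {
        while (*p == '/')
            p++;                          /* skip runs of slashes */
        const char *start = p;
        while (*p != '\0' && *p != '/')
            p++;
        size_t n = (size_t)(p - start);
        if (n == 0 || (n == 1 && start[0] == '.'))
            continue;                     /* empty component or "." */
        if (n == 2 && start[0] == '.' && start[1] == '.') {
            /* Drop the last component; at the root nothing is dropped. */
            while (len > 1 && out[len - 1] != '/')
                len--;
            if (len > 1)
                len--;                    /* remove the separating slash */
            continue;
        }
        if (len > 1) {
            if (len + 1 >= cap)
                return -1;
            out[len++] = '/';
        }
        if (len + n >= cap)
            return -1;
        memcpy(out + len, start, n);
        len += n;
    }
    out[len] = '\0';
    return 0;
}
```

With this model, "/../../etc/passwd" normalizes to "/etc/passwd": inside a jail, that is the jail's own copy of the file.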

Multiple File Systems

With multiple file systems, the user just sees file names. The mount table tells you about the file systems being used.
[Figures: Multiple File Systems; Multiple File Systems, continued]

Mount Table

Mount tables are stored in kernel memory. Inode numbers are local to the file system that they are in.
To uniquely identify a file, we need both a file system (device) number and an inode number: a (dev_t, ino_t) pair.
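User code can read this pair with stat, which reports it in the st_dev and st_ino fields. The helper name same_file is an assumption for this sketch.

```c
#include <sys/stat.h>

/* Returns 1 if both paths name the same file, that is, the same
   (st_dev, st_ino) pair; 0 if they differ; -1 if either stat fails. */
int same_file(const char *a, const char *b) {
    struct stat sa, sb;
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

Comparing inode numbers alone would not be enough, because two different file systems can each have a file with the same inode number.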
[Figure: Mount Table File System]
Directory layout (UNIX 1977)
- 16 bytes per entry (14 for the name, 2 for the inode number).
Directory layout (Linux ext4, directory entry version 2)
- Each entry starts with a 32-bit inode number, a 16-bit directory entry length, an 8-bit name length, and an 8-bit file type, followed by the name.
[Figure: Mount Table Continued]
For small Linux directories, a concatenation of the above entries is used.
For large Linux directories, a hash table is used.
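The two entry formats can be written down as C structures. The struct names below are invented for this sketch (they are not the kernel's own declarations), and the field order of the older format, inode number first, is how the 16-byte entry is usually described.

```c
#include <stdint.h>

/* 1977-style UNIX directory entry: 2-byte inode number, 14-byte name.
   A name of exactly 14 bytes has no terminating NUL. */
struct old_unix_dirent {
    uint16_t ino;
    char     name[14];
};

/* Fixed-size header of an ext4-style entry; the name bytes follow it. */
struct ext_style_dirent_header {
    uint32_t inode;      /* 32-bit inode number */
    uint16_t rec_len;    /* 16-bit directory entry length */
    uint8_t  name_len;   /* 8-bit name length */
    uint8_t  file_type;  /* 8-bit file type */
};

/* On common ABIs neither layout needs padding. */
_Static_assert(sizeof(struct old_unix_dirent) == 16, "16-byte entry");
_Static_assert(sizeof(struct ext_style_dirent_header) == 8, "8-byte header");
```

The fixed 14-byte name field is why old UNIX file names were limited to 14 characters, while the variable-length format stores a per-entry length instead.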

Hard Link

Hard links are two different directory entries that point at the same file, that is, at the same inode on the same file system.
To create a hard link, we use the ln command.

//This creates a hard link at /home/eggert/junk/pass for /etc/passwd
$ln /etc/passwd /home/eggert/junk/pass
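The ln command is a thin wrapper around the link system call. The sketch below creates a link and then confirms that both names refer to one inode; the helper names create_empty and hard_link are assumptions for the example.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create an empty file at path. Returns 0 on success, -1 on error. */
int create_empty(const char *path) {
    int fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    return close(fd);
}

/* Equivalent of `ln existing alias`, followed by a check that both names
   now refer to the same inode. Returns the inode's link count, or -1. */
long hard_link(const char *existing, const char *alias) {
    if (link(existing, alias) != 0)
        return -1;
    struct stat a, b;
    if (stat(existing, &a) != 0 || stat(alias, &b) != 0)
        return -1;
    if (a.st_dev != b.st_dev || a.st_ino != b.st_ino)
        return -1;
    return (long)b.st_nlink;
}
```

A freshly created file has a link count of 1, so the first successful hard_link call on it is expected to return 2.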

Hard links cannot be created for directories; they can only be created for non-directories.
The reason is that a hard link to a directory can create a cycle in the directory tree, which breaks programs that walk the tree. If the following command were allowed, a later call to pwd would not work.

$ln /home/eggert/junk /home/eggert/junk/j

In this example, j would be a hard link to the junk directory itself. pwd would then fail, because it finds the name of the working directory by calling open and readdir on the parent directory and searching for the entry with the matching inode number.

//Code for pwd (pseudocode)
fd = open("/home/eggert/junk", ....) //open the parent directory
readdir(/home/eggert/junk) //look at names in parent directory

Because of the cycle, junk contains an entry j that is junk itself, so a traversal can keep descending without end: /home/eggert/junk/j/j/j/...
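The restriction can be checked directly by asking link to make a second name for a directory. This is a small sketch with a hypothetical helper name. On Linux the call is expected to fail with EPERM; other systems may report a different error.

```c
#include <errno.h>
#include <unistd.h>

/* Attempt `ln dir alias` via link(). Returns 0 if the kernel allowed it,
   otherwise the errno value reported by link(). */
int try_link_directory(const char *dir, const char *alias) {
    if (link(dir, alias) == 0)
        return 0;
    return errno;
}
```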

BSD FFS Layout

[Figure: BSD FFS Layout]
Typical operations on the BSD FFS layout include creating a file, writing some data, and extending a file. In the block bitmap, each block is represented by one bit, with a 0 indicating that it is free and a 1 indicating that it is allocated. The BSD FFS layout may not be optimally efficient, since it requires 3 lseeks and 3 writes to do one write. This, however, may not be all that bad, since many applications run in parallel and blocks are allocated next to each other where possible, which helps spatial locality. There is a correctness issue, however, when power is lost after only some of the writes have finished; this topic will be continued in the next lecture.
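The block bitmap itself is simple to model. The following is a minimal sketch, assuming the least significant bit of each byte describes the lowest-numbered block and using first-fit allocation; the real FFS allocator chooses blocks near related data instead, and all names here are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* One bit per block: 0 = free, 1 = allocated. */
int bitmap_test(const uint8_t *map, size_t block) {
    return (map[block / 8] >> (block % 8)) & 1;
}

/* Find the first free block among nblocks, mark it allocated, and return
   its number; return -1 if every block is allocated. */
long bitmap_alloc(uint8_t *map, size_t nblocks) {
    for (size_t b = 0; b < nblocks; b++) {
        if (!bitmap_test(map, b)) {
            map[b / 8] |= (uint8_t)(1u << (b % 8));
            return (long)b;
        }
    }
    return -1;
}

/* Mark a block free again. */
void bitmap_free(uint8_t *map, size_t block) {
    map[block / 8] &= (uint8_t)~(1u << (block % 8));
}
```

With 8192-byte blocks, one bitmap block holds 65536 bits, so it can describe 65536 data blocks.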