CS 111: Operating Systems Principles.
Scribe Notes 2/20/13.
By Richard Lung, Soumya Jain.
File system design
Why du gives different results?
du tells you how much disk space is "actually" being used [being used, not about apparent size].
companion utility of du are df and ls.
$ls - l bigfile [apparent size]
$ls -ls bigfile [actual file size (real size file in blocks)]
Utilities are a little window in the file system.
Why does it make sense for apparent space to not be equal to actual file?
- When real > apparent
- When disk space < data
- With hard lines, you'll save directory 1&2. and they point to same place.
For example:
$du dir1
$104 dir1
$du dir1 dir2
$104 dir1
$4 dir2
$du dir2 dir1
$104 dir2
$4 dir1
The discrepancy in the directory sizes is because we have multiple pointers to the same file, as shown in the image above.
File systems
A file system is :
to user: a collection of files.
to implementers: support an abstraction of collection of files y means of a data structure that lives on secondary storage and in primary memory, it should support persistence, security, efficiency.
This hybrid data structure should be:
- persistent
- secure
- efficient
A dumb file system (circa 1974) made by Prof. Eggert
Pick a block in disk drive and that would be a file and so on.
Q1. How do you know where the files are?
Problems?
- To create a file, you need its size in advance.
- Fixed directory size -> leads to waste of unwanted limit # of files, there's nothing in data structure that tells where the free space is.
- limited length on file.
- implication that the directory must live in RAM.
Problem of fragmentation
- External fragmentation:
Enough free space to hold the object you're trying to allocate, but it's fragmented. - Internal fragmentation:
unused space within allocated sections not used due to efficiency season. Usually disk space is allocated in blocks of fixed size eg. 512. Hard to grow files.
Possible solution: Create a defragmenter.
But Prof. Eggert's idea was already implemented RT-11 file system.
Real time continous files -> good predictable, i/o.
We would like to have a file system that has a level of indirection i/o directory and file system.
make it easier for files to grow.
FAT(file allocation table) file system (late 1970's)
Size: most of file system is allocated to data & a little to FAT.
To grow with FAT approach:
- Find - 1 in the FAT.
- Find the EOF block.
- Modify FAT entry to point to new block.
- Modify FAT entry to EOF.
- Update size in dir.
- Write data (to 2 data blocks).
FAT file system downsides:
- No hard links
- Accesing sequentially can require a lot of seeks. lseek is O(N) (many old problems remain deframentation fixes)
The UNIX File system
Renaming a file:
mv d/e d/f
mv d/e x/y
Berkeley Fast File System (Unix-like filesystem)
This file system, developed by a Berkeley Graduate student, has the following disk space structure:
Boot Sector: Information on disk booting and loading up the OS.
Super Block: Contains version number, size, amount of free space, and location of rootdir of the filesystem
Bitmap Block: 1 indicates used blocks and 0 indicates free blocks.
Inode Table: This table stores all the inodes that have relevant file information.
Data blocks: Contains data; depending on the size of the file, these blocks could instead be pointers that divide the file into even more blocks.
Inodes
Inodes have a pointer-type hierarchy, which helps in finding files fast and helps in extra storage of data by indicating data blocks with pointers to more data.
Links
As inodes do not explicitly have file/directory names to identify them, each file has its own inode number which can be found by using ls command. There are two types of links, symbolic link (soft link) and hard link.
Symbolic link is like a shortcut to the actual file, so if the actual file is deleted, the symbolic link would not work either.
Unlike symbolic link, hard link not act only as a shortcut to the actual file, but also contains the same contents by cloning the original copy.
Holes
One of main issue with the inodes are holes. Since BFFS file system's file allocation is contiguous and solve external fragmentation issue, but we have to replace deallocated address with 0 which create holes. Though holes are typically small and not significant, but still is memory leaks.