Jingzhi Yu | Qi Shao
Hybrid Data Structure
RAM + DISK
Directory Entry
Directories are handled as normal files, but are marked in inode type as directory
Property
- Multiple directory entries may reference the same Inode number (hard link)
- Users identify files via pathnames
a/b/c
that are mapped to Inode numbers by the OS- If the path starts with
/
A directory entry contains:
Size Content 32-Bit Inode Number 16-Bit Directory Entry Length 8-Bit Name Length 8-Bit File Type N-Byte File Name
Inode
Each file is represented by an Inode
Property
- Stored in Inode table in main memory, also on disk
- Record metadate for file
- Fix size per file
- Identified by an integer number, not a name
It constains:
Content owner timestamp permission type(inmutable/fixed) Note:
Inode:Both directory entry and inode entry have file type
The type info in the directory entry is cached from the inode entry
- 1: find . -type f
- 2: find . -mtime +1
2 is slower than 1 because it has to go to the inode instead of the directory entry
The following are layers of abstraction in a file system design(up to bottom):
unlink("foo")
It is possible that you have multiple directories for the “foo” file(hard link). Even if one of the links has been removed, others still exist, which means link count is still non-zero.
Even if there is only one link, the run is still not complete. Considering the race condition, before you remove the file, somebody could create a copy of the file like below:
$ ls -l foo
$ ln foo bar
$ rm foo
So, there are two main problems in terms of removing a file in the file system:
So, what should we do? Of course, we don’t want to physically destroy the disk.
One method could be write over the file with random numbers and then remove it. Actually there is a command called shred
for this purpose. If you want to make sure the file is destroyed, you can shred it multiple times. In most cases, this is enough.
Shred doesn’t work well in log-structured file system. So when dealing with this kind of file systems, remember to shred the file system itself!
$shred file
- open file
- set target size
- overwrite the file with random data
- close file
- unlink file
The internal mechanism of interpreting file names is called “namei”. Basically, it establishes a map from strings given by users to corresponding devices, inode pair.
For example open("a/b/c", ...)
, the OS will start at the working directory (for a relative path) or the root (for an absolute path). It does a left-to-right traversal over a file name, exploring directories and keeping track of the current directory’s inode. From the inode, we can get the directory entries to look up the next file name component and continue down the tree.
For example, every file has a inode number. After exploring a/b, the current directory inode will be b’s inode, whose number is 1017 and the file name component to resolve is c. The new device, inode pair will then attempt to be resolved.
If a file is missing during this traversal process (for example, if c is nowhere to be found once a is the current directory), then errno is set to ENOENT. If search permissions (the x bit) are missing on a directory to be explored, errno is set to EPERM.
Note:
Every process has a working directory, even with the same command$ cat a/b
, the contents printed on the screen can be different as the working directory may vary.
$ open ("a/b/c/d", O_WRONLY|O_CREAT|O_EXCL)
Note:
Last component in the specified path - in this case is “d” - does not necessarily exist. For example,link("a","b")
, file “b” does not have to exist.Link:
link count = count of # directory entries pointing at that inodeA link is just a pointer to an inode. A directory is an inode that holds links. Each filename in a directory is just a link to an inode.
Hard links to directories in Linux are not allowed. One potential problem it might cause is creating a cycle. For instance:
$ mkdir a b
$ touch a/f b/g
$ ln a b/l
$ ln b a/k
In this case, if we allow directory a to point to directory b and directory b to point to directory a, two different directories in different points in the filesystem are pointing to the same thing. In fact, a subdir could point back to its grandparent, creating a loop.
Why is this loop a concern? Because when you are traversing, there is no way to detect you are looping (without keeping track of inode numbers as you traverse). Thus, allowing hard links to directories would break the directed acyclic graph structure of the filesystem, creating directory loops and dangling directory subtrees.
Recall the structure of an inode entry. There are uid, timestamp, and other information of the file. Following the fixed info, there are direct block pointer, indirect block pointer and maybe indirect2 block pointer.
Indirect blocks in an inode are used when the file size exceeds the amount of data that can be stored in the very limited direct blocks. The indirect block stores addresses to blocks where more data is stored. An indirect block can store addresses of BLOCK_SIZE / ADDRESS_SIZE blocks. When the first indirect block has been used up, the file system will allocate the doubly indirect block, which stores addresses to more indirect blocks that store addresses to data blocks. See the diagram below.