Lecture 11 
 File System Performance: 
 This is a continuation of Lecture 10 
 Prefetching... When is it bad? 
 
	- Prefetching normally assumes locality of reference. If accesses are not sequential,
	the prefetched data is never used and the work (and cache space) is wasted.
	
 
	- Prefetching assumes that the data we prefetch will still be correct when we use it.
	This is a problem of cache coherence.

	Say two processes each hold their own cache for a file they are reading in; each cache
	contains prefetched contents of the file. Process 1 then writes to the file. Now Process 2's
	prefetched data can be inconsistent with what is actually on disk.

	We can fix this issue by moving the cache into the kernel and having both processes share that
	one cache. Now whenever process 1 or process 2 writes, the other sees the changed data.
	 
 Dallying Downsides 
	Say we issue the following.
	
	write(...); 
	write(...); 
	write(...); 
	write(...); 
	write("done"); 
	[crash]
	
	The application assumes these writes have reached the disk, but in reality they have only
	reached the cache. If the machine crashes before the cache is flushed, those changes are lost.
	So we introduce a write barrier: an operation that, on command, flushes all the cached writes
	out to disk.
	
	We have multiple possible function calls to accomplish this:
	
		- sync();
			This pushes every cached block in the system to disk, i.e., the caches for every single
			file currently being cached. This is undesirable: an individual process doesn't need to
			flush every cache, just the ones for the files it uses.
		 
		- fsync(fd);
			This flushes only the cached data and metadata associated with fd to disk, removing the
			penalty of dumping every file. It can still be too slow for some applications, largely
			because it must also write out the file's metadata, so there is yet another alternative.
		 
		- fdatasync(fd);
			This is faster because it skips metadata that isn't needed to retrieve the data (such as
			timestamps) and flushes only the data associated with fd. That way we don't have to write
			the disk in more than one place: flushing metadata too would mean seeking to the block
			bitmap area as well as to the data, whereas this call only seeks to the data.
			(See the sketch below.)
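
	Here is a minimal sketch of how these calls might be used, assuming a made-up file name
	("log.txt") and an append-style workload; error handling is abbreviated:

		#include <fcntl.h>
		#include <unistd.h>

		/* Sketch: write a record, then force it to disk before reporting success. */
		int append_record(const char *buf, size_t len)
		{
		    int fd = open("log.txt", O_WRONLY | O_APPEND | O_CREAT, 0644);
		    if (fd < 0)
		        return -1;

		    if (write(fd, buf, len) != (ssize_t)len) {  /* may only reach the cache */
		        close(fd);
		        return -1;
		    }

		    /* Write barrier: flush the cached data (but not metadata such as
		       timestamps) out to disk.  Use fsync(fd) to flush metadata as well. */
		    if (fdatasync(fd) != 0) {
		        close(fd);
		        return -1;
		    }
		    return close(fd);
		}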
		 
	
 File System Design 
 Story Time 
IBM announced they are shipping a 120-petabyte file system to an unnamed customer.
The cheapest fast drives were 600 GB (we assume 15K RPM drives), so roughly 200,000 drives are needed.
 Problems 
	- 
	The data on the various disks must be kept consistent with each other.
	
 
	- 
	We need to give each set of drives to a CPU, but disks don't require much CPU time.
	This means many disks will be connected to each CPU, yet we still need a lot of CPUs to
	interact with all the disks efficiently. (One CPU means one bus, which means all traffic
	flows over one wire, which is highly inefficient.)
	
 
	- 
	We need an interconnect with high throughput (lots of bits per second) and low latency.
	
 
	- 
	Having multiple processors causes cache coherency issues, since each processor
	holds its own cache. So we need a new way to implement the sync functions specified earlier.
	
 
 
This all comes down to having as much parallelism as possible. 
To take on this problem, let's first look at creating a simple file system.
 
 Simple File System 
Our simple file system is just treated as an array of sectors.
To create a file, we put it in the first available free region of the disk.
We also have a region that holds file metadata: a 256-byte directory, which is an array of
directory entries containing information about where the files are.
Files are not broken up; each file is stored contiguously.
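
As a rough sketch, a directory entry in this contiguous design might look like the following
C struct; the field names and widths are illustrative assumptions, not the exact layout from lecture:

	#include <stdint.h>

	/* One entry in the simple file system's directory region (sketch). */
	struct simple_dir_entry {
	    char     name[16];      /* file name, NUL-padded                 */
	    uint32_t start_sector;  /* first sector of the file's data       */
	    uint32_t size;          /* file size in bytes                    */
	};
	/* Because files are contiguous, the file occupies the sectors from
	   start_sector up through start_sector + ceil(size / sector size) - 1. */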
 Problems 
	- 
	Fragmentation:
	
		- 
		Internal Fragmentation: the wasted bytes in a file's last sector that the file doesn't use
		(e.g., a 1-byte file still takes up a whole sector).
		
 
		- 
		External Fragmentation: a problem caused by storing files contiguously. If a file sits in the
		middle of our free space and we want to add a new file that is bigger than the gap on either
		side of it, we can't add the new file, even though there is enough total free space; it is
		just split up by the file in the middle.
		
 
	
	 
	- 
	We need to know the size of a file upon creation, since files are stored contiguously.
	
 
	- 
	There is an upper limit on the number of files, set by the size of our metadata section.
	
 
 Advanced File System Technology 
 FAT File System 
 Goal 
	
	We want to eliminate the external fragmentation problem and the need to preallocate space (at least for the data).
	
 Details 
	
		FAT stands for File Allocation Table. So we have the following divisions:
	
		
		
		| Boot Sector | Super Block | FAT | ............ data ........... | 
		
		
	 Boot Sector 
		 
		We didn't cover this piece of the file system, but here is a link:
		Boot sector explanation
		
	 Super Block 
		This contains metadata about the file system,
			including the size of the file system, the version number, and other assorted data.
			
			The Super Block has a fixed size.
		
	 File Allocation Table 
		
		The FAT is an array of block numbers, with one entry for each 4 KiB block in the file system.
		Each entry is either 0 (EOF: this is the file's last block), -1 (free block), or N (the block
		number of the next block in the file). This creates a linked-list style of storage, with the
		next pointers being the entries in the FAT.
		
		This solves the problem of external fragmentation, since we can string non-contiguous blocks
		together to form a single file.
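
		A sketch of finding the block that holds a given file offset, assuming the whole FAT is held
		in an in-memory array fat[] with the encoding above (0 = EOF, -1 = free, otherwise the next
		block number); the names and constants are illustrative:

			#include <stdint.h>

			#define BLOCK_SIZE 4096          /* 4 KiB blocks, as in the notes */
			#define FAT_EOF     0            /* 0  => last block of the file  */
			#define FAT_FREE   (-1)          /* -1 => free block              */

			/* fat[b] holds the next block after b, FAT_EOF, or FAT_FREE. */
			extern int32_t fat[];

			/* Return the block holding byte `offset` of a file whose first block is
			   `first_block`, or -1 if the offset is past end of file.
			   This walk is why random access (lseek) is O(N) in a FAT system. */
			int32_t block_of_offset(int32_t first_block, uint64_t offset)
			{
			    int32_t block = first_block;
			    for (uint64_t skip = offset / BLOCK_SIZE; skip > 0; skip--) {
			        if (fat[block] == FAT_EOF)
			            return -1;           /* offset is beyond the file's last block */
			        block = fat[block];
			    }
			    return block;
			}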
		
	 Data 
		
		This is just the actual file data.
		
		
	 Directory 
		
		A directory is a file whose contents are an array of directory entries.
		A directory entry contains a name, a size, and the index of the file's first block (its entry in the FAT).
		
		
		
		| ...Name... |  Size  |  Block index in FAT  | 
		
		
		
		We also need a root directory. This is simple: just have it be the first entry in the FAT.
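
		A sketch of such a directory entry in C, matching the layout above; the 16-byte name and the
		field widths are assumptions. Unlike the simple contiguous design, the entry stores only the
		file's first block, and the rest of the file is found by following the FAT chain:

			#include <stdint.h>

			struct fat_dir_entry {
			    char     name[16];     /* file name                                   */
			    uint32_t size;         /* file size in bytes                          */
			    int32_t  first_block;  /* index of the file's first block in the FAT; */
			                           /* remaining blocks are found via the FAT      */
			};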
		
		
	 Issues 
		
		- 
		We assumed that the FAT fits in RAM, but this isn't scalable: as the number of blocks grows,
		so does the number of required FAT entries, so the FAT grows with the size of the disk.
		
 
		- 
		Contiguous allocation is desirable. This is clearly possible early on, but as we add and remove
		files our disk gets fragmented and we can't get contiguous allocation as easily.
		
 
		- 
		lseek(fd, N, SEEK_SET); is now an O(N) operation, because we have to traverse the linked list
		looking for the correct block, which requires the disk arm to move around the FAT region if
		the FAT isn't in RAM.
		
 
		- 
		Renaming a file into a different directory doesn't work cleanly, since we would need to write
		two different directories (and a crash in between could leave the file system inconsistent).
		
 
		
	
	We can fix all this by introducing a new file system approach.
	
	
	 Inode Idea 
	 Static Data Structures 
	
		As with the FAT design, we have a boot sector and a super block, but the difference is the
		inode table.
		
		The inode table is preallocated with room for every file that will ever exist; its size is a
		parameter set when the file system is created.
	
	 Inode 
	
	An inode is an on-disk data structure describing one file in our file system.
	Inodes are all preallocated, which means there is a maximum number of files our
	file system can hold, fixed at creation.
	
	It has the following structure.
	
	
		- 
		A file size.
		
 
		- 
		A link count.
		
 
		- 
		A type.
		
 
		- 
		Permissions.
		
 
		- 
		10 Direct Blocks - Block numbers telling which blocks contain file data.
		
 
		- 
		An Indirect Block - the block number of a block that contains data-block numbers.
		
 
		- 
		A Doubly Indirect Block - the block number of a block that contains the block numbers of
		indirect blocks. (See the struct sketch below.)
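
	A sketch of this on-disk structure in C; the exact field widths and names are assumptions for
	illustration:

		#include <stdint.h>

		#define NDIRECT 10   /* 10 direct block pointers, as in the notes */

		/* Sketch of an on-disk inode; field sizes are illustrative assumptions. */
		struct inode {
		    uint64_t size;              /* file size in bytes                            */
		    uint16_t link_count;        /* number of directory entries naming this file  */
		    uint16_t type;              /* regular file, directory, ...                  */
		    uint16_t permissions;       /* rwx bits for owner/group/other                */
		    uint32_t direct[NDIRECT];   /* block numbers of the first 10 data blocks     */
		    uint32_t indirect;          /* block full of data-block numbers              */
		    uint32_t doubly_indirect;   /* block full of indirect-block numbers          */
		};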
		
 
	
	 Improvements 
	
	lseek(...) is now O(1), because the inode lets us find the block that holds any given offset
	directly, without walking a chain.
	
	But we still need to fix the other 3 problems specified earlier. 
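
	To see why the lookup is constant time, here is a sketch of the arithmetic, assuming 4 KiB
	blocks, 10 direct pointers, and 4-byte block numbers (so 1024 block numbers fit per block);
	the function names are made up for illustration:

		#include <stdint.h>
		#include <stdio.h>

		#define BLOCK_SIZE 4096u                 /* bytes per block (assumed)        */
		#define NDIRECT    10u                   /* direct pointers per inode        */
		#define PER_BLOCK  (BLOCK_SIZE / 4u)     /* block numbers per indirect block */

		/* Classify which pointer covers byte `offset`: unlike the FAT's list walk,
		   this takes a constant number of steps regardless of the offset. */
		void locate(uint64_t offset)
		{
		    uint64_t lbn = offset / BLOCK_SIZE;  /* logical block number within the file */

		    if (lbn < NDIRECT)
		        printf("direct[%llu]\n", (unsigned long long)lbn);
		    else if (lbn < NDIRECT + PER_BLOCK)
		        printf("indirect block, slot %llu\n",
		               (unsigned long long)(lbn - NDIRECT));
		    else
		        printf("doubly indirect block, slots [%llu][%llu]\n",
		               (unsigned long long)((lbn - NDIRECT - PER_BLOCK) / PER_BLOCK),
		               (unsigned long long)((lbn - NDIRECT - PER_BLOCK) % PER_BLOCK));
		}

		int main(void)
		{
		    locate(5000);        /* falls in a direct block     */
		    locate(1u << 20);    /* falls in the indirect block */
		    return 0;
		}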
	
	
	As an aside, the Berkeley File System also includes a block bitmap, which records whether each
	block is free or not. This helps encourage contiguous allocation.