CS 111 Winter 2012

Lecture 10: File System Performance

by Daniel Ichwan and Seungdo Lee for a lecture by Professor Paul Eggert on February 15, 2012 at UCLA

Table of Contents

  1. Storage Devices
    1. Disk Drive
    2. Solid State Device (SSD)
    3. Hybrid Device
  2. Random Read/Write Performance
  3. Improving Read/Write Performance
    1. Sequential Read/Write Performance
    2. Batching
    3. Multitasking
    4. Direct Memory Access (DMA)
    5. Further Read/Write Improvements

Storage Devices

This lecture covers file systems and their performance. To understand the performance of the overall system, we first look at its low-level building blocks: the secondary storage devices. We examine two different types of storage device: the disk drive and flash memory.

                        Seagate Barracuda XT              Corsair Force 60GB SSD
                        (3 TB, 7200 RPM hard drive)
Read                    seek time: 8.5 ms                 speed: up to 280 MB/s (sequential)
Write                   seek time: 9.5 ms                 speed: up to 270 MB/s (sequential)
Transfer rate           6 Gb/s SATA (to bus/cache);       700 Mb/s (random)
                        130 Mb/s sustained
Power consumption       9.23 W (operating),               2.0 W (max operating),
                        6.39 W (idle)                     0.5 W (idle)
Operating shock         63 G for 2 ms                     1000 G for 2 ms
Non-operating shock     300 G for 2 ms                    1000 G for 2 ms
Price per GB            $290/3TB = $0.10/GB               $143/60GB = $2.40/GB
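(Where does the rotational latency figure used below come from? At 7200 RPM, one platter rotation takes 60 s / 7200 ≈ 8.33 ms, so on average the drive waits half a rotation, about 4.16 ms, for the target sector to arrive under the head. This 4.16 ms average rotational latency reappears in the per-iteration cost estimates later in these notes.)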

(A) Disk Drive
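The table above gives the figures for a representative drive. A disk drive's costs are dominated by its mechanical delays: the seek time to move the read/write head to the right track, and the rotational latency spent waiting for the target sector to spin under the head.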


(B) Solid State Device (SSD)

SSDs act like disk drives but are internally based on NAND flash technology.
Wear-Leveling Algorithms
Because each flash block survives only a limited number of write/erase cycles, wear-leveling algorithms spread writes across blocks instead of repeatedly favoring a few. These algorithms live either (A) in the device controller or (B) in the OS/CPU.
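As a concrete illustration, here is a minimal sketch of the idea in C; the block count, the erase_count table, and pick_block are hypothetical names for illustration, not taken from any real controller:

    #include <limits.h>

    #define NBLOCKS 1024                      /* hypothetical flash size in blocks */

    static unsigned erase_count[NBLOCKS];     /* erasures performed on each block */

    /* Return the index of the free block with the fewest erasures,
       or -1 if none is free, so that wear spreads evenly. */
    int pick_block(const int is_free[NBLOCKS])
    {
        int best = -1;
        unsigned best_count = UINT_MAX;
        for (int i = 0; i < NBLOCKS; i++)
            if (is_free[i] && erase_count[i] < best_count) {
                best = i;
                best_count = erase_count[i];
            }
        return best;
    }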


(C) Hybrid Storage Device
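A hybrid device pairs a conventional disk drive with a smaller amount of flash memory used as a cache, aiming to combine the disk's low cost per GB with flash's fast random access.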

Random Read/Write Performance

Consider the following loop, which reads from and writes to random locations on a raw device:
	char buf[8192];                                  // buffer: 16 sectors of 512 bytes each
	for (;;) {
	    lseek(ifd, random_location, SEEK_SET);       // set the read offset to a random location
	    n = read(ifd, buf, sizeof buf);              // ifd = input file descriptor
	    n1 = compute(buf, n);                        // transform the buffer without changing its size
	    lseek(ofd, other_random_location, SEEK_SET); // set the write offset to a random location
	    write(ofd, buf, n1);                         // ofd = output file descriptor
	}                                                // assume a raw drive
For the code above, we will assume a few things:
  1. A raw drive with independent disks (one for reads, one for writes)
  2. n = n1
  3. n = sizeof buf
  4. Application instructions: 0.01 μs each
  5. Syscall overhead: 1 μs
  6. I/O instructions: 10 μs each
We can break the total latency per iteration of the random read/write loop into the following pieces:

  Rotational latency            2 × 4.16 ms = 8.32 ms
  Seek                          8.5 ms (read) + 9.5 ms (write) = 18 ms
  Syscalls + I/O + app instrs   4 × (1 μs + 10 μs + 0.01 μs) = 0.04404 ms
                                (four syscalls: two lseeks, one read, one write)
  Computation (8 KB buffer)     10 ms

Total = 8.32 ms + 18 ms + 0.04404 ms + 10 ms = 36.36404 ms per iteration
→ Bytes per second: 8192 bytes / 0.03636 s per iteration ≈ 225,302 bytes/second (This is terrible!)

Improving Read/Write Performance

Random read/write performance is usually terrible. There are many ways to improve it; we discuss each of the following in turn:
  1. Sequential Read/Write Performance
  2. Batching
  3. Multitasking
  4. Direct Memory Access (DMA)
  5. Direct Memory Access (DMA) with POLLING
  6. Caching
  7. Pre-fetching
  8. Speculation
  9. Dallying Writes


(A) Sequential Read/Write

To minimize seek time, reads and writes should be sequential. In code, this means removing the lseek() calls that jumped to random locations, so the file offsets simply advance in order:

	char buf[8192];
	for (;;) {
	    n = read(ifd, buf, sizeof buf);
	    n1 = compute(buf, n);
	    write(ofd, buf, n1);
	}                                   // assume a raw drive
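A rough estimate under the same cost assumptions: dropping the two lseek() calls saves only about 22 μs of overhead per iteration, but sequential access also eliminates the 18 ms of seeking and most of the rotational latency, so each iteration is dominated by the 10 ms of computation: roughly 8192 bytes / 10 ms ≈ 800,000 bytes/s.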


(B) Batching

The concept of batching is simple: use larger blocks/buffers when reading and writing:

	char buf[1024*1024];                // buffer: now 1 MB (vs. 8 KB)
	for (;;) {
	    n = read(ifd, buf, sizeof buf);
	    n1 = compute(buf, n);
	    write(ofd, buf, n1);
	}                                   // assume a raw drive
Total latency per iteration of the batched read/write loop:

  Rotational latency            2 × 4.16 ms = 8.32 ms
  Seek (track-to-track)         0.30 ms
  Syscalls + I/O + app instrs   2 × (1 μs + 10 μs + 0.01 μs) = 0.02202 ms
  Computation                   10 ms per 8 KB × 128 (since 1 MB = 128 × 8 KB) = 1280 ms

Total = 8.32 ms + 0.30 ms + 0.02202 ms + 1280 ms = 1288.64202 ms per iteration
→ Since each iteration processes a full MiB → 1,048,576 bytes / 1.28864 s ≈ 813,000 bytes/s




(C) Multitasking

While waiting on a read, schedule something else that can run in the meantime (i.e., in read's implementation: read(...) { ... schedule() ... }).

Multitasking can eliminate the effective latency and seek time entirely if scheduled properly, but adds the penalty of context switches.
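A minimal kernel-side sketch of this structure; the helper names here are assumptions standing in for a real driver API:

    /* Hypothetical kernel-side sketch; these helpers are assumed, not a real API. */
    struct request;                              /* an in-flight disk I/O */
    void issue_disk_request(struct request *);   /* tell the controller to start */
    int  request_done(struct request *);         /* has the transfer finished? */
    long deliver_to_user(struct request *);      /* copy data to the caller */
    void schedule(void);                         /* let another process run */

    long dev_read(struct request *req)
    {
        issue_disk_request(req);     /* start the I/O, then give up the CPU */
        while (!request_done(req))
            schedule();              /* another process runs; we resume here
                                        after a context switch back to us */
        return deliver_to_user(req);
    }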


Total latency per iteration of the multitasked read/write loop
(using non-DMA and only 8 KB buffers, vs. 1 MB):

  Latency (hidden by multitasking)   0
  Seek (hidden by multitasking)      0
  Syscalls + I/O + app instrs        2 × (1 μs + 10 μs + 0.01 μs) = 0.02202 ms
  Computation (8 KB buffer)          10 ms
  Context switches                   0.1 ms/switch × 4 switches = 0.4 ms

Total = 0.02202 ms + 10 ms + 0.4 ms = 10.42202 ms per iteration

→ Since each iteration processes 8 KiB → 8192 bytes / 0.01042 s ≈ 786,000 bytes/s (comparable to batching)


(D) Direct Memory Access (DMA)

With DMA, we drop the programmed I/O approach (see Lecture 2), in which the CPU mediates each transfer from the disk controller to memory; instead, the disk controller talks directly to RAM. Non-DMA approaches pay an added latency of about 0.5 ms for that CPU mediation. The file system thus acts as a level of indirection between the application and the device controller, which matters because naive application code can be spectacularly inefficient, as in this byte-at-a-time loop:



	char c;
	for (;;) {                  // read/write one byte at a time
	    read(ifd, &c, 1);       // each 1-byte read = seek + latency + etc. ≈ 14 ms
	    write(ofd, &c, 1);
	}
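At roughly 14 ms per one-byte read, this loop would move about 1 byte / 0.014 s ≈ 70 bytes/s on a raw device: exactly the kind of pathological workload the file system's indirection (caching, prefetching, and dallying, below) is designed to absorb.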


(E) Direct Memory Access (DMA) with POLLING

If the DMA approach is used without context switching (i.e. polling only), a single process issues asynchronous I/O and polls for completion. This eliminates the latency of context switching.
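A minimal sketch of that single-process structure; start_dma_read, dma_done, and the other helpers are assumed names for the controller interface, not a real API:

    /* Hypothetical helpers for an asynchronous, DMA-capable controller. */
    void start_dma_read(int dev, char *buf, int n); /* controller DMAs into buf */
    int  dma_done(int dev);                         /* poll the controller's status */
    void compute_something_else(void);              /* useful work to overlap */
    void process(char *buf, int n);

    void copy_loop(int dev, char *buf, int n)
    {
        for (;;) {
            start_dma_read(dev, buf, n);   /* kick off the transfer */
            while (!dma_done(dev))         /* poll instead of context switching */
                compute_something_else();  /* overlap computation with the I/O */
            process(buf, n);
        }
    }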


(F) Caching

With caching, when a read is done, the entire containing sector is retrieved preemptively. If more bytes are read from that sector later on, they can be served quickly from the cache instead of going back to the disk.
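A toy sketch of a one-sector cache, assuming byte-granularity reads; the names and the single-entry design are illustrative only:

    #include <unistd.h>

    #define SECTOR 512

    static char cache[SECTOR];       /* most recently read sector */
    static long cached_sector = -1;  /* which sector the cache holds; -1 = empty */

    /* Illustrative only: read one byte at a given disk offset,
       fetching the whole containing sector into the cache on a miss. */
    char read_byte(int fd, long off)
    {
        long sector = off / SECTOR;
        if (sector != cached_sector) {                 /* miss: go to the device */
            pread(fd, cache, SECTOR, sector * SECTOR);
            cached_sector = sector;
        }
        return cache[off % SECTOR];                    /* hit: served from memory */
    }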


(G) Pre-fetching

If the file system sees a read from a given sector, the next few sectors will be read preemptively to reduce overhead. This approach exploits spatial locality (a special case of speculation; see the next section): data near recently fetched data will probably be used soon as well.
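Continuing the toy cache above, a prefetching version might grab several consecutive sectors on each miss (again, purely illustrative):

    #define PREFETCH 8                        /* sectors fetched per miss */

    static char pcache[PREFETCH * SECTOR];
    static long first_sector = -1;            /* first sector held in pcache */

    char read_byte_prefetch(int fd, long off)
    {
        long sector = off / SECTOR;
        if (first_sector < 0 || sector < first_sector
            || sector >= first_sector + PREFETCH) {   /* miss: fetch 8 sectors */
            pread(fd, pcache, sizeof pcache, sector * SECTOR);
            first_sector = sector;
        }
        return pcache[off - first_sector * SECTOR];
    }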


(H) Speculation

In speculation, the file system anticipates what the application will need next and retrieves the necessary data preemptively.


(I) Dallying Writes

With dallying, a write to the disk is instead written to a cache in RAM; the file system eventually flushes the cached data to disk when there is downtime. This lets the file system return from a slow write quickly and get on with more useful work. A side benefit is that reads of the dallied data are fast, since it is still in the cache and need not be read from disk.
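A toy sketch of dallying a single block; real file systems track many dirty blocks and flush them on a timer or at sync(), but the idea is the same (names are illustrative):

    #include <string.h>
    #include <unistd.h>

    #define BLKSZ 8192

    static char dirty_buf[BLKSZ];   /* dallied data waiting to reach disk */
    static long dirty_off = -1;     /* disk offset it belongs at; -1 = clean */

    /* "Write" returns immediately; the data only reaches disk later. */
    void dally_write(const char *data, long off)
    {
        memcpy(dirty_buf, data, BLKSZ);
        dirty_off = off;            /* remember where it must eventually go */
    }

    /* Called when the system is idle (or before shutdown). */
    void flush(int fd)
    {
        if (dirty_off != -1) {
            pwrite(fd, dirty_buf, BLKSZ, dirty_off);
            dirty_off = -1;
        }
    }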