Batching
Batching is used for both reads and writes. Data from multiple jobs is stored in blocks until the read or write is actually performed, minimizing overhead. Large block sizes give good throughput, since they minimize the number of read/write calls, while small block sizes give good latency, since small jobs finish sooner.
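The tradeoff above can be sketched in a few lines of C. This is a minimal illustration with hypothetical names (batched_write, device_write), not a real kernel interface: small writes accumulate in a block buffer, and the expensive device write runs only when the block fills.

```c
/* Sketch of write batching (hypothetical helper names).
   Small writes accumulate in a block; the costly device
   write runs only when the block is full. */
#include <string.h>

#define BLOCK_SIZE 8192

struct batch {
    char buf[BLOCK_SIZE];
    int used;      /* bytes buffered so far */
    int flushes;   /* how many times the device write ran */
};

/* stand-in for the real device write */
static void device_write(struct batch *b) {
    b->flushes++;
    b->used = 0;
}

/* returns number of bytes accepted; assumes len <= BLOCK_SIZE */
int batched_write(struct batch *b, const char *data, int len) {
    if (b->used + len > BLOCK_SIZE)
        device_write(b);        /* block full: pay the I/O cost now */
    memcpy(b->buf + b->used, data, len);
    b->used += len;
    return len;
}
```

A larger BLOCK_SIZE means fewer device_write calls per byte (throughput), while a smaller one means each job reaches the device sooner (latency).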
Prefetching
Prefetching only works for reads, not writes. In prefetching, the OS guesses where the app will read next. While the app is busy, the OS reads in the next block so that it is ready when the app needs it. In this sense, we get "double buffering for free." However, the downside to this approach is that we may fetch an extra block we didn't need. This ties up the bus for data that never gets used.
The OS handles the guessing at runtime. There are several guessing strategies: sequential, backward sequential (previous-block), and random access. In sequential guessing, the OS notices that blocks are being read in increasing order, so it automatically reads in the next one. Similarly, in backward sequential guessing, the OS notices that blocks are being read in decreasing order, so it automatically reads the previous block. If the OS assumes random access, nothing gets prefetched and prefetching is effectively turned off, since the OS cannot guess which block will be needed next.
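The guessing logic can be sketched as a tiny access-pattern detector. The names here (observe, predict, struct stream) are hypothetical, and a real OS tracks longer histories, but the idea is the same: compare the last two block numbers requested and decide what, if anything, to prefetch.

```c
/* Sketch of runtime prefetch guessing (hypothetical names).
   Track the last two blocks requested; if they are sequential,
   guess the next block; if backward sequential, guess the
   previous one; otherwise make no guess (random access). */
enum guess { GUESS_NONE, GUESS_NEXT, GUESS_PREV };

struct stream {
    long prev, cur;   /* last two block numbers read */
    int seen;         /* how many reads observed (capped at 2) */
};

void observe(struct stream *s, long block) {
    s->prev = s->cur;
    s->cur = block;
    if (s->seen < 2)
        s->seen++;
}

/* which block, if any, should the OS prefetch next? */
enum guess predict(const struct stream *s) {
    if (s->seen < 2)
        return GUESS_NONE;
    if (s->cur == s->prev + 1)
        return GUESS_NEXT;     /* sequential pattern */
    if (s->cur == s->prev - 1)
        return GUESS_PREV;     /* backward sequential pattern */
    return GUESS_NONE;         /* looks like random access */
}
```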
Prefetching is also part of a larger concept called "Speculation," which is described below.
Speculation
An OS that speculates makes guesses about the near future in order to improve performance. These guesses may mean more work for the OS now but less work later. In other words, the next instruction may not perform as well as it would have without speculation, but overall performance improves.
Speculation relies on two assumptions: spatial locality and temporal locality. Spatial locality is the assumption that blocks near each other will be accessed together. For example, if you read the ith block, the OS assumes you'll probably want the i+1 and i-1 blocks as well. Temporal locality is the assumption that a block just read will be needed again in the very near future. For instance, if you read the ith block at time t, the OS assumes you'll want the ith block again at time t + d, where d is small.
Dallying
Dallying is the write equivalent of prefetching; it is used only for writes. When an application tells the OS to write something, the OS doesn't actually do it right away. The app calls something like write(fd, buf, 1024), which returns 1024, but the data only goes into main memory. Rather than writing to disk, the OS simply caches what needs to be written. The OS can later overwrite the cached data, eliminating the earlier write entirely, or append to what's already in the cache, batching the work.
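A minimal sketch of dallying, using hypothetical names (dallying_write, struct dally_cache): a second write to a block that is already dirty in the cache simply overwrites the cached copy, so the earlier write never has to reach the disk at all. The real device write only happens when a different block needs the cache slot.

```c
/* Sketch of dallying with a one-block write cache (hypothetical
   names). Overwriting a cached block absorbs the earlier write. */
#include <string.h>

#define BLOCK_SIZE 512

struct dally_cache {
    long block_no;    /* which block is cached (-1 = empty) */
    char data[BLOCK_SIZE];
    int dirty;
    int disk_writes;  /* actual device writes performed */
};

/* returns len, like write(); data goes only to the cache.
   Assumes len <= BLOCK_SIZE. */
int dallying_write(struct dally_cache *c, long block_no,
                   const char *data, int len) {
    if (c->dirty && c->block_no != block_no) {
        c->disk_writes++;    /* evict: the old block finally hits disk */
        c->dirty = 0;
    }
    c->block_no = block_no;
    memcpy(c->data, data, len);  /* overwrite absorbs the earlier write */
    c->dirty = 1;
    return len;
}
```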
        CPU Cache (D4)
              |
          RAM (D3)
              |
   Disk Controller Cache (D2)
              |
          Disk (D1)
A downside to dallying is cache coherence. The CPU's cache, RAM, the disk controller's cache, and the disk can all hold copies of the same data: D1 on the disk, D2 in the disk controller's cache, D3 in RAM, and D4 in the CPU's cache, as shown above. When all copies D1-D4 are the same, there is no issue; if they differ, problems can arise. If power is lost, only the D1 copy on disk is preserved.

Inconsistencies among copies can be avoided using sync(), fsync(fd), and fdatasync(fd). sync() moves RAM data to the disk by scheduling all cached blocks for writing (i.e., "stop dallying"). However, sync() works across all processes, whereas we may only want to write the data associated with a particular file, and writing all of RAM to disk takes a long time. Furthermore, sync() merely schedules the work and returns before the data is actually written, so the user cannot be sure the data is safely on disk.

An alternative is fsync(fd), which syncs only the parts of RAM associated with the file descriptor fd. It ensures the file descriptor's blocks are on disk before it returns, which usually means a long wait (roughly 10 to over 100 ms). Though fsync() is an improvement on sync(), it still has issues: when data is written with fsync(), more than one block must be updated, since both the data and the metadata (e.g., the file's modified time) change.

fdatasync(fd) addresses this by syncing only the data, not the metadata. This is a good fit for scenarios such as keeping track of bank accounts, where we care about the balance and not the transaction time. However, not updating the metadata can lead to problems, such as ls -l showing the wrong information after a crash.
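The fdatasync() pattern described above looks like the following in POSIX C. The function name durable_append and the idea of appending a bank-style record are illustrative; the write/fdatasync calls are the real API.

```c
/* Writing durably with fdatasync(): the data blocks are forced to
   disk before the call returns, but metadata such as the modified
   time may not be. */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Append a record and make sure the bytes themselves survive a
   crash. Returns 0 on success, -1 on error. */
int durable_append(const char *path, const char *record) {
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, record, strlen(record)); /* OS may dally */
    if (n < 0 || fdatasync(fd) != 0) {             /* force data out */
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Swapping fdatasync(fd) for fsync(fd) would also push the metadata (modified time, etc.) to disk, at the cost of extra block writes per call.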