We are always worried about seek time, but we are still going to estimate it incorrectly; we will tend to overestimate the cost. Another simplification here is that we ignored latency and acceleration: the |b - h| figure is more of a back-of-the-envelope estimate.
Let's assume the simplest disk scheduling algorithm we can think of: first come first serve (FCFS) disk scheduling. You service requests in the order they occur; queue them up and service them in order.
If we assume random access, what is the cost going to be? h is chosen randomly because that's wherever the previous request left the head, and b is random as well. The average cost of an access is the average value of |b - h| over those two random numbers. Suppose h is fixed and you choose b at random; plotting the expected distance as a function of h, as in Figure 2, gives a parabola. It's calculus! We take the integral of that parabola with respect to h.
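Here is the back-of-the-envelope version of that integral, assuming h and b are independent and uniformly distributed over a disk normalized to [0, 1] (an idealization; real request streams aren't uniform):

E[|b - h|] = ∫₀¹ ∫₀¹ |b - h| db dh
           = ∫₀¹ ( h²/2 + (1 - h)²/2 ) dh
           = 1/6 + 1/6
           = 1/3

So with FCFS and random requests, the average seek covers about a third of the disk, versus a worst case of the whole disk.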
Let's say we have two processes.
How about the "Elevator algorithm"? It is like SSTF but it remembers what direction it was going. (Actually modeled after real-life elevators).
*About starvation in elevators and using SSTF: let's say you're in a ten story building and you're on the tenth floor. Most people are on the first few floors and the entrance is the first floor. Based on SSTF, the elevator would handle elevator call requests for what's closest to where it currently is, meaning it would likely stay in those first few floors. The elevator will constantly be around the first few floors, and the elevator may never come up to you at the top floor!
Schedule requests closest to h in the direction you're moving. This takes care of the starvation problem, but some people are treated less fairly than others: people who are waiting at the bottom or top floor. The elevator algorithm is unfair to the edges, as they are the furthest away.
To remove the unfairness, there is the "circular elevator algorithm." The idea is that the elevator isn't on a line but on a circle. The physical properties of the elevator (and of our disk) haven't changed, but you always schedule in the positive direction: it always goes "up." (Of course you can't literally always go up; in the hypothetical ten-story building, after level ten comes level one.) For example, in Boelter Hall the elevator would always go up, and when it reaches the top floor it goes all the way back down to the first floor. This fixes the fairness problem: everyone is treated equally badly. You lose a bit of performance, though, since you occasionally have to ride from the top to the bottom without doing any work in that time. It's just the cost of being fair.
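A minimal sketch of how the circular elevator picks its next request, assuming an in-memory array of pending track numbers (the function and variable names here are invented for illustration, not from the lecture):

/* Circular-elevator (C-SCAN) pick: always move toward higher track numbers.
   pending[] holds the track numbers of queued requests; head is the current
   head position.  Returns the index of the request to service next,
   or -1 if the queue is empty. */
static int next_request_cscan(const int pending[], int n, int head) {
    int best = -1;

    /* Prefer the closest request at or above the head (keep moving "up"). */
    for (int i = 0; i < n; i++)
        if (pending[i] >= head && (best < 0 || pending[i] < pending[best]))
            best = i;
    if (best >= 0)
        return best;

    /* Nothing above us: sweep back around and take the lowest pending track.
       This is the "after level ten comes level one" step. */
    for (int i = 0; i < n; i++)
        if (best < 0 || pending[i] < pending[best])
            best = i;
    return best;
}

The plain elevator algorithm would instead reverse direction and take the closest request below the head; the circular version pays for the occasional empty sweep back to the bottom, but treats every track the same.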
Don't we all care about performance though? Why would we care about fairness so much? In real time systems, you care more about predictability than you do about performance.
Suppose the requests come in the same order as shown in Table 1.
We satisfy the read request for block 1000, and then the one for block 91361; the disk has to move a long way to satisfy that request. It's an inefficient scheduling algorithm. We would like an "oracle" to tell us what's happening next...
When you design a future file system, you'll have to worry about all that stuff too -- disk, flash, etc. You will still run into fairness issues. The metric you're trying to minimize will just be different.
File System Design and Organization
The idea is tricky. A file system is a data structure that lives in two places at once: on disk and in memory. For now, let's just worry about the disk. Think of it as a data structure living on disk that provides the abstraction of a set of organized files. What's really on the disk is just a bunch of blocks.
Both RT11 and FAT have the problem that they aren't robust if the disk is flaky! If you lose a sector and it's the wrong sector, you can lose data as well.
FAT
+ easy to grow a file
+ no external fragmentation
+ no artificial limit to the number of files
- sequential reads can be really slow and can require lots of seeking (to read the next block you may have to do two seeks: one back to the FAT to find the next block number, and one out to that block's data). How do you deal with this? A defragmenter can fix it.
How do you implement lseek? lseek jumps to an arbitrary spot in a file, but with a FAT that turns into an O(N) operation: you have to follow the chain of next-block pointers from the start of the file. This will slow down applications that want to do random access. Random access in a FAT is slower than in RT11, where files are contiguous.
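A rough sketch of that chain walk, with a hypothetical in-memory fat[] array (the names, block size, and end-of-chain marker below are made up for illustration):

#define BLOCK_SIZE 4096
#define END_OF_CHAIN (-1)          /* hypothetical end-of-file marker in the FAT */

extern int fat[];                  /* fat[b] = block that follows block b */

/* Find the block holding byte `offset` of a file whose first block is `first`.
   Each step follows one next-pointer, so seeking near the end of an N-block
   file costs O(N).  In RT11 the file is contiguous, so the answer would just
   be first + offset / BLOCK_SIZE, with no chain to walk. */
int block_of_offset(int first, long offset) {
    int b = first;
    for (long i = 0; i < offset / BLOCK_SIZE; i++) {
        if (b == END_OF_CHAIN)
            return END_OF_CHAIN;   /* offset is past the end of the file */
        b = fat[b];
    }
    return b;
}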
fd = open("f", O_CREAT | O_TRUNC | O_WRONLY, 0666);  // create/truncate "f"
lseek(fd, 1000000000000000, SEEK_SET);                // jump ~10^15 bytes in
write(fd, "h", 1);                                    // write one byte way out there
To do this on SEASNET:
$ dd if=/etc/passwd of=big seek=10T
dd is a command that copies from an input file (if=) to an output file (of=); seek=10T skips far into the output before writing, so big ends up as an enormous, mostly-empty (sparse) file
$ ls -l big
$ cat big
Suppose SEASNET tries to back up this file while you're doing this... If you do this before 2am... oops, you'll run into problems: the backup program has to read through the whole file just like cat does, so the backup is no better than cat.
Directories = arrays of directory entries. See Figure 8.
Compare to fragmentation (Figure 9)
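To make Figure 8 concrete, a directory entry might look roughly like this in C (the field names and sizes are illustrative, not the exact on-disk layout):

#include <stdint.h>

struct dir_entry {
    char     name[11];     /* file name, blank-padded */
    uint8_t  type;         /* regular file, directory, or "unused" */
    uint16_t first_block;  /* where the file's data starts (e.g., its first FAT block) */
    uint32_t size;         /* file size in bytes */
};
/* A directory is then just a file whose contents are an array of these entries. */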
$ ln a b
$ rm a
// rm is like the unlink syscall
// it just marks the directory entry's type field as "not there" (unused)
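The same operations at the system-call level (a sketch; it assumes a file named "a" already exists and omits error checking):

#include <unistd.h>

int main(void) {
    link("a", "b");    /* ln a b: add a second directory entry for the same file */
    unlink("a");       /* rm a: remove the entry "a"; the data stays reachable as "b" */
    return 0;
}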
How about lseek? It's O(1) because there are at most 4 operations. Sure, it's only a constant factor, but it can still require 4 seeks!
If you have a corrupt file system, there will still be problems here!