The scheduling problem itself is general and many problems can be cast as a scheduling problem. Scheduling in the context of operating systems is usually the question of how to choose which processes run.
When considering scheduling it is beneficial to first look at scheduling metrics, or ways to measure how the scheduler is performing. Attempting to optimizing different metrics will illuminate different methods for scheduling.
Each process goes through a series of events in its lifespan. A typical lifespan starts when the process arrives (i.e. requests to be run). From there a process will (hopefully) start running. After that the process may produce some output. Finally the process will end. These events can be used as markers for measuring various time metrics.
These definitions are illustrated below in relation to the lifespan of a process in the eyes on the scheduler.
|-----------|-----------|------------| Process lifespan
| Process arrives
| Process starts
| Process first output
| Process ends
|-----------| Wait time
|------------------------| Run time
|-----------------------| Response Time
|------------------------------------| Turnaround time
The metrics above are per process. To measure the effectiveness of the scheduler the average is often taken across all processes. Other metrics like those below are inherently scheduler-wide.
Perhaps the simplest scheduling policy is a first in first out (FIFO) queue, more colloquially known as "first come first serve (FCFS)". In this policy processes are executed based on their arrival time.
Say we have the following processes:
Process Run Time Arrival Time Time To Response
------- -------- ------------ ----------------
A 5 0 1
B 2 1 1
C 9 2 3
D 4 3 3
Using the first come first serve policy the processes will be executed in the order presented. The resulting time line is represented below with the following key:
A
: Process request arrives[
: Process starts]
: Process ends.
: Process waiting-
: Process running!
: Process first outputsTime: 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19
------------------------------------------------------------------
ProcA: [ ! - - ]
ProcB: A . . . [ !]
ProcC: A . . . . [ - - ! - - - - ]
ProcD: A . . . . . . . . . . . . [ - - !]
With this example we can calculate the wait time of each process and the average wait time for the scheduler. Similarly we can measure the average run time and average response time.
wait time = start time - arrival time
-------------------------------------
ProcA: 0 = 0 - 0
ProcB: 4 = 5 - 1
ProcC: 5 = 7 - 2
ProcD: 13 = 16 - 3
-------------------------------------
Avg: 5.5
run time = end time - arrival time
----------------------------------
ProcA: 5 = 5 - 0
ProcB: 6 = 7 - 1
ProcC: 14 = 16 - 2
ProcD: 17 = 20 - 3
----------------------------------
Avg: 10.5
response time = first output - arrival time
-------------------------------------------
ProcA: 1 = 1 - 0
ProcB: 5 = 6 - 1
ProcC: 8 = 10 - 2
ProcD: 16 = 19 - 3
-------------------------------------------
Avg: 7.5
When the operating system changes processes it must perform work to restore registers, memory, stack, etc. This process is called context switching and sometimes needs to be taken into account.
The above example assumes no context switch delay. If we assume a context switch takes one unit of time and the first process requires no context switch the time line would look like the following:
Time: 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22
---------------------------------------------------------------------------
ProcA: [ ! - - ]
ProcB: A . . . C [ !]
ProcC: A . . . . . C [ ! - - - - - - ]
ProcD: A . . . . . . . . . . . . . . C [ - - !]
We can now recalculate the metrics to see the effect of context switching.
wait time = start time - arrival time + context switch time
-----------------------------------------------------------
ProcA: 0 = 0 - 0 + 0 * C
ProcB: 5 = 6 - 1 + 1 * C
ProcC: 7 = 9 - 2 + 2 * C
ProcD: 16 = 19 - 3 + 3 * C
-----------------------------------------------------------
Avg: 7 = 5.5 + 1.5 * C
run time = end time - arrival time + context switch time
--------------------------------------------------------
ProcA: 5 = 5 - 0 + 0 * C
ProcB: 6 = 7 - 1 + 1 * C
ProcC: 14 = 16 - 2 + 2 * C
ProcD: 17 = 20 - 3 + 3 * C
--------------------------------------------------------
Avg: 13 = 10.5 + 1.5 * C
response time = first output - arrival time + context switch time
-----------------------------------------------------------------
ProcA: 1 = 1 - 0 + 0 * C
ProcB: 5 = 6 - 1 + 1 * C
ProcC: 8 = 10 - 2 + 2 * C
ProcD: 16 = 19 - 3 + 3 * C
-----------------------------------------------------------------
Avg: 9 = 7.5 + 1.5 * C
First come first serve is a reasonable attempt to solve the scheduling problem but it suffers from a particular problem that makes it brittle. Say the first process run has an extremely long run time. In fact, say we have the following processes:
Process Run Time Arrival Time Time To Response
------- -------- ------------ ----------------
A 20 0 1
B 3 1 1
C 1 2 3
D 2 3 3
Even though processes B, C, and D can finish in a brisk three time units they all have to wait for A to finish first.
An alternative scheduling policy is to run the process with the shortest run time instead. This leads to the shortest job first (SJF) policy. The policy behaves according to its name by selecting the job with the shortest run time. This leads to a lower average wait time.
Let's look at how this policy affects our first example:
Process Run Time Arrival Time Time To Response
------- -------- ------------ ----------------
A 5 0 1
B 2 1 1
C 9 2 3
D 4 3 3
Time: 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19
------------------------------------------------------------------
ProcA: [ ! - - ]
ProcB: A . . . [ !]
ProcC: A . . . . . . . . [ - - ! - - - - ]
ProcD: A . . . [ - - !]
wait time = start time - arrival time + context switch time
-----------------------------------------------------------
ProcA: 0 = 0 - 0 + 0 * C
ProcB: 4 = 5 - 1 + 1 * C
ProcC: 9 = 11 - 2 + 2 * C
ProcD: 4 = 7 - 3 + 3 * C
-----------------------------------------------------------
Avg: 5.75 = 4.25 + 1.5 * C
FCFS: 7 = 5.5 + 1.5 * C
run time = end time - arrival time + context switch time
--------------------------------------------------------
ProcA: 5 = 5 - 0 + 0 * C
ProcB: 6 = 7 - 1 + 1 * C
ProcC: 18 = 20 - 2 + 2 * C
ProcD: 8 = 11 - 3 + 3 * C
--------------------------------------------------------
Avg: 10.75 = 9.25 + 1.5 * C
FCFS: 13 = 10.5 + 1.5 * C
response time = first output - arrival time + context switch time
-----------------------------------------------------------------
ProcA: 1 = 1 - 0 + 0 * C
ProcB: 5 = 6 - 1 + 1 * C
ProcC: 12 = 14 - 2 + 2 * C
ProcD: 7 = 10 - 3 + 3 * C
-----------------------------------------------------------------
Avg: 7.75 = 6.25 + 1.5 * C
FCFS: 9 = 7.5 + 1.5 * C
Notice that shortest job first improves nearly every average process metric. Unfortunately it comes with its own set of problems.
This policy, due to its greedy nature, is intuitively unfair. Here it is clear that long running processes are unfairly treated. In fact, if new processes keep coming in with shorter run times than the processes waiting then the longer running processes in the queue may never get a chance to run. This effect is known as starvation where a process is starved of the resources (CPU time) needed to finish its task.
Another issue with shortest job first is that some, or more likely most, processes don't know how long they will run. This is especially true for interactive processes where the run time is dependent on on user activity. Without explicit run times shortest job first cannot work. Instead the scheduler can attempt to estimate such run times but this adds additional complexity to the scheduler that will use CPU time that can be spent on the processes themselves. Estimating process run times is also a non-trivial task.
So far we have allowed every process to run until it is finished before running another process. This leads to clear vulnerabilities such as processes being starved or a running process entering an infinite loop. If this happened no other process would be able to finish. This can be overcome using scheduling preemption.
Scheduling preemption involves the operating system to periodically interrupt the currently running process in order to select a new process to run. The amount of time between each preemptive interrupt is called a quantum. Scheduling preemption in combination with some way of determining which process to run next leads to other scheduling polices. For example, combining scheduling preemption with first come first serve selection leads to what is known as round robin (RR).
In round robin a process is run until it either finishes or gets preempted by the operating system. That process then gets put into the back of the process queue and the next process in the queue will begin running. Given a set of processes this policy will eventually run each process thus avoiding starvation.
However, if new processes arrive what should the scheduler do? There are two primary options the scheduler has: put the new process at the front of the queue or the back of the queue. If the schedule places the new process at the front of the queue then we are once again susceptible to starvation. An onslaught of new processes may cause the processes at the back of the queue to never reach the front. Placing new processes at the back of the queue is sufficient to avoid this problem.
Let's look at how this policy affects our first example assuming our preemption occurs with a quantum of one:
Process Run Time Arrival Time Time To Response
------- -------- ------------ ----------------
A 5 0 1
B 2 1 1
C 9 2 3
D 4 3 3
Time: 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19
------------------------------------------------------------------
Q /|\ A5 B2 A4 C9 B1 D4 A3 C8 D3 A2 C7 D2 A1 C6 D1 C5 C4 C3 C2 C1
u | A4 C9 B1 D4 A3 C8 D3 A2 C7 D2 A1 C6 D1 C5
e | B1 D4 A3 C8 D3 A2 C7 D2 A1 C6 D1
u | A3 C8
e
Here the time line is slightly different. The job queue is represented vertically with the next job to run at the top. Notice that after each quantum (here 1
) the job that ran goes to the end of the queue. New jobs are inserted at the end of the queue prior to the running job being punted to the back of the queue. The number next to the job is the amount of run time left in the job. The resulting process run order is as follows:
A B A C B D A C D A C D A C D C C C C C
Note the inherent fairness of the policy. Ignoring the context switch time the average wait time is also less than first come first serve and shortest job first:
wait time = start time - arrival time
-------------------------------------
ProcA: 0 = 0 - 0
ProcB: 1 = 2 - 1
ProcC: 1 = 3 - 2
ProcD: 2 = 5 - 3
-------------------------------------
Avg: 1
FCFS: 5.5
SJF: 4.25
However, once context switching is taken into account round robin runs into issues. In the sequence we have above there are a total of fifteen context switches for twenty quantum of actual process running. Thus our utilization is 20 / (20 + 15 * C)
. The context switch time will also inflate our run time and response time metrics.
In Linux, process are scheduled by priority. Priority based schedulers calculate some priority metric which is usually a combination of a number of factors such as niceness, total CPU time, time since last run, etc. Priorities with a low numeric number are chosen first so a priority 1 job gets done before a priority 2 job.
Niceness is one of the factors in a processes priority. It is simple stored in the process table. A process with high niceness is more willing to "yield" quantum slots to other processes. This factor can be controlled in Linux using the nice
command.
If you precede a command with nice
then that command will run with an added niceness. The nice
program will essentially run, change its niceness, and then run exec
to execute the desired program, thus replacing the process, with its niceness already modified, with the desired program.
The amount of change in niceness can be specified. For example, nice -9 sort
runs sort with an added 9 niceness. A process would then be made meaner using nice --9 sort
. Note though that this is a privileged operation. Users are allowed to make their processes nicer but cannot make them meaner. The root user can make anything nicer or meaner.
The schedulers covered thus far can be put into a simple taxonomy involving two orthogonal axes: when decisions are made and how decisions are made. The "when" examples we have encountered thus far are when a process has finished and on a periodic cycle using preemptive scheduling. The "how" essentially covers how the priority is determined, whether it is by a single dimension (e.g. job length, queue position, etc.) or by a complex combination of dimensions (e.g. involving niceness, wait time, etc.).
Policy | When decision is made | How decision is made |
---|---|---|
FCFS | No process running or just finished | Next process in the queue |
SJF | No process running or just finished | Process in queue with shortest run time |
RR | Periodically or none running | Next process in queue |
Preemptive SJF | Periodically or none running | Process in queue with shortest run time |
Realtime applications place much more stringent conditions on the scheduler. Example applications include the software for controlling a nuclear power plant. In these scenarios sometimes everything needs a high priority. The traditional Linux scheduler is "best effort" which is not good enough for a "hard realtime situation." In these situations you cannot miss a deadline.
The constraints on these realtime operating systems place a premium on predictability. As such schedulers in these systems will often use a polling architecture instead of interrupts because interrupts are unpredictable and polling is predictable. The predictability allows for more accurate estimations for run times. Notice though that predictability comes at the cost of performance. In these critical situations the cost is worthwhile.
Another design choice as a result of this tradeoff is the disabling of cache. Once again cache results in unpredictability such as whether desired data may be in cache or not. By disabling cache all memory operations require the same amount of time thus providing a predictable platform from which to estimate run time.
Designing and implementing software systems for a realtime environment is a very tricky task and is more of a black art than a science.
Other realtime applications have softer requirements where some jobs may be dropped or cancelled if the system gets too busy. These would be jobs that can be cancelled without too much cost. An example of such a situation is in video playback. A single missing frame in a video stream is not too detrimental to the system achieving its goal.
An example of a scheduler policy for the soft realtime domain is earliest deadline first (EDF). Additional behavior can be added to this basic policy such as monotonic scheduling and proportionate job dropping.
Scheduling is a generic problem for whenever you have a limited resource with many entities desiring it. Other schedulers will show up later in the course for scheduling things other than the CPU. For example, the network interface is a limited resource so requests will need to be scheduled.
The scheduler is interested in optimizing performance of the system, but it is a part of the system itself. An scheduler that optimizes perfectly but eats up all of the CPU resources in the process is not a very useful scheduler. Thus a trade off exists between the complexity of a scheduler and its own running efficiency.
A complex scheduler that takes into account many factors to fairly and efficiently schedule jobs may be optimal in some or many metrics but there is additional overhead in accounting for so many factors. Such factors may have to be stored in the process table as well causing the process table size to grow. On the other end of the spectrum a scheduler such as FCFS is extremely fast. In fact it should run in constant time. However, according to many metrics it is not only not optimal but in fact very poor.
Synchronization is the biggest problem in multi-threaded applications. The default behavior for threads is unsynchronized. This leads to race conditions.
Race conditions occur when the behavior of the program is dependent on whether one thread executes first/faster than another. They are notoriously difficult to debug primarily because they are *intermittent. This means that reproducing the error is inconsistent and sometimes impossible (or at least highly unlikely). There needs to be a methodology for avoiding race conditions efficiently, unintrusively, and clearly.
The canonical example of a race condition involves a shared bank account. Consider the following routines to deposit and withdraw from an account:
bool deposit(int amt) {
if (!(amt >= 0 && amt <= INT_MAX-balance)) return false;
balance += amt;
return true;
}
bool withdraw(int amt) {
if (!(amt >= 0 && amt > balance)) return false;
balance -= amt;
return true;
}
The above code looks suitable but it contains a race condition. Even in a single threaded application there is a race condition due to the possibility of signals. The race condition here exists because the +=
operation is not atomic. Consider the following example where two threads are trying to simultaneously deposit.
Thread A Thread B balance b_a b_b
-------- -------- ------- --- ---
Read balance to b_a 440 440 UNDEF
Read balance to b_b 440 440 440
Add 100 to b_a 440 540 440
Add 40 to b_b 440 540 480
Set balance to b_a 540 540 480
Set balance to b_b 480 540 480
At the end of the above example the balance is 480
even though 140
total was deposited to 440
. Thus the balance should be 580
but 100
of it is missing due to the race condition.
It is possible to solve this problem using locks and mutual exclusions. However, consider a more complicated action the bank might want to take such as performing a computationally expensive audit that must look at lots of data. If the bank simply locked the entire database the bank would be unable to continue operating so long as the audit was occurring.
We want observable behavior to be serializable. One way to get serializability is to use atomicity. If you make an action atomic it either happens or it doesn't.
Atomicity will be covered in the next lecture.