Lecture 19: Scheduling/Cloud Computing

Anthony Pham, Huy Le

Scheduling

Schedule because of limited resources
- Too many hungry mouths to feed (glorified bouncer)
- Long term scheduling - which processes are admitted to the system?
- Medium term scheduling - which processes have a CPU?
- Short term scheduling - which process havea CPU
- For now assume a single level
Scheduling Metrics
Like a car factory:
arrival -> start building -> 1st model(1st output -> 2nd output -> ... -> done) -> next model
For processes:
fork -> exec -> 1st process (1st write -> 2nd write -> ... -> exit) -> context switch -> next model
- wait time - time between fork and exec
- response time - time between fork and first write
- turn around time (TT or latency) - time between fork and exit

Aggregate statistics

Maximize: utilization and throughput.
Utilization - % of CPU time doing useful work.
Throughput - rate of useful work being done (sum of run times/total time).
Minimize: latency, response time, wait time.
The two goals can conflict.

Average Scheduler times

Table of Jobs and their times. d is context switch time. | represents context switch.

Jobs	Arrival	Run time	Wait time
A	0	5	0+d
B	1	2	4+d
C	2	3	5+2d
D	3	4	13+3d

Simple Scheduler - First come first serve

AAAAA|BB|CCCCCCCCC|DDDD
utilization almost 1
avg wait time = 5.5 +2.5d
avg TT = 10.5 + 2.5d

Shortest Job First

AAAA|BB|DDDD|CCCCCCCCC
utilization = 20/(20 + 4d)
avg wait = 4.25 + 2.5d

SJF + preemption

A|BB|DDDD|AAAA|CCCCCCCCC
avg wait = 2.25 + 2.25d
abg run = 2.25 + 2.5d + 5 + (6 + 5d/4)
avg wait time went down, avg run time went up
(std dev/worst case) of (wait/run) <= fairness metric

FCFS + preemption = round robin

(buggy)
|A|B|C|D|A|B|C|D|A|C|D|A|C|D|A|CCCCC
avg wait time = 2.5d
avg TT = 12.25 + 11.24d
need to put new jobs at end of the queue to prevent large job starvation

Priorities - set by user or by system
FCFS priority: -arrival time
SJF priority: -run time

Real time scheduling - constraints that must be hit, or else.

hard real time: needs to work immediately(brakes, control software)
soft real time: earliest meetable deadline first (video player, if deadline missed, drop task)

Mars rover had 2 priorities: high, medium, and low. There is a bug with its priorities where
if a low priority task had a lock, and a high priority task has to wait for it, if a medium task
runs forever, since it won't have to use the lock, it takes over, leaving the high priority task
to hang in a problem known as priority inversion.

We fix this by temporarily having the high priority task lend its priority to the low priority task
with the lock so that it can finish.

More Scheduling

1 CPU, 1 Disk, many I/O requests. Which to do first?
A simple abstraction of a disk is an array, where the disk head is in one element, and the where it wants
to go is represented by another. The cost to access the array is the distance in the array away from
where the head is. Average seek time is between 1/4 and 1/2 the size of the array.
Simplest disk scheduler: FCFS

- lots of seeking
+ no starvation

Shortest seek time first (SSTF)

+ minimizes seek time
+ maximizes throughput
- more CPU time (not important because CPU is cheap compared to seek time)
- starvation

Combine FCFS (for batches) + SSTF (for within a batch).

Elevator scheduling - SSTF in a positive direction.
Fairer would be cyclic elevator scheduling in which it goes in a cycle instead of top to
bottom to top. However, it would have less throughput.
Anticipatory scheduling - tries to guess what the next task will do, minimizing seeks.
In some case, it won't read and DALLY for the next read, anticipating that it will have spacial locality.

Most stuff here won't work with flash.

Cloud Computing Intro

Advantages

large shared resource
economies of scaling

Customers

short term commitment
can grow computing needs as needed "instantly"
unknown demand
varying demand

Disadvantages

payments - OS must meter the activities in way that is fair for customer and suppliers
resource management
security - data confidentiality, can solve by encrypting connections and data, but it affects performance
software licensing
data transfer - bottleneck
vendor lock in - once you choose one, you are stuck

Trust is the biggest problem we face, because in the end we have to trust the people building it.