Lecture 19: Scheduling/Cloud Computing
Anthony Pham, Huy Le
Scheduling
-
Schedule because of limited resources
- Too many hungry mouths to feed (glorified bouncer)
- Long term scheduling - which processes are admitted to the system?
- Medium term scheduling - which processes have a CPU?
- Short term scheduling - which process havea CPU
- For now assume a single level
-
Scheduling Metrics
Like a car factory:
arrival -> start building -> 1st model(1st output -> 2nd output -> ... -> done) -> next model
For processes:
fork -> exec -> 1st process (1st write -> 2nd write -> ... -> exit) -> context switch -> next model
- wait time - time between fork and exec
- response time - time between fork and first write
- turn around time (TT or latency) - time between fork and exit
Aggregate statistics
Maximize: utilization and throughput.
Utilization - % of CPU time doing useful work.
Throughput - rate of useful work being done (sum of run times/total time).
Minimize: latency, response time, wait time.
The two goals can conflict.
Average Scheduler times
Table of Jobs and their times. d is context switch time. | represents context switch.
Jobs |
Arrival |
Run time |
Wait time |
A |
0 |
5 |
0+d |
B |
1 |
2 |
4+d |
C |
2 |
3 |
5+2d |
D |
3 |
4 |
13+3d |
Simple Scheduler - First come first serve
AAAAA|BB|CCCCCCCCC|DDDD
utilization almost 1
avg wait time = 5.5 +2.5d
avg TT = 10.5 + 2.5d
Shortest Job First
AAAA|BB|DDDD|CCCCCCCCC
utilization = 20/(20 + 4d)
avg wait = 4.25 + 2.5d
SJF + preemption
A|BB|DDDD|AAAA|CCCCCCCCC
avg wait = 2.25 + 2.25d
abg run = 2.25 + 2.5d + 5 + (6 + 5d/4)
avg wait time went down, avg run time went up
(std dev/worst case) of (wait/run) <= fairness metric
FCFS + preemption = round robin
(buggy)
|A|B|C|D|A|B|C|D|A|C|D|A|C|D|A|CCCCC
avg wait time = 2.5d
avg TT = 12.25 + 11.24d
need to put new jobs at end of the queue to prevent large job starvation
Priorities - set by user or by system
FCFS priority: -arrival time
SJF priority: -run time
Real time scheduling - constraints that must be hit, or else.
- hard real time: needs to work immediately(brakes, control software)
- soft real time: earliest meetable deadline first (video player, if deadline missed, drop task)
Mars rover had 2 priorities: high, medium, and low. There is a bug with its priorities where
if a low priority task had a lock, and a high priority task has to wait for it, if a medium task
runs forever, since it won't have to use the lock, it takes over, leaving the high priority task
to hang in a problem known as priority inversion.
We fix this by temporarily having the high priority task lend its priority to the low priority task
with the lock so that it can finish.
More Scheduling
1 CPU, 1 Disk, many I/O requests. Which to do first?
A simple abstraction of a disk is an array, where the disk head is in one element, and the where it wants
to go is represented by another. The cost to access the array is the distance in the array away from
where the head is. Average seek time is between 1/4 and 1/2 the size of the array.
Simplest disk scheduler: FCFS
- - lots of seeking
- + no starvation
Shortest seek time first (SSTF)
- + minimizes seek time
- + maximizes throughput
- - more CPU time (not important because CPU is cheap compared to seek time)
- - starvation
Combine FCFS (for batches) + SSTF (for within a batch).
- Elevator scheduling - SSTF in a positive direction.
- Fairer would be cyclic elevator scheduling in which it goes in a cycle instead of top to
bottom to top. However, it would have less throughput.
- Anticipatory scheduling - tries to guess what the next task will do, minimizing seeks.
In some case, it won't read and DALLY for the next read, anticipating that it will have spacial locality.
Most stuff here won't work with flash.
Cloud Computing Intro
Advantages
- large shared resource
- economies of scaling
Customers
- short term commitment
- can grow computing needs as needed "instantly"
- unknown demand
- varying demand
Disadvantages
- payments - OS must meter the activities in way that is fair for customer and suppliers
- resource management
- security - data confidentiality, can solve by encrypting connections and data, but it affects performance
- software licensing
- data transfer - bottleneck
- vendor lock in - once you choose one, you are stuck
Trust is the biggest problem we face, because in the end we have to trust the people building it.