CS111 Scribe Notes
Lecture 10: 11/4/2013
Contributors: Tania De Pasquale, Stephen Hernandez, David Powell
Condition variables
What is a condition variable?
- A Boolean expression representing the condition (defining and checking it is the programmer's responsibility)
- A blocking mutex to protect the condition, the condition can only change when you own the mutex
- A condition variable of type condvar_t representing the condition
API:
wait(condvar_t *c, bmutex_t *b)
// Precondition: the caller has acquired b
// Action: releases b, then blocks until some other thread/process notifies that the condition may be true, then reacquires b and returns
// (b is the blocking mutex that protects the condition)
notify(condvar_t *c)
// Notifies one other thread waiting on c that the condition is true
// Broadcast/notifyAll notifies all threads currently waiting on c
// -- Advantage of broadcast: application-specific priorities (wake everyone and let the application decide who proceeds)
Example:
struct pipe {
  bmutex_t b;                   // protects r, w, buf
  condvar_t nonfull, nonempty;
  size_t r, w;
  char buf[N];
};
void writec(struct pipe *p, char c) {
  acquire(&p->b);           // blocking mutex: if we can't get the lock, block until we can
  while (p->w - p->r == N)  // while, not if: the condition may have changed again while we waited (race condition)
    wait(&p->nonfull, &p->b);
  p->buf[p->w++ % N] = c;
  release(&p->b);
  notify(&p->nonempty);
}
* wait / notify don't promise anything, they wake the process up so the process can check if it should do anything
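The writec above can be paired with a symmetric readc. Below is the whole pipe expressed with POSIX threads, a sketch in which `pthread_mutex_t`/`pthread_cond_t` stand in for the lecture's `bmutex_t`/`condvar_t` and N is an arbitrary buffer size:

```c
#include <pthread.h>

#define N 1024                            /* assumed buffer size */

struct pipe {
    pthread_mutex_t b;                    /* protects r, w, buf */
    pthread_cond_t nonfull, nonempty;
    size_t r, w;                          /* w - r = bytes currently buffered */
    char buf[N];
};

void writec(struct pipe *p, char c) {
    pthread_mutex_lock(&p->b);
    while (p->w - p->r == N)              /* full: wait for a reader to drain */
        pthread_cond_wait(&p->nonfull, &p->b);
    p->buf[p->w++ % N] = c;
    pthread_mutex_unlock(&p->b);
    pthread_cond_signal(&p->nonempty);    /* notify(): wake one blocked reader */
}

char readc(struct pipe *p) {
    pthread_mutex_lock(&p->b);
    while (p->w == p->r)                  /* empty: wait for a writer to fill */
        pthread_cond_wait(&p->nonempty, &p->b);
    char c = p->buf[p->r++ % N];
    pthread_mutex_unlock(&p->b);
    pthread_cond_signal(&p->nonfull);     /* a blocked writer may now proceed */
    return c;
}
```

Note that `pthread_cond_wait` has exactly the wait() contract above: it releases the mutex, blocks, and reacquires the mutex before returning.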
Semaphores
What is a semaphore?
- A blocking mutex is a binary semaphore (N = 1).
- A semaphore is like a blocking mutex, except it protects N instances of a resource instead of just 1
- e.g. create semaphore with N = 5
- Resources available if N > 0 and out of resources if N = 0
- Primitives:
- prolaag, P (like acquire: decrements N if N > 0, otherwise waits)
- verhoog, V (like release: increments N)
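P and V can be built from a blocking mutex and a condition variable. A minimal sketch with POSIX threads (not the POSIX `sem_t` API; `struct sema` and its field names are invented for illustration):

```c
#include <pthread.h>

/* A counting semaphore: n available instances of some resource. */
struct sema {
    pthread_mutex_t m;
    pthread_cond_t nonzero;
    int n;                               /* resources still available */
};

void P(struct sema *s) {                 /* prolaag: acquire one instance */
    pthread_mutex_lock(&s->m);
    while (s->n == 0)                    /* out of resources: block */
        pthread_cond_wait(&s->nonzero, &s->m);
    s->n--;
    pthread_mutex_unlock(&s->m);
}

void V(struct sema *s) {                 /* verhoog: release one instance */
    pthread_mutex_lock(&s->m);
    s->n++;
    pthread_mutex_unlock(&s->m);
    pthread_cond_signal(&s->nonzero);    /* wake one thread stuck in P */
}
```

With n initialized to 1 this behaves exactly like a blocking mutex, which is why a blocking mutex is called a binary semaphore.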
Hardware Lock Elision (HLE)
Released by Intel in June 2013 (in the Haswell processors):
- We want the effect of spin locks without the cycle delay
- Spin locks are pessimistic: a waiting thread does no useful work while it spins
- When many threads contend for the lock, there are large gaps of delay between acquiring the lock and doing the work
Example:
lock:
 try:
   xacquire lock xchgl %eax, lockvar // optimistic variant of xchgl: proceed under the assumption the lock will succeed
   cmp $0, %eax
   jnz try
unlock:
 xrelease movl $0, lockvar
// If it turns out we didn't really have the lock, the speculative operations vanish like an undo (expensive) and we try again
Deadlock
Sometimes deadlock can come about in unanticipated ways and is tricky.
say: we want a fast cat
e.g.: sort foo | cat | tr ...
We would want to skip the userspace buffer with a copy syscall:
while (copy(0, 1, 4096) > 0)
  continue;
// The copy syscall should speed it up, right?
// This introduces deadlock, however:
// copy(0, 1, ...) acquires fd 0 then fd 1; if someone else does copy(1, 0, ...), it acquires 1 then 0 = DEADLOCK!
Inside cat:
ssize_t r;
while ((r = read(0, buf, sizeof(buf))) > 0)
  write(1, buf, r);   // write only the r bytes actually read, not sizeof(buf)
exit(0);
// data path: OS -> userspace buffer -> OS; copy would skip the userspace hop
How to avoid deadlock:
- Don't lock [don't require exclusion]
- Don't allow >1 lock per process at a time
// If a process is holding a lock, then don't allow it to try to acquire another
- Lock ordering - "try acquire"
// Assume some lock ordering (in practice oftentimes the memory address of locks)
// Acquire locks in ascending order; if any lock is busy, then give up all locks and start over
- Dynamic detection of cycles (used in lab)
// If the new lock would create a cycle, reject it
// Actually very common [used by linux]
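The lock-ordering idea can be sketched in a few lines. Assuming POSIX mutexes and ordering by memory address (the in-practice convention the notes mention; `acquire_both`/`release_both` are hypothetical helper names):

```c
#include <pthread.h>

/* Always acquire two mutexes in ascending address order, so no two
 * threads can each hold one lock while waiting for the other. */
void acquire_both(pthread_mutex_t *a, pthread_mutex_t *b) {
    if (a > b) {                 /* sort the two locks by address */
        pthread_mutex_t *t = a;
        a = b;
        b = t;
    }
    pthread_mutex_lock(a);       /* everyone locks the lower address first... */
    pthread_mutex_lock(b);       /* ...so a cycle of waiters is impossible */
}

void release_both(pthread_mutex_t *a, pthread_mutex_t *b) {
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}
```

Two threads calling `acquire_both(&m1, &m2)` and `acquire_both(&m2, &m1)` end up locking in the same order, which is exactly what breaks the copy(0,1)/copy(1,0) deadlock above.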
File System Performance
Problem: the storage hierarchy (fast and small at the top, slow and big at the bottom):
  ALU               ~1 cycle (a few nanoseconds)
  Registers         similar to ALU
  L1 cache          ~2 cycles
  L2 cache          4-8 cycles
  DRAM              ~100 cycles
  Flash SSD
  Hard disk
  Mag tape library  ~5 minutes latency
* Ideally disk would be as fast as registers
* Why are the secondary storage devices (SSD and hard disk) so slow?
Disk:
7200 RPM = 120 rotations per second (120 Hz)
Read heads float over the spinning platter; if the data is all in the same track, the head never needs to move
To grab a random sector off disk:
- Accelerate read heads in direction of desired track
- Halfway there, decelerate
- "listen" for the correct track and settle down
- Wait for data to come under the head
- Copy data out of the sector and into the disk controller cache
- Transfer data from cache into CPU or RAM
// The timings for these steps are typically estimates based on previous disk implementations
Typical 1TB drive: (Seagate Barracuda ES2)
- 16 MiB cache, 7200 RPM
- rotation time: 8.333... ms
- average rotational latency: 4.1666... ms
- average read/write seek: 8.5/9.5 ms
//writes take longer to seek than reads: a read can start while the head is merely near the track, but a write must be settled precisely over the track's center
- track-to-track seek: 0.8/1.0 ms
- max internal xfer rate: 1.29 Gb/s // from disk to cache
- max sustained xfer rate: 0.93 Gb/s //rate at which it can transfer from disk to cache in long term (for grabbing all data)
- external xfer rate: 3 Gb/s //from cache into bus
- bytes/sector: 512/520/524/528 //with Linux just use 512
- typical: 12.5 W
- idle: 9 W
- MTBF (mean time between failures): 1.2 million hours
- AFR (annualized failure rate): 0.73% //if left running 24/7, there's a 0.73% chance it fails within a year
- Nonrecoverable sector read failure rate: 1 per 10^15 bits read
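A back-of-the-envelope check of what these numbers mean for a single random read (an estimate from the spec sheet above, not a measurement; `random_read_ms` is an invented helper name):

```c
/* Average time to read one random 512-byte sector, using the
 * Barracuda ES.2 figures above. */
double random_read_ms(void) {
    double seek = 8.5;                          /* average read seek, ms */
    double rotation = (60.0 / 7200) * 1000 / 2; /* half a rotation: ~4.17 ms */
    double xfer = 512 / (0.93e9 / 8) * 1000;    /* 512 B at 0.93 Gb/s: ~0.004 ms */
    return seek + rotation + xfer;              /* ~12.7 ms total */
}
```

The transfer term is three orders of magnitude smaller than the seek and rotation terms, which is why the batching and prefetching tricks below focus entirely on amortizing the mechanical overhead.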
SSD (Corsair Force GT):
- 60 GB
- 2.0 W max
- 1 million hours MTBF
- 280 MB/s (1.74 Gb/s) read
- 270 MB/s (1.7 Gb/s) write
* Flash wears out! Each block survives only so many writes.
* Issue: wear leveling: if a "hot" block is wearing out, use a different physical block in its place
for (;;) {
char c;
if (read (0, &c, 1) != 1)
break;
process(c);
}
// assuming the worst-case implementation (all six HDD steps above for every read):
// each 1-byte read will take a long time:
// avg 4.17 ms rotational latency + 8.5 ms seek + transfer time (1 byte at ~100 MB/s: too small to matter) ≈ 12.7 ms
Batching: take control of setup/wait and do once, then grab as much data as you can
- + do data in large batches so that the control overhead is less per byte
- + can improve performance by up to the batching factor
- - change your source code
Ex. read 1024 bytes at a time instead of 1; assuming processing time is insignificant and read is the bottleneck, roughly 1000x faster
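The batched version of the byte-at-a-time loop above might look like this (a sketch: `process` is the same per-character routine as before, given a trivial stand-in body here, and `process_input` is an invented wrapper that returns the byte count):

```c
#include <unistd.h>

void process(char c) { (void)c; }   /* stand-in for the per-character work */

/* One read() per 1024 bytes instead of one per byte, amortizing the
 * syscall + seek + rotational overhead across the whole batch. */
long process_input(void) {
    char buf[1024];
    ssize_t n;
    long total = 0;
    while ((n = read(0, buf, sizeof buf)) > 0) {
        for (ssize_t i = 0; i < n; i++)
            process(buf[i]);
        total += n;
    }
    return total;                   /* bytes processed */
}
```

Note the inner loop must run over n, the bytes actually read, since the final read of a file will usually return fewer than 1024 bytes.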
Prefetching:
- OS guesses future reads & saves its guess in buffer
- Separate buffer of most recent sector read, then read grabs from this buffer
- Correct guesses = same performance enhance as batching
- Overhead (8.5 ms seek + 4.17 ms rotational latency) can proceed in parallel with the application's computation
Dallying:
- + Kernel can wait for another write into the same/similar sectors before it actually writes
- + Sometimes this works especially if program will write again
- + Delaying work makes work more efficient
- - Problem: if the extra write never arrives, dallying just postponed the work and wasted time
- - Things can go wrong: if the OS shuts off or power fails, the delayed "writes" are lost
// dallying can also be a win for flash/USB drives: fewer writes means less wear
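A userspace sketch of dallying: postpone small writes by buffering them, and issue one large write() only when a full block has accumulated. (The kernel's buffer cache plays the same trick with disk writes; `buffered_putc`, `pending`, and `wflush` are invented names for illustration.)

```c
#include <unistd.h>

static char wbuf[4096];   /* assumed block size */
static size_t wlen;       /* bytes postponed so far */

void buffered_putc(char c) {
    wbuf[wlen++] = c;
    if (wlen == sizeof wbuf) {   /* dally until a full block accumulates */
        write(1, wbuf, wlen);
        wlen = 0;
    }
}

size_t pending(void) { return wlen; }

void wflush(void) {   /* the risk: a crash before this point loses the data */
    if (wlen) {
        write(1, wbuf, wlen);
        wlen = 0;
    }
}
```

This is essentially what stdio's putc/fflush do, and it shows both sides of the trade: 4096 one-byte calls become one syscall, but everything in wbuf is lost if the process dies before wflush runs.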