Our goals for synchronization primitives:
A brief history of development:
It's always a hassle to make locking efficient enough. As we'll see in this lecture, locking primitives tend not to scale well: what was good enough in the past usually isn't good enough now.
To have any hope of building a good set of locking primitives, we need atomic loads and stores.
First approach:
typedef int lock_t;

// No good: the test and the store are two separate steps
void lock (lock_t *l)
{
    while (*l)
        continue;
    // <-- another thread can slip in here and also see the lock free
    *l = 1;
}
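To see why this is no good, here is a small hand-simulated interleaving of two threads, T1 and T2, racing through the naive lock above (the function name and the step-by-step scheduling are ours, made up for illustration):

```c
#include <stdbool.h>

/* Simulate one bad interleaving of two threads through the naive lock.
   Returns how many threads end up inside the critical section at once. */
static int racy_interleaving(void)
{
    int l = 0;                    /* the lock word, initially free      */

    bool t1_enters = (l == 0);    /* T1: while (*l) -- sees lock free   */
    bool t2_enters = (l == 0);    /* T2: runs before T1 stores, so it
                                     ALSO sees the lock free            */
    l = 1;                        /* T1: *l = 1                         */
    l = 1;                        /* T2: *l = 1                         */

    return (int)t1_enters + (int)t2_enters;  /* 2: mutual exclusion lost */
}
```

Both threads observe 0 between the test and the store, so both "acquire" the lock.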
A slightly better approach:
# suppose %eax holds the address of the lock word
incl (%eax)        # <--- Avoids races on one CPU, but internally this is
                   # a load + add + store, so 2 CPUs would still race
# alternative:
lock incl (%eax)   # This avoids the race. The "lock" prefix means
                   # "execute this instruction atomically"
void lock_incl (int *l)
{
    asm("lock incl %...");   // inline assembly embedded in C code (operands elided)
}
// Still not sufficient for a locking primitive: the increment doesn't
// tell us the old value, so we can't tell whether we got the lock
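In modern GCC/Clang we don't have to write the inline assembly by hand; the compiler's `__atomic` builtins emit the locked instruction for us. A sketch of the same `lock_incl`:

```c
/* Atomic increment via a compiler builtin; on x86 this compiles down
   to a lock-prefixed add/inc instruction. */
void lock_incl(int *l)
{
    __atomic_fetch_add(l, 1, __ATOMIC_SEQ_CST);
}
```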
A better solution is a new primitive: xchgl

// Atomically swap *l and value; returns the old value of *l
int xchgl (int *l, int value)
{
    asm("....");   // the xchgl instruction (operands elided)
    return /* old value of *l */;
}
void lock(lock_t *l)
{
    while (xchgl(l, 1) == 1)   // swap in a 1; if the old value was 1,
        continue;              // someone else holds the lock, so spin
    // And this will work!
}
void unlock(lock_t *l)
{
    *l = 0;   // Safe, because only the one thread that holds the lock
              // can be running unlock at any given time
}
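A compilable version of this spinlock, using the GCC/Clang `__atomic_exchange_n` builtin as a stand-in for hand-written xchgl assembly (the builtin-based `xchgl` wrapper is our sketch, not the lecture's code):

```c
typedef int lock_t;

/* Atomically swap *l and value; returns the old value of *l. */
static int xchgl(int *l, int value)
{
    return __atomic_exchange_n(l, value, __ATOMIC_SEQ_CST);
}

void lock(lock_t *l)
{
    while (xchgl(l, 1) == 1)
        continue;              /* spin until we swap out a 0 */
}

void unlock(lock_t *l)
{
    __atomic_store_n(l, 0, __ATOMIC_SEQ_CST);
}
```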
We want a more general primitive. Instead of just incrementing an int, we want to apply arbitrary operations atomically: any mathematical function of the old value (multiply by 3, divide by 2, etc.)
// "aa" == "apply atomically"
void aa(int *p, int *l, int (*f) (int))
{
lock(l); // l is a lock location passed in by user
*p = f(*p); // Buggy, two threads could store into *p at
// the same time
unlock(l);
// But can we do something more efficent?
}
void aa (int *p, int (*f) (int))
{
    int v, r;     // declared outside the do-while so the loop
                  // condition can see them
    do {
        v = *p;
        r = f(v);
    } while (xchgl (p, r) != v);
    /* Bug: xchgl stores r unconditionally, so if *p changed underneath
       us we've already clobbered another thread's update by the time we
       notice the mismatch and retry */
}
// cmpxchgl: compare-and-swap; the hardware executes all of this
// atomically (this C code just describes the instruction's behavior)
bool cmpxchgl (int *p, int old, int new)
{
    // address of location, old value, and new value to set it to;
    // won't change memory if the old value is wrong
    if (*p == old)
    {
        *p = new;
        return true;
    }
    else
        return false;
}
// This version of aa is sufficient
void aa (int *p, int (*f) (int))
{
    int old, new;
    do {
        old = *p;
        new = f(old);
    } while (!cmpxchgl (p, old, new));
    // cmpxchgl is still a bottleneck on big multicore systems
    // (imagine 1000 cores all fighting over the same cache line)
}
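A compilable sketch of this version, with cmpxchgl rendered via the GCC/Clang compare-exchange builtin (the sample function `triple` is made up for illustration):

```c
#include <stdbool.h>

/* Compare-and-swap: atomically set *p to new iff *p still equals old. */
static bool cmpxchgl(int *p, int old, int new)
{
    return __atomic_compare_exchange_n(p, &old, new, false,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

/* Apply f to *p atomically: retry until no one raced with us. */
void aa(int *p, int (*f)(int))
{
    int old, new;
    do {
        old = *p;
        new = f(old);
    } while (!cmpxchgl(p, old, new));
}

static int triple(int x) { return 3 * x; }
```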
Intel recently added hardware lock elision for this: the new xacquire/xrelease instruction prefixes.
movl $1, %eax
# l is the lock's memory location
# "xacquire" is new
try:
    xacquire lock xchgl %eax, l
    cmp $0, %eax
    jne try          # old value was 1: lock was held, try again
# We have the lock
# ... critical section ...
xrelease movl $0, l
How does xacquire work? The CPU elides the locked write and runs the critical section speculatively, as a transaction; if another CPU conflicts before the xrelease, the hardware throws the speculative work away and jumps back to "try" to try again, this time actually taking the lock.
struct pipe {
size_t r, w;
char buf[BUFSIZE];
lock_t l;
};
void write (struct pipe *p, char c)
{
    for (;;)
    {
        lock(&p->l);
        if (p->w - p->r != BUFSIZE)   // pipe not full
            break;                    // exit the loop still holding the lock
        unlock(&p->l);
        yield();                      // let someone else drain the pipe
    }
    p->buf[p->w++ % BUFSIZE] = c;
    unlock(&p->l);
}
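The matching read would follow the same convention: spin with the lock released while the pipe is empty, then consume one byte. The sketch below is ours, not the lecture's: the spinlock/yield stubs are simplified stand-ins, and the names writec/readc avoid clashing with the POSIX write/read:

```c
#include <stddef.h>

#define BUFSIZE 512
typedef int lock_t;

static void lock(lock_t *l)   { while (__atomic_exchange_n(l, 1, __ATOMIC_SEQ_CST)) continue; }
static void unlock(lock_t *l) { __atomic_store_n(l, 0, __ATOMIC_SEQ_CST); }
static void yield(void)       { /* give up the CPU; a no-op in this sketch */ }

struct pipe {
    size_t r, w;
    char buf[BUFSIZE];
    lock_t l;
};

void writec(struct pipe *p, char c)
{
    for (;;)
    {
        lock(&p->l);
        if (p->w - p->r != BUFSIZE)   /* pipe not full */
            break;
        unlock(&p->l);
        yield();
    }
    p->buf[p->w++ % BUFSIZE] = c;
    unlock(&p->l);
}

char readc(struct pipe *p)
{
    for (;;)
    {
        lock(&p->l);
        if (p->w - p->r != 0)         /* pipe not empty */
            break;
        unlock(&p->l);
        yield();
    }
    char c = p->buf[p->r++ % BUFSIZE];
    unlock(&p->l);
    return c;
}
```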
We want something more efficient: something like "yield until this resource is available". Such a primitive is called a blocking mutex.
If we don't get the lock, we put ourselves to sleep until it is available.
struct process {
    // other stuff (registers, etc.)
    bool blocked;           // the scheduler skips processes marked blocked
    struct process *next;
};
// typedef lock_t bmutex_t; // Not good for waiting
// Instead
typedef struct {
lock_t l;
struct process *p; // linked list of processes waiting on lock
struct process **last; // to append to end quickly
bool acquired;
} bmutex_t;
// Acquire the lock, or put ourselves to sleep until we can
// (cp points to the currently running process; assume b->last was
// initialized to &b->p)
void acquire (bmutex_t *b) {
    while (1)
    {
        lock (&b->l);
        if (b->acquired) {
            cp->next = 0;          // append ourselves to the wait list
            *b->last = cp;
            b->last = &cp->next;
            cp->blocked = true;
            unlock (&b->l);        // must release the spinlock before
            yield();               // going to sleep!
            continue;
        }
        break;
    }
    b->acquired = true;
    unlock (&b->l);
}
// Undo an acquire: wake up the first waiter, if any
void release (bmutex_t *b) {
    lock(&b->l);
    b->acquired = false;
    struct process *w = b->p;
    if (w) {
        b->p = w->next;        // pop it off the wait list
        if (!b->p)
            b->last = &b->p;   // list is now empty
        w->blocked = false;    // the scheduler may now run it
    }
    unlock(&b->l);
}
Our previous approach is susceptible to deadlock. If we carefully protect everything with locks, and eliminate all other races, we can still have deadlock! This is even harder to test for.
Four conditions are needed for deadlock:
1. Circular wait (a cycle of processes, each waiting for a lock the next one holds)
2. Mutual exclusion
3. No preemption of locks
4. Hold and wait (holding one lock while waiting for another)
All four of these need to be met on a system for deadlock to occur. If even one is broken, then deadlock cannot occur.
How to solve the deadlock problem: attack at least one of the 4 conditions above.
Solutions (number denotes which condition we attack):
1. Lock ordering: always acquire locks in a fixed order (e.g. by address: if &l < &m, acquire l before m), so a circular wait cannot form
4. Lock-free code such as the aa() function above, which never holds a lock while waiting
Method 1 is the most important. This is the approach we'll take.
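Method 1 can be packaged as a helper that always takes two distinct locks in address order (the helper name lock_both and the spinlock stubs are ours):

```c
typedef int lock_t;

static void lock(lock_t *l)   { while (__atomic_exchange_n(l, 1, __ATOMIC_SEQ_CST)) continue; }
static void unlock(lock_t *l) { __atomic_store_n(l, 0, __ATOMIC_SEQ_CST); }

/* Acquire two distinct locks, lower address first, so that no matter
   what order callers pass them in, everyone agrees on the lock order
   and no circular wait can form.  Precondition: l != m. */
void lock_both(lock_t *l, lock_t *m)
{
    if (l > m) { lock_t *t = l; l = m; m = t; }
    lock(l);
    lock(m);
}

void unlock_both(lock_t *l, lock_t *m)
{
    unlock(l);
    unlock(m);
}
```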
Example:
parent:
    pipe()        # pipe to the child
    pipe()        # pipe from the child
    fork()
    exec sort     # (in the child) sort won't deadlock because sort won't
                  # output until it reads EVERYTHING. Calling gzip, on the
                  # other hand, would hang. It's only because of this
                  # specific program's implementation that this is safe.
    write
    write
    close         # close the write end so sort sees EOF
    read
    read
    read
Mars Pathfinder (priority inversion):
T_low | T_med | T_high |
runnable | waiting | waiting |
acquire (&m) | ------------------------> | (runnable) |
acquire (&m) | (runnable) | <------ acquire (&m) |
computation |
Arrows denote the flow of execution between threads. The high-priority thread tries to run, but the low-priority thread is holding a lock it needs. The medium-priority thread, however, does not need this lock, so when it becomes runnable it runs instead of the low-priority thread, which therefore never finishes its critical section and never releases the lock.
Notice that the high-priority task doesn't get to run for a long time, and this could result in the job being killed by a watchdog timer if the medium-priority thread takes too long (e.g. the long computation indicated above).
How can we solve this? A high-priority process should lend its priority to the holder of the lock it's waiting on (priority inheritance). That way the low-priority lock holder will not be preempted by medium-priority work, so it can finish, release the lock, and unblock the high-priority process.
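A toy model of this priority lending (all the field and function names here are made up, and transitive donation through chains of locks is not handled):

```c
#include <stddef.h>

struct thread {
    int priority;                 /* base priority: bigger = higher      */
    struct thread *blocked_on;    /* holder of the lock we wait for, or NULL */
};

/* A thread's effective priority is the max of its own base priority and
   the priorities of every thread blocked waiting on a lock it holds. */
int effective_priority(const struct thread *t,
                       struct thread *all[], int n)
{
    int p = t->priority;
    for (int i = 0; i < n; i++)
        if (all[i]->blocked_on == t && all[i]->priority > p)
            p = all[i]->priority;      /* waiter donates its priority */
    return p;
}
```

With T_high (priority 3) blocked on T_low (priority 1), T_low's effective priority becomes 3, so the scheduler prefers it over T_med (priority 2) and the inversion is avoided.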