Chris Rewinski
Lecture by Paul Eggert on February 8, 2012
typedef bool lock_t;

void lock(lock_t *l) {
  while (*l)
    continue;
  *l = true;
}

void unlock(lock_t *l) {
  *l = false;
}
Race condition here! Two threads can both see *l as false before either sets it to true, and then both enter the critical section. We need the test and the set to happen atomically. Let's turn to the x86 instruction set.
lock incl x
    Atomically increments x in hardware. Implementing locks this way is possible, but difficult. Not what we need.

xchgl r,x
    Atomically exchanges r and x. Known as "test and set."
int xchgl(int *p, int new) {
  /* This is what happens:
       int old = *p;
       *p = new;
       return old; */
  asm("xchgl ...");
}

typedef int lock_t;

// Precondition: we don't own the lock
void lock(lock_t *l) {
  while (xchgl(l, 1) != 0)
    continue;
}

// Precondition: we own the lock
void unlock(lock_t *l) {
  xchgl(l, 0);
}
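To see the test-and-set lock in action, here is a self-contained sketch: it protects a shared counter with the same spin loop, but substitutes GCC's __sync_lock_test_and_set builtin for the hand-written xchgl wrapper above (that substitution, and the names spin_lock, spin_unlock, and worker, are mine, not the lecture's).

#include <pthread.h>
#include <stdio.h>

static int spinlock = 0;        /* 0 = unlocked, 1 = locked */
static long counter = 0;

static void spin_lock(int *l)   { while (__sync_lock_test_and_set(l, 1) != 0) continue; }
static void spin_unlock(int *l) { __sync_lock_release(l); }   /* atomically stores 0 */

static void *worker(void *arg) {
  (void) arg;
  for (int i = 0; i < 1000000; i++) {
    spin_lock(&spinlock);
    counter++;                  /* critical section */
    spin_unlock(&spinlock);
  }
  return NULL;
}

int main(void) {
  pthread_t t1, t2;
  pthread_create(&t1, NULL, worker, NULL);
  pthread_create(&t2, NULL, worker, NULL);
  pthread_join(t1, NULL);
  pthread_join(t2, NULL);
  printf("%ld\n", counter);     /* prints 2000000; without the lock it usually wouldn't */
  return 0;
}

(Compile with cc -pthread.)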
What if we wanted to add 2 to a variable atomically?
Using lock incl x twice exposes an intermediate state (x incremented by only 1), which introduces a race condition.
We could lock, add 2, and unlock, but this is slow.
Instead we use another assembly instruction: cmpxchgl (compare and swap).
bool cmpxchgl(int *p, int old, int new) {
  /* This is what happens:
       if (*p == old) {
         *p = new;
         return true;
       } else
         return false; */
  asm("cmpxchgl ...");
}
Something to be careful of:
The call cmpxchgl(&v, v, v+2) introduces a race condition! The two reads of v during argument evaluation might not return the same value.
So, cache v locally (assuming atomic loads and stores, which is true on x86 for aligned ints):
int old;
do {
  old = v;
} while (!cmpxchgl(&v, old, old + 2));
The computation of the new value must be relatively fast, or the loop might go on forever because v changes faster than we can evaluate its new value.
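Putting that pattern together as a function, here is a minimal sketch that uses GCC's __sync_bool_compare_and_swap builtin as a stand-in for the cmpxchgl wrapper above (the builtin and the name atomic_add are my assumptions, not the lecture's):

void atomic_add(int *p, int delta) {
  int old;
  do {
    old = *p;                                  /* snapshot the current value */
  } while (!__sync_bool_compare_and_swap(p, old, old + delta));
  /* The loop retries only when some other thread changed *p between the
     snapshot and the compare-and-swap. */
}

Usage for the example above would simply be atomic_add(&v, 2).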
Now consider a read-write lock: many readers may hold it at once, but a writer needs exclusive access.

struct rwlock {
  lock_t l;      // This lock protects the struct's contents
  int readers;   // Number of readers currently holding the lock
};
We can use the spinlock as a write lock, since only one writer is allowed at a time.
void lock_for_write(struct rwlock *l) {
  for (;;) {
    lock(&l->l);
    if (!l->readers)
      break;            // keep holding l->l; it acts as the write lock
    unlock(&l->l);
  }
}

void unlock_for_write(struct rwlock *l) {
  unlock(&l->l);
}

void lock_for_read(struct rwlock *l) {
  lock(&l->l);
  l->readers++;
  unlock(&l->l);
}

void unlock_for_read(struct rwlock *l) {
  lock(&l->l);
  l->readers--;
  unlock(&l->l);
}
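A hypothetical usage sketch (shared_value, reader, and writer are illustrative names, not from the lecture):

struct rwlock rw = { 0, 0 };
int shared_value;

int reader(void) {
  lock_for_read(&rw);
  int v = shared_value;          /* many readers may hold the lock at once */
  unlock_for_read(&rw);
  return v;
}

void writer(int v) {
  lock_for_write(&rw);           /* spins until there are no readers */
  shared_value = v;              /* only one writer at a time */
  unlock_for_write(&rw);
}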
Remember our pipe implementation? We utilized spinlocks:
struct pipe {
  lock_t l;
  char buf[N];
  size_t r, w;
};

void writec(struct pipe *p, char c) {
  for (;;) {
    lock(&p->l);
    if (p->w - p->r < N)
      break;            // there is room in the buffer; keep holding the lock
    unlock(&p->l);
  }
  p->buf[p->w++ % N] = c;
  unlock(&p->l);
}
And similarly for readc.
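The notes leave readc to symmetry; a sketch of what it would look like (not from the original):

char readc(struct pipe *p) {
  for (;;) {
    lock(&p->l);
    if (p->w - p->r > 0)          /* at least one unread byte */
      break;                      /* keep holding the lock */
    unlock(&p->l);
  }
  char c = p->buf[p->r++ % N];
  unlock(&p->l);
  return c;
}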
The problem with spinlocks: we busy wait, which wastes CPU time.
A blocking mutex allows us to release control of the CPU to someone else while we wait for the lock.
struct bmutex {
  lock_t l;
  bool locked;
  proc_t *block_list;    // Linked list of process descriptors waiting for this lock
};

// Assume that there's a proc_t *self for the current process

void acquire(struct bmutex *p) {
  for (;;) {
    lock(&p->l);
    add(&p->block_list, self);
    self->blocked = true;    // Handwaving: tell the OS that we are blocked, so it doesn't reschedule us
    if (!p->locked)
      break;
    unlock(&p->l);
    schedule();              // Gives up CPU control
  }
  p->locked = true;
  remove(&p->block_list, self);
  self->blocked = false;
  unlock(&p->l);
}

void release(struct bmutex *p) {
  lock(&p->l);
  // Set 1st process in the blocked list to runnable
  if (p->block_list)
    p->block_list->blocked = false;
  p->locked = false;
  unlock(&p->l);
}
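A hypothetical usage sketch (balance_mutex, balance, and deposit are illustrative names): the point is that a blocked caller yields the CPU via schedule() instead of spinning.

struct bmutex balance_mutex;
long balance;

void deposit(long amount) {
  acquire(&balance_mutex);      /* may give up the CPU while waiting */
  balance += amount;
  release(&balance_mutex);
}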
We can use blocking mutexes in the pipe implementation, but we would still busy wait for the pipe to become readable or writable.
We want to block until the pipe is ready for us to read/write.
In other words, we want to unblock based on a specific boolean expression.
Condition variables consist of three parts:
1. An arbitrary boolean expression
2. A blocking mutex to protect the condition variable
3. Storage for the condition variable itself
API:
wait_for(condvar_t *c, bmutex_t *b);
    Precondition: b must have been acquired already.

notify(condvar_t *c);
    Notifies one waiting thread.

notifyAll(condvar_t *c);
    Notifies all waiting threads.
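A sketch of the standard usage pattern, assuming the usual semantics (wait_for releases b while the caller is blocked and reacquires it before returning): the waiter rechecks the boolean expression in a loop, since the condition may no longer hold by the time the waiter runs again. The names wait_until_ready and ready are illustrative.

void wait_until_ready(condvar_t *c, bmutex_t *b, bool *ready) {
  acquire(b);
  while (!*ready)               /* the arbitrary boolean expression */
    wait_for(c, b);             /* releases b, blocks, reacquires b */
  /* *ready is true here, and we still hold b */
  release(b);
}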
Now we have a final pipe implementation, using blocking mutexes and condition variables:
struct pipe {
  ...
  bmutex_t l;
  condvar_t nonfull, nonempty;
};

void writec(struct pipe *p, char c) {
  acquire(&p->l);
  while (p->w - p->r == N)
    wait_for(&p->nonfull, &p->l);
  p->buf[p->w++ % N] = c;
  notify(&p->nonempty);
  release(&p->l);
}
And similarly for readc.
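Again readc is left to symmetry; a sketch:

char readc(struct pipe *p) {
  acquire(&p->l);
  while (p->w - p->r == 0)
    wait_for(&p->nonempty, &p->l);
  char c = p->buf[p->r++ % N];
  notify(&p->nonfull);
  release(&p->l);
  return c;
}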
Deadlock: one or more processes waiting for events that will never occur.
A common task is copying data from one file descriptor to another:
for (;;) {
  ssize_t n = read(ifd, buf, sizeof(buf));
  write(ofd, buf, n);
}
For synchronization, read:
- Acquires the lock on ifd
- Copies data
- Releases the lock on ifd
And write:
- Acquires the lock on ofd
- Copies data
- Releases the lock on ofd
Why not have a function copy(ifd, ofd, num)?
For synchronization, copy would (see the sketch after this list):
- Acquire the lock on ifd
- Acquire the lock on ofd
- Copy data
- Release the lock on ofd
- Release the lock on ifd
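Here is a sketch of such a copy, with hypothetical lock_fd/unlock_fd helpers standing in for the per-descriptor locks described above; the detail that matters is the acquisition order.

#include <unistd.h>

void lock_fd(int fd);            /* hypothetical per-descriptor locking helpers */
void unlock_fd(int fd);

ssize_t copy(int ifd, int ofd, size_t num) {
  char buf[1024];
  if (num > sizeof buf)
    num = sizeof buf;
  lock_fd(ifd);                  /* acquire the lock on ifd ... */
  lock_fd(ofd);                  /* ... then the lock on ofd */
  ssize_t n = read(ifd, buf, num);
  if (n > 0)
    n = write(ofd, buf, n);
  unlock_fd(ofd);                /* release in reverse order */
  unlock_fd(ifd);
  return n;
}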
But what if we had two processes, P1 and P2:
P1: copy(ifd, ofd, ...);
P2: copy(ofd, ifd, ...);
If P1 acquires the lock on ifd and P2 then acquires the lock on ofd, both processes will block forever.
P1 won't give up ifd's lock until it gets ofd's lock, but P2 won't give up ofd's lock until it gets ifd's lock.
Deadlock is a race condition that occurs only when all 4 of the following conditions hold:
- Circular wait
- Mutual exclusion
- No preemption of locks
- Hold and wait (a process can hold a resource while waiting for another)
Eliminating any of the above conditions eliminates deadlock.
One approach is to detect deadlock dynamically: the system maintains a directed graph of processes and resources. The edge P1->R1 exists if P1 owns R1. The edge R1->P2 exists if P2 is waiting for R1.
If a requested blocking wait would create a cycle in this graph, the system refuses to allow that process to wait.
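A minimal sketch of the cycle check, under simplifying assumptions of my own: processes and resources share one node numbering 0..NNODES-1, and the graph is kept as an adjacency matrix. Before letting process p block on resource r, the system asks whether adding the edge r -> p would close a cycle, i.e. whether r is already reachable from p.

#include <stdbool.h>
#include <string.h>

#define NNODES 128
static bool edges[NNODES][NNODES];   /* edges[a][b] is true if edge a -> b exists */

/* Depth-first search: can we reach `to` from `from` by following edges? */
static bool reachable(int from, int to, bool visited[NNODES]) {
  if (from == to)
    return true;
  visited[from] = true;
  for (int next = 0; next < NNODES; next++)
    if (edges[from][next] && !visited[next] && reachable(next, to, visited))
      return true;
  return false;
}

/* Would letting process p wait for resource r (adding edge r -> p) create a cycle? */
bool wait_would_deadlock(int p, int r) {
  bool visited[NNODES];
  memset(visited, 0, sizeof visited);
  return reachable(p, r, visited);
}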
Another approach: processes always acquire resources in a fixed global order. This eliminates circular wait.
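One common way to impose a fixed order, sketched here with POSIX mutexes (my choice of mechanism, not the lecture's): always lock the mutex at the lower address first, so every thread follows the same global order and no circular wait can form.

#include <pthread.h>
#include <stdint.h>

/* Lock two distinct mutexes in a globally consistent order. */
void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b) {
  if ((uintptr_t) a < (uintptr_t) b) {
    pthread_mutex_lock(a);             /* lower address first */
    pthread_mutex_lock(b);
  } else {
    pthread_mutex_lock(b);
    pthread_mutex_lock(a);
  }
}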
A third approach: if a process can't acquire a resource, it releases all the resources it holds and starts over.
This eliminates the condition of hold and wait.
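A sketch of this "release everything and start over" idea with POSIX mutexes (again my choice of mechanism, not the lecture's): if the second lock can't be taken immediately, drop the first and retry, so we never hold one resource while waiting for another.

#include <pthread.h>
#include <sched.h>

void lock_both(pthread_mutex_t *a, pthread_mutex_t *b) {
  for (;;) {
    pthread_mutex_lock(a);
    if (pthread_mutex_trylock(b) == 0)
      return;                          /* got both locks */
    pthread_mutex_unlock(a);           /* back off: give up what we hold */
    sched_yield();                     /* let the other thread make progress */
  }
}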