Lecture 9: Synchronization Primitives & Deadlock

Scribe Notes - April 28, 2014

CS 111, Spring 2014, Prof. Paul Eggert

Critical Sections, Atomicity, and Locks

Suppose we have a critical section for which we want atomicity. One option is to set a global variable around the section that tells other programmers to watch out. (We would put an exclamation mark at the end of its name, but the language doesn't allow it.)


  int watchout;

  watchout = 1;   // warn others: critical section in progress
  acct1 += amt;
  acct2 -= amt;
  watchout = 0;   // all clear
    

This global variable has issues if we have multiple threads. We can try the following:


  int watchout;

  while(watchout)
    continue;     // busy-wait until no one else is in the section
  watchout = 1;
  acct1 += amt;
  acct2 -= amt;
  watchout = 0;
    

But this has a race condition: two threads can both test watchout, both see 0, and both set it to 1 before either notices the other, so both enter the critical section. We would like code that protects our critical section no matter how threads interleave. To do so, we ask the hardware for help.

To implement the equivalent of watchout at the hardware level, x86 provides several instructions that act atomically. Here's one of them:


  lock incl x;
    

The lock prefix makes the increment atomic: while the instruction runs, this CPU has exclusive access to x, so no other CPU can sneak in between the load and the store. This buys atomicity, but it is expensive (the CPU cannot use its cache as freely while the bus is locked), so use these atomic instructions only when necessary. Typically we would also wrap the assembly instruction in a C function.
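Here's a minimal sketch of such a wrapper, using GCC's extended inline assembly (the exact constraint syntax is our assumption, not something given in lecture):


  static inline void atomic_incr(int *x) {
    // "+m" tells the compiler that *x is read and written in memory,
    // so the lock prefix applies to the memory operand.
    asm volatile ("lock incl %0" : "+m" (*x));
  }
    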

There are about a dozen of these x86 atomic instructions. Two more are test-and-set and exchange. Let's look at exchange in more detail.


  lock xchgl x, %eax;
    

An analog of its functionality in C is this:


  // NOTE: this C version only describes the behavior; it is not
  // itself atomic. Only the hardware instruction is.
  int exchange(int *p, int newval){
    int oldval = *p;
    *p = newval;
    return oldval;
  }
    

However, in reality we would instead use lock xchgl with C’s inline assembler.
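Here's a sketch of that (again GCC inline-assembly syntax; the constraints are our assumption). On x86, xchgl with a memory operand locks the bus implicitly, so the explicit lock prefix is actually redundant:


  static inline int exchange(int volatile *p, int newval) {
    // Atomically swap newval with *p; afterward the register
    // holding newval contains the old value of *p.
    asm volatile ("xchgl %0, %1" : "+r" (newval), "+m" (*p));
    return newval;
  }
    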

Relying on this magic exchange function, we can implement a type for a lock and functions to lock and unlock it.


  typedef int volatile lock_t;   // 0 = unlocked, 1 = locked

  void lock(lock_t *p){
    // Spin until the old value was 0, i.e., until we are the thread
    // that flipped the lock from 0 to 1.
    while(exchange(p, 1))
      continue;
  }

  void unlock(lock_t *p){
    *p = 0;   // a simple aligned store is atomic on x86
  }
    

Precondition for unlocking: the caller must hold the lock. Unlocking a lock you don't hold can let two threads into the critical section at once.

Precondition for locking: the caller must not already hold the lock. With this spin lock, trying to lock twice would spin forever.
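For example, the account transfer from the start of lecture becomes (a sketch; acct_lock is a name we made up):


  lock_t acct_lock;   // 0 = unlocked; globals are zero-initialized

  lock(&acct_lock);
  acct1 += amt;
  acct2 -= amt;
  unlock(&acct_lock);
    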

Critical section code has two basic properties:

  1. Mutual exclusion: at most one thread is in the critical section at a time
  2. Bounded waiting: if you're waiting to enter the critical section, you eventually get in

Suppose your code grabs a lock, then goes off to mine bitcoins while everyone else waits. That will just anger other programmers. Keep critical sections short and simple (KISS: keep it simple, stupid)!

Here's a low-level design problem: all CPUs contend for the same lock, causing a bottleneck that doesn't scale (one thread in the critical section makes the other N-1 busy-wait). Can we avoid locks here? Yes, sometimes: a critical section can be so simple that it is already atomic. Loads and stores of an ordinary aligned int, for example, are atomic on x86, so declaring a global variable and doing simple assignments to it needs no lock.


  int x;

  ...

  { int i = x; }
    

If thread A assigns 27 to x while thread B assigns 192736 to x, we know x ends up as one value or the other, never a mixture of their bits. The store is atomic: all or nothing.

Be warned that this atomicity may fail if a variable's type is too long:


  long long x;   // 64 bits: a 32-bit machine stores this as two 32-bit words
    

... Or if it’s too short:


  struct {
    unsigned int a:7; // a is 7 bits long
    unsigned int b:1; // b is 1 bit
  } x;
    

For struct x, we cannot expect atomicity when different threads access or assign to x.b: the hardware cannot store a single bit, so writing x.b means loading the whole byte, changing one bit, and storing the byte back. That read-modify-write can also clobber a concurrent update to x.a, so even a thread that only uses x.a is at risk.
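Here's the lost update sketched in C, as if by hand-compiling x.a = 5 (the bit layout is implementation-defined; this is just the shape of the problem):


  // Hypothetical compiled form of "x.a = 5;":
  unsigned char tmp = *(unsigned char *)&x;   // load the whole byte
  tmp = (tmp & ~0x7F) | 5;                    // replace a's 7 bits
  *(unsigned char *)&x = tmp;                 // store the whole byte
  // A concurrent "x.b = 1;" does the same read-modify-write on the
  // same byte, so whichever store lands second silently discards
  // the other thread's update.
    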

Locks and Pipes

Suppose we have an implementation of a pipe, backed by a bounded buffer:


  #define N 1024

  struct pipe {
    char buf[N];
    size_t r, w;   // total bytes read and written so far
  };

  char readpipe(struct pipe *p){
    while(p->r == p->w)   // busy-wait while the pipe is empty
      continue;
    return p->buf[p->r++ % N];
  }

  void writepipe(struct pipe *p, char c){
    while(p->w - p->r == N)   // busy-wait while the pipe is full
      continue;
    p->buf[p->w++ % N] = c;
  }
    

This code is not synchronized: two readers can both see r != w and both execute p->r++, racing on r. We want to add locking to make pipe reads and writes atomic. To do so, we declare a global variable lock_t l, and lock and unlock around the accesses:


  char readpipe(struct pipe *p){
    lock(&l);
    while(p->r == p->w)
      continue;
    char c = p->buf[p->r++ % N];   // access the buffer while still holding the lock
    unlock(&l);
    return c;
  }

  void writepipe(struct pipe *p, char c){
    lock(&l);
    while(p->w - p->r == N)
      continue;
    p->buf[p->w++ % N] = c;
    unlock(&l);
  }
    

But in doing this, we've introduced something worse than a performance bug: if the pipe is empty, readpipe spins in its while loop while holding the lock, so no writer can ever acquire the lock to fill the pipe. It loops forever. We can change the loop body to unlock and then lock again:


  char readpipe(struct pipe *p){
    lock(&l);
    while(p->r == p->w)
    {
      unlock(&l);   // give writers a chance to grab the lock
      lock(&l);
    }
    char c = p->buf[p->r++ % N];
    unlock(&l);
    return c;
  }

  void writepipe(struct pipe *p, char c){
    lock(&l);
    while(p->w - p->r == N)
    {
      unlock(&l);
      lock(&l);
    }
    p->buf[p->w++ % N] = c;
    unlock(&l);
  }
    

This is better, but we still have a performance problem: a waiting thread burns its whole time slice repeatedly locking and unlocking. Yielding the CPU to another thread between attempts helps:


  char readpipe(struct pipe *p){
    lock(&l);
    while(p->r == p->w)
    {
      unlock(&l);
      yield();   // let another thread run; maybe a writer will fill the pipe
      lock(&l);
    }
    char c = p->buf[p->r++ % N];
    unlock(&l);
    return c;
  }

  void writepipe(struct pipe *p, char c){
    lock(&l);
    while(p->w - p->r == N)
    {
      unlock(&l);
      yield();
      lock(&l);
    }
    p->buf[p->w++ % N] = c;
    unlock(&l);
  }
    

But suppose we have a large application with 250 or so pipes, and we want to read and write many of them at once. Our current scheme is too conservative: a single global lock serializes every pipe operation, a classic coarse-grained-locking bottleneck. We want to be able to read or write each pipe independently. Instead, we can make lock_t l a member of struct pipe and lock via lock(&p->l). (This syntax works because the arrow (->) operator binds more tightly than unary &, so &p->l means &(p->l).) What we have now is more parallelizable, but we have more locks.
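Here's a sketch of the finer-grained readpipe (the struct layout is our choice):


  struct pipe {
    lock_t l;      // per-pipe lock: protects r, w, and buf
    char buf[N];
    size_t r, w;
  };

  char readpipe(struct pipe *p){
    lock(&p->l);   // parses as lock(&(p->l))
    while(p->r == p->w)
    {
      unlock(&p->l);
      yield();
      lock(&p->l);
    }
    char c = p->buf[p->r++ % N];
    unlock(&p->l);
    return c;
  }
    

writepipe changes the same way. Two threads can now use two different pipes without contending at all.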

Also, for full or empty pipes we need something better than yield, to avoid spinning through the scheduler again and again. After giving up the CPU, we would like to wake only when there is actually something to do. For this, we'll use a blocking mutex.


  typedef struct bmutex {
    lock_t l;               // spin lock protecting the fields below
    bool locked;            // is the blocking mutex currently held?
    thread_t *blocked_list; // threads waiting to acquire the mutex
  } bmutex_t;

  void acquire(bmutex_t *b){
    // The comma operator grabs the spin lock, then tests locked.
    while(lock(&b->l), b->locked){
      // Mutex is held: put ourselves (self, the current thread's
      // descriptor) on the blocked list and go to sleep.
      self->next = b->blocked_list;
      b->blocked_list = self;
      self->runnable = false;
      unlock(&b->l);
      yield();
    }
    b->locked = true;
    unlock(&b->l);
  }

  void release(bmutex_t *b){
    lock(&b->l);
    b->locked = false;
    thread_t *next = b->blocked_list;
    if (next){
      next->runnable = true;   // wake one waiter
      b->blocked_list = next->next;
    }
    unlock(&b->l);
    yield(next);   // directed yield: suggest running the awakened thread
  }
    

We're still spinning some here, though: a thread can be awakened, find the pipe still empty, and go right back to sleep. What if we had a condition variable to represent the condition the application is interested in, such as "the pipe is nonempty" or "the pipe is nonfull"? The OS lets us wait on such a condition while using a blocking mutex, and gives us a function notify to call when the condition becomes true. For pipes, the read function would notify "nonfull" after consuming a byte, and the write function would notify "nonempty" after producing one.
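Here's a sketch of readpipe with a condition variable; condvar_t, wait, and notify are names we're assuming for the API described above:


  struct pipe {
    bmutex_t b;
    condvar_t nonempty, nonfull;   // conditions readers and writers wait for
    char buf[N];
    size_t r, w;
  };

  char readpipe(struct pipe *p){
    acquire(&p->b);
    while(p->r == p->w)
      wait(&p->nonempty, &p->b);   // atomically release b, sleep, reacquire b when notified
    char c = p->buf[p->r++ % N];
    release(&p->b);
    notify(&p->nonfull);           // the pipe now has room: wake a waiting writer
    return c;
  }
    

writepipe is symmetric: it waits on nonfull and notifies nonempty.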

Overall we've addressed race conditions and performance bugs above, but we have yet to discuss deadlock.

Appendix: Semaphore

A semaphore is like a blocking mutex with arity greater than 1: it starts with a count N, and up to N threads can hold it at once. A blocking mutex is the special case N = 1, also known as a "binary semaphore".
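Here's a sketch of a counting semaphore built the same way as bmutex_t above (P and V are the traditional operation names; the struct layout is ours):


  typedef struct semaphore {
    lock_t l;               // spin lock protecting count and the list
    int count;              // how many more threads may acquire
    thread_t *blocked_list; // threads waiting for count > 0
  } semaphore_t;

  void P(semaphore_t *s){   // acquire
    while(lock(&s->l), s->count == 0){
      self->next = s->blocked_list;
      s->blocked_list = self;
      self->runnable = false;
      unlock(&s->l);
      yield();
    }
    s->count--;
    unlock(&s->l);
  }

  void V(semaphore_t *s){   // release
    lock(&s->l);
    s->count++;
    thread_t *next = s->blocked_list;
    if (next){
      next->runnable = true;   // wake one waiter
      s->blocked_list = next->next;
    }
    unlock(&s->l);
  }
    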

Scribe notes prepared by Matthew Goldberg and Jeffrey Wang