CS 111 Lecture 11/03/14

Nate Fischer & Ty Giacalone


Table of contents


Synchronization primitives

Our goals for synchronization primitives:

A brief history of development:

Making locking primitives efficient is a perennial hassle. As we'll see in this lecture, locking primitives tend not to scale well, and what was good enough in the past usually isn't good enough now.

To have any hope of building a good set of locking primitives, we need atomic loads and stores.

First approach:

typedef int lock_t;
// This method is no good: there is a window between the test and the store
void lock (lock_t *l)
{
  while (*l)
    continue;
  *l = 1; // another thread may also see *l == 0 and reach this store
}

A slightly better approach:

#       %eax holds the address of the lock
incl    (%eax) # <--- Avoids races on one CPU
# Internally, this is a load + add + store, so with 2 CPUs there is
# still a race between the load and the store

# alternative:
lock incl   (%eax)  # This avoids the race. "Lock" means to execute
                    # instruction atomically

void lock_incl (int *l)
{
  asm("lock incl %...."); // write the assembly in C code
}
// Still not sufficient for a locking primitive

A better solution is a new primitive: xchgl

// returns old value
int xchgl (int *l, int value) // takes in new value
{
  asm("....");
  return // ...
}

void lock(lock_t *l)
{
  while (xchgl (l, 1) == 1)
    continue;
  // And this will work!
}

void unlock(lock_t *l)
{
  *l = 0; // Safe: stores are atomic, and only the lock holder calls unlock
}

Lock incl generalized

We want a more general primitive. Instead of just incrementing an int, we want to atomically apply any mathematical function to it (e.g. multiply by 3, divide by 2, etc.)

"Apply atomically" approach

// "aa" == "apply atomically"
void aa(int *p, int *l, int (*f) (int))
{
  lock(l); // l is a lock location passed in by user
  *p = f(*p); // Buggy, two threads could store into *p at
              // the same time
  unlock(l);

  // But can we do something more efficent?
}

void aa(int *p, int *l, int (*f) (int))
{
  int v, r;   // declared outside the loop so the while condition can see them
  do {
    v = *p;
    r = f(v);
  } while (xchgl (p, r) != v);
  /* Bug: xchgl stores r unconditionally, so if *p changed after we read v,
     other threads can observe our stale r before we notice and retry */
}

Another instruction: compare and swap

// The instruction's semantics, written as C for clarity
// (in reality this happens atomically in hardware):
bool cmpxchgl (int *p, int old, int new)
{
  // address of location, old value, and new value to set it to

  // Won't change memory if old value is wrong
  if (*p == old)
  {
    *p = new;
    return true;
  }
  else
    return false;
}

// This version of aa is sufficient
void aa(int *p, int (*f) (int) )
{
  int old, new;   // declared outside the loop so the while condition can see them
  do {
    old = *p;
    new = f(old);
  } while (!cmpxchgl (p, old, new) );
  // cmpxchgl is a bottleneck on multicore systems (1000 cores)
}

The latest solution: Hardware lock elision (HLE)

Intel introduced this in 2013, with the Haswell microarchitecture

  movl    $1, %eax
  # l is the lock's memory location
  # "xacquire" is new
try:
  xacquire lock xchgl   %eax, l
  cmp   $1, %eax
  je    try             # old value was 1: lock was held, retry
  # We have the lock
  xrelease movl   $0, l

How does xacquire work?

xacquire hints to the CPU that the write to the lock should be elided: the critical section runs speculatively while other CPUs still see the lock as free. If the hardware detects a conflicting access, it aborts the speculation and re-executes the section the old-fashioned way, actually taking the lock. On older CPUs the prefix is simply ignored, so the code remains backward compatible.


Pipes and locking

struct pipe {
  size_t r, w;
  char buf[BUFSIZE];
  lock_t l;
};

void write (struct pipe *p, char c)
{
  for (;;)
  {
    lock(&p->l);
    if (p->w - p->r != BUFSIZE)
      break;   // not full; exit the loop still holding the lock
    unlock(&p->l);
    yield();
  }
  p->buf[p->w++%BUFSIZE] = c;
  unlock(&p->l);
}

We want something more efficient than this polling loop: something like "yield until this resource is available". A primitive that provides this is called a blocking mutex.


Blocking Mutex

If we don't get the lock, put ourselves to sleep until it is available

struct process {
  // other stuff
  struct process *next;
};
// typedef lock_t bmutex_t; // Not good for waiting
// Instead
typedef struct {
  lock_t l;
  struct process *p; // linked list of processes waiting on lock
  struct process **last; // to append to end quickly
  bool acquired;
} bmutex_t;

// Acquire the lock, or put ourselves to sleep until it's free
// (cp is the current process's descriptor)
void acquire (bmutex_t *b) {
  while (1)
  {
    lock (&b->l);
    if (b->acquired) {
      *b->last = cp;          // append ourselves to the wait list
      b->last = &cp->next;
      cp->next = NULL;
      cp->blocked = true;
      unlock (&b->l);         // must release the spinlock before sleeping!
      yield();
      continue;
    }
    break;
  }
  b->acquired = true;
  unlock (&b->l);
}

// Undo an acquire: wake the first waiter, if any
void release (bmutex_t *b) {
  lock(&b->l);
  b->acquired = false;
  struct process *p = b->p;
  if (p) {
    b->p = p->next;           // pop the head of the wait list
    if (!b->p)
      b->last = &b->p;
    p->blocked = false;       // runnable again; it will retry acquire
  }
  unlock(&b->l);
}

Deadlock (deadly embrace)

Our previous approach is susceptible to this.

If we carefully protect everything with locks, and eliminate all other races, we can still have deadlock! This is even harder to test for.

Four conditions are necessary for deadlock:

All four of these need to be met on a system for deadlock to occur. If even one is broken, then deadlock cannot occur.

  1. Circular wait
  2. Mutual exclusion
  3. No preemption of locks
  4. Hold and wait

How to solve the deadlock problem: attack at least one of the 4 conditions above:

Solutions (number denotes which condition we attack)

  1. Several ways:
    a. Ordering on locks (&l < &m)
      • If you need to grab more than one lock, grab them in ascending
        address order.
      • Alternatively: wait on the first lock; if any other lock you need
        is already held, release all the locks you hold and start over
      • This depends on the user doing it correctly
    b. Cycle detection
      • bipartite graph (resources and processes)
      • process ---> resource if it owns the resource
      • resource ---> process if the process wants to acquire the resource
      • if acquiring would create a cycle, the call fails with EDEADLK
      • This is Linux's approach
      • This is implemented in the OS, not by the user
    c. Compare & contrast a and b:
      • a places the burden on the developer
      • b might let deadlock occur and just error out
      • In practice, b's wait chains aren't very long
  2. Break mutual exclusion: use lock-free updates like the aa() function instead of locks
  3. Allow preemption (not a disaster in general, good for some purposes)
  4. Make the rule that you cannot acquire a lock if you already hold one

Method 1 is the most important. This is the approach we'll take.

Example



    parent:
      pipe()
      pipe()
      fork()
        exec sort # sort won't deadlock because sort won't output until it reads
                  # EVERYTHING. Calling gzip, on the other hand, would hang.
                  # It's only because of this specific program's implementation
                  # that this is safe.
      write
      write
      close
      read
      read
      read

Priority inversion

Mars Pathfinder

    T_low             T_med           T_high
    --------          --------        --------
    runnable          waiting         waiting
    acquire (&m) ---------------------> (runnable)
                                        acquire (&m)   # blocks: T_low holds m
    (runnable) <-----------------------
                      (runnable)
                      computation       # T_med preempts T_low; T_high waits

Arrows denote the flow of execution between threads. The high-priority thread tries to run, but the low-priority thread is holding a lock it needs. The medium-priority thread, however, does not need this lock; when it becomes runnable, it runs instead of the low-priority thread.

Notice that the high-priority task doesn't get to run, and this could result in the job being killed by a watchdog if the medium-priority thread takes too long (e.g., the long computation indicated above).

How can we solve this? A high-priority process should lend its priority to the holder of the lock it's waiting on. That way the lock holder cannot be preempted by medium-priority work, and the issue does not occur.


Midterm Info