CS 111 Lecture 11/03/14

Nate Fischer & Ty Giacalone


Table of contents


Synchronization primitives

Our goals for synchronization primitives:

A brief history of development:

Making locking primitives efficient is a perennial hassle. As we'll see in this lecture, locking primitives tend not to scale well, and what was good enough in the past usually isn't good enough now.

To have any hope of building a good set of locking primitives, we need atomic loads and stores.

First approach:

typedef int lock_t;
// This method is no good: there is a window between the test and the store
void lock (lock_t *l)
{
  while (*l)
    continue;
  *l = 1; // another thread may also see *l == 0 and reach this store
}

A slightly better approach:

#       %eax holds the address of the lock
incl    (%eax) # <--- Avoids races on one CPU
# Internally, this is a load + add + store, so with 2 CPUs there is
# still a race between the load and the store

# alternative:
lock incl   (%eax)  # This avoids the race. "Lock" means to execute
                    # instruction atomically

void lock_incl (int *l)
{
  asm("lock incl %...."); // write the assembly in C code
}
// Still not sufficient for a locking primitive

A better solution is a new primitive: xchgl

// returns old value
int xchgl (int *l, int value) // takes in new value
{
  asm("....");
  return // ...
}

void lock(lock_t *l)
{
  while (xchgl (l, 1) == 1)
    continue;
  // And this will work!
}

void unlock(lock_t *l)
{
  *l = 0; // Safe: stores are atomic, and only the lock holder calls unlock
}

Lock incl generalized

We want a more general primitive. Instead of just incrementing an int, we want to atomically apply any mathematical function to it (e.g. multiply by 3, divide by 2, etc.)

"Apply atomically" approach

// "aa" == "apply atomically"
void aa(int *p, int *l, int (*f) (int))
{
  lock(l); // l is a lock location passed in by user
  *p = f(*p); // Buggy, two threads could store into *p at
              // the same time
  unlock(l);

  // But can we do something more efficent?
}

void aa(int *p, int *l, int (*f) (int))
{
  int v, r;   // declared outside the loop so the while condition can see them
  do {
    v = *p;
    r = f(v);
  } while (xchgl (p, r) != v);
  /* Bug: xchgl stores r unconditionally, so if *p changed after we read v,
     other threads can observe our stale r before we notice and retry */
}

Another instruction: compare and swap

// The instruction's semantics, written as C for clarity
// (in reality this happens atomically in hardware):
bool cmpxchgl (int *p, int old, int new)
{
  // address of location, old value, and new value to set it to

  // Won't change memory if old value is wrong
  if (*p == old)
  {
    *p = new;
    return true;
  }
  else
    return false;
}

// This version of aa is sufficient
void aa(int *p, int (*f) (int) )
{
  int old, new;   // declared outside the loop so the while condition can see them
  do {
    old = *p;
    new = f(old);
  } while (!cmpxchgl (p, old, new) );
  // cmpxchgl is a bottleneck on multicore systems (1000 cores)
}

The latest solution: Hardware lock elision (HLE)

Intel introduced this in 2013, with the Haswell microarchitecture

  movl    $1, %eax
  # l is the lock's memory location
  # "xacquire" is new
try:
  xacquire lock xchgl   %eax, l
  cmp   $1, %eax
  je    try             # old value was 1: lock was held, retry
  # We have the lock
  xrelease movl   $0, l

How does xacquire work?

xacquire hints to the CPU that the write to the lock should be elided: the critical section runs speculatively while other CPUs still see the lock as free. If the hardware detects a conflicting access, it aborts the speculation and re-executes the section the old-fashioned way, actually taking the lock. On older CPUs the prefix is simply ignored, so the code remains backward compatible.


Pipes and locking

struct pipe {
  size_t r, w;
  char buf[BUFSIZE];
  lock_t l;
};

void write (struct pipe *p, char c)
{
  for (;;)
  {
    lock(&p->l);
    if (p->w - p->r != BUFSIZE)
      break;   // not full; exit the loop still holding the lock
    unlock(&p->l);
    yield();
  }
  p->buf[p->w++%BUFSIZE] = c;
  unlock(&p->l);
}

We want something more efficient than this polling loop: something like "yield until this resource is available". A primitive that provides this is called a blocking mutex.


Blocking Mutex

If we don't get the lock, put ourselves to sleep until it is available

struct process {
  // other stuff
  struct process *next;
};
// typedef lock_t bmutex_t; // Not good for waiting
// Instead
typedef struct {
  lock_t l;
  struct process *p; // linked list of processes waiting on lock
  struct process **last; // to append to end quickly
  bool acquired;
} bmutex_t;

// Acquire the lock, or put ourselves to sleep until it's free
// (cp is the current process's descriptor)
void acquire (bmutex_t *b) {
  while (1)
  {
    lock (&b->l);
    if (b->acquired) {
      *b->last = cp;          // append ourselves to the wait list
      b->last = &cp->next;
      cp->next = NULL;
      cp->blocked = true;
      unlock (&b->l);         // must release the spinlock before sleeping!
      yield();
      continue;
    }
    break;
  }
  b->acquired = true;
  unlock (&b->l);
}

// Undo an acquire: wake the first waiter, if any
void release (bmutex_t *b) {
  lock(&b->l);
  b->acquired = false;
  struct process *p = b->p;
  if (p) {
    b->p = p->next;           // pop the head of the wait list
    if (!b->p)
      b->last = &b->p;
    p->blocked = false;       // runnable again; it will retry acquire
  }
  unlock(&b->l);
}

Deadlock (deadly embrace)

Our previous approach is susceptible to this.

If we carefully protect everything with locks, and eliminate all other races, we can still have deadlock! This is even harder to test for.

Four conditions are necessary for deadlock:

All four of these need to be met on a system for deadlock to occur. If even one is broken, then deadlock cannot occur.

  1. Circular wait
  2. Mutual exclusion
  3. No preemption of locks
  4. Hold and wait

How to solve the deadlock problem: attack at least one of the 4 conditions above:

Solutions (number denotes which condition we attack)

  1. Several ways:
    a. Ordering on locks (&l < &m)
      • If you need to grab more than one lock, grab them in ascending
        address order.
      • Alternatively: wait on the first lock; if any other lock you need
        is already held, release all the locks you hold and start over
      • This depends on the user doing it correctly
    b. Cycle detection
      • bipartite graph (resources and processes)
      • process ---> resource if it owns the resource
      • resource ---> process if the process wants to acquire the resource
      • if acquiring would create a cycle, the call fails with EDEADLK
      • This is Linux's approach
      • This is implemented in the OS, not by the user
    c. Compare & contrast a and b:
      • a places the burden on the developer
      • b might let deadlock occur and just error out
      • In practice, b's wait chains aren't very long
  2. Break mutual exclusion: use lock-free updates like the aa() function instead of locks
  3. Allow preemption (not a disaster in general, good for some purposes)
  4. Make the rule that you cannot acquire a lock if you already hold one

Method 1 is the most important. This is the approach we'll take.

Example



    parent:
      pipe()
      pipe()
      fork()
        exec sort # sort won't deadlock because sort won't output until it reads
                  # EVERYTHING. Calling gzip, on the other hand, would hang.
                  # It's only because of this specific program's implementation
                  # that this is safe.
      write
      write
      close
      read
      read
      read

Priority inversion

Mars Pathfinder

    T_low             T_med           T_high
    --------          --------        --------
    runnable          waiting         waiting
    acquire (&m) ---------------------> (runnable)
                                        acquire (&m)   # blocks: T_low holds m
    (runnable) <-----------------------
                      (runnable)
                      computation       # T_med preempts T_low; T_high waits

Arrows denote the flow of execution between threads. The high-priority thread tries to run, but the low-priority thread is holding a lock it needs. The medium-priority thread, however, does not need this lock; when it becomes runnable, it runs instead of the low-priority thread.

Notice that the high-priority task doesn't get to run, and this could result in the job being killed by a watchdog if the medium-priority thread takes too long (e.g., the long computation indicated above).

How can we solve this? A high-priority process should lend its priority to the holder of the lock it's waiting on. That way the lock holder cannot be preempted by medium-priority work, and the issue does not occur.


Midterm Info