CS 111 Spring 2009

Synchronization - Implementing Pipes

April 27, 2009

by Vyvy Dao and Michael Nichols

A First Implementation

#define N 512

struct pipe {
   char buf[N];
   size_t r, w;   // total characters read and written so far
};

void write(struct pipe *p, char c) {
   while (p->w - p->r == N)
      continue;   // busy-wait while the pipe is full
   p->buf[p->w++ % N] = c;
}

char readC(struct pipe *p) {
   while (p->w - p->r == 0)
      continue;   // busy-wait while the pipe is empty
   return p->buf[p->r++ % N];
}

This code will have terrible race conditions. How to fix it?

Solution (assuming a single processor):
- Make these functions syscalls
- Disable interrupts inside the functions (see the sketch below)
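
Here is a minimal sketch of that approach. It assumes hypothetical kernel
helpers disable_interrupts() and enable_interrupts() (wrappers around the cli
and sti instructions), a made-up syscall name sys_write, and a single processor:

void sys_write(struct pipe *p, char c) {
   for (;;) {
      disable_interrupts();
      if (p->w - p->r != N)     // room in the pipe: write with interrupts off
         break;
      enable_interrupts();      // full: let the reader get scheduled and drain it
   }
   p->buf[p->w++ % N] = c;      // no interrupt (hence no context switch) can occur here
   enable_interrupts();
}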


Critical Section

A critical section is a set of instructions whose effect is indivisible (atomic),
provided that at most one instruction pointer is inside the section at any given time.


Implementing a critical section correctly has two subproblems:
- Mutual Exclusion: necessary for correctness
- Bounded Wait: necessary for fairness; no thread should wait more than a bounded amount of time to enter the section

How do we implement mutual exclusion on a multicore machine?
- with mutexes / spin locks

The API for a mutex looks something like this:

typedef ______ mutex_t;   // representation to be filled in below
void lock(mutex_t *m);
// Precondition: the caller does not hold m (otherwise it could loop forever)
void unlock(mutex_t *m);
// Precondition: the caller holds m

A first attempt at an implementation:

typedef int mutex_t;
void lock(mutex_t *m) {
   // while locked -> busy-wait
   while (*m) continue;
   *m = 1;
}

void unlock(mutex_t *m) {
   *m = 0;
}

We have a problem: we still have race conditions.
There are complicated and slow algorithms that fix this while assuming only
that loads and stores are atomic. For obvious reasons, they're not used much
in practice. Look at the lock function: what happens when two threads call it
at the same time? Both can see *m == 0, and then both set *m = 1 and enter the
critical section.

In practice, this problem has been solved in hardware by adding a special
machine instruction to facilitate an atomic load and store operation:

This instruction is known as "test-and-set"; on x86 it is the atomic exchange:
xchg %eax,(%ebx)
This atomically exchanges the contents of register %eax with the memory word addressed by %ebx.

Here's our new locking code:

void lock(mutex_t *m) {
   while (testAndSet(m, 1))   // spin until we see the old value 0
      continue;
}
int testAndSet(int *p, int new) {
   // These two steps must execute as one atomic operation,
   // i.e. testAndSet must be implemented with xchg.
   int old = *p;
   *p = new;
   return old;
}

The xchg instruction is not privileged, so testAndSet does not need to be a
syscall; it can be used safely in user code.
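
For a compilable version (an illustrative sketch, not part of the original
notes): GCC's __sync_lock_test_and_set builtin performs an atomic exchange, so
the spin lock can be written as:

typedef int mutex_t;

void lock(mutex_t *m) {
   // Atomically store 1 and return the old value (an xchg on x86).
   while (__sync_lock_test_and_set(m, 1))
      continue;   // someone else holds the lock; keep spinning
}

void unlock(mutex_t *m) {
   __sync_lock_release(m);   // atomically store 0
}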

As it stands, we have one mutex for the entire operating system.

Problem: What if we are a server with 1000 users who all want to get locks at once?
Solution: Give each pipe its own mutex.

Problem: Now different pipes can be used independently, but on any one pipe a reader and a writer still exclude each other.
Solution: Give each pipe separate mutexes for reading and writing.

Let's change our previous definition for pipes to fix these problems and also
refactor our methods a bit.

struct pipe {
   char buf[N];
   size_t r, w;
   mutex_t rm, wm;   // separate read and write mutexes
};
void write(struct pipe *p, char c) {
   for (;;) {
      lock(&p->wm);
      if (p->w - p->r != N)   // not full: write while still holding the lock
         break;
      unlock(&p->wm);
   }
   p->buf[p->w++ % N] = c;
   unlock(&p->wm);
}
char readC(struct pipe *p) {
   for (;;) {
      lock(&p->rm);
      if (p->w - p->r != 0)   // not empty: read while still holding the lock
         break;
      unlock(&p->rm);
   }
   char c = p->buf[p->r++ % N];
   unlock(&p->rm);
   return c;
}

You may be asking yourself: Will having separate read and write mutexes
really improve performance?
Suppose you have a pipe with 1000 readers and 1000 writers. There will still
be a bottleneck.

Granularity of Locking Mechanisms

COARSE-GRAINED VS. FINE-GRAINED LOCKING

Coarse-grained: simpler, more likely to be correct, but bottlenecks are worse
Fine-grained: the system will scale better with larger workloads and a greater number of CPUs

Linux locking has become finer-grained over the years.
Problem: How fine to make the grain?
Solution: Find the minimal critical section for an operation

Approach:
1. Look for writes to shared objects
2. Expand to look for dependent reads (reads used to decide where/what to write)
3. Package these up and protect them with locks
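
Applied to the pipe write (an annotated restatement of the code above, not new
material): the writes to shared state are the store into buf and the increment
of w, and the dependent read is the fullness check on w - r, so the minimal
critical section covers exactly those lines:

void write(struct pipe *p, char c) {
   for (;;) {
      lock(&p->wm);
      if (p->w - p->r != N)     // dependent read: decides whether we may write
         break;
      unlock(&p->wm);
   }
   p->buf[p->w++ % N] = c;      // writes to shared state: buffer slot and w
   unlock(&p->wm);              // end of the minimal critical section
}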

Problem: Our lock() function is slow when there's a high probability that the
object is locked. This is especially bad because this is precisely the situation
in which we don't want to waste CPU time.
Solution: Use a blocking mutex that gives up the CPU when it can't get a lock

Blocking Mutex:

// give up the CPU when we can't get the lock
struct bmutex {
   mutex_t m;             // spin lock protecting the fields below
   bool locked;           // becomes an int count in semaphores
   proc_t *blocked_list;  // linked list of threads waiting to grab this lock
};

Here are the acquire and release operations for a blocking mutex:
void acquire(struct bmutex *b) {
   for (;;) {
      lock(&b->m);
      // mark self as blocked and join the wait list (pseudocode)
      self->state = BLOCKED;
      b->blocked_list.add(self);
      if (!b->locked)   // the lock is free -- grab it below
         break;
      unlock(&b->m);
      // let someone else run
      schedule();
   }
   b->locked = true;
   self->state = RUNNABLE;
   b->blocked_list.remove(self);
   unlock(&b->m);
}
void release(struct bmutex *b) {
   lock(&b->m);
   b->locked = false;
   // wake every thread waiting on this lock (pseudocode: walk the list)
   for (proc_t *p = b->blocked_list; p != NULL; p = p->next)
      p->state = RUNNABLE;
   unlock(&b->m);
}

Semaphores - like blocking mutexes, but the "bool locked" becomes an "int locked".
This allows a limited number of processes in at once: a semaphore initialized to
10 allows up to 10 processes to access the resource at the same time.

Example: You have 10 disk drives and each process that requests data can get at
most 1 disk drive.
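
As a concrete illustration (using POSIX semaphores, which are not part of these
notes; the function names here are made up), the disk-drive scenario could look like:

#include <semaphore.h>

sem_t drives;                      /* counts how many of the 10 drives are free */

void init_drives(void) {
   sem_init(&drives, 0, 10);       /* 10 drives available initially */
}

void use_a_drive(void) {
   sem_wait(&drives);              /* blocks if all 10 drives are in use */
   /* ... do I/O with the drive we were granted ... */
   sem_post(&drives);              /* give the drive back */
}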

The two semaphore operations (the semaphore analogues of acquire and release) are:
p(s): prolaag - "try and decrease"; waits until the count is positive, then decrements it
v(s): verhoog - "increase"; increments the count
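
A sketch of how p() and v() could be implemented in the style of acquire() and
release() above (illustrative only; the blocked-list bookkeeping is omitted for
brevity):

struct semaphore {
   mutex_t m;     // spin lock protecting count
   int count;     // how many more processes may enter
};

void p(struct semaphore *s) {
   for (;;) {
      lock(&s->m);
      if (s->count > 0)   // a unit is available -- take it below
         break;
      unlock(&s->m);
      schedule();         // give up the CPU, as in acquire()
   }
   s->count--;
   unlock(&s->m);
}

void v(struct semaphore *s) {
   lock(&s->m);
   s->count++;
   unlock(&s->m);
}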

It's important to note at this point that a blocking mutex is exactly the same
as a binary semaphore.

Problem: With 1 writer and 1000 readers, when the writer finishes, 1000 readers
will be woken up and 1 will grab the lock and 999 will go back to sleep --
after wasting valuable CPU time.

Condition Variables

- Even finer-grained than blocking mutexes
- We don't want to wake up a process until a well-defined condition becomes true
- The condition is an expression over shared state, protected by a blocking mutex
- A condition variable represents that condition

Condition Variable API:

wait(condvar_t *c, struct bmutex *b) {
   /* pseudocode:
    * Precondition: the caller has acquired b
    * Releases b, then blocks until some other thread notifies c
    * Reacquires b before returning (the caller must recheck the condition)
    */
}
notify(condvar_t *c) {
   // wake up one process waiting on c: the condition might now be true
}
notify_all(condvar_t *c) {
   // wake up every process waiting on c (broadcast)
}

Here is our updated definition for pipes, using a blocking mutex and condition variables:
struct pipe {
   char buf[N];
   size_t r, w;
   struct bmutex b;       // protects all of the fields above
   condvar_t nonfull;     // signaled when the pipe is no longer full
   condvar_t nonempty;    // signaled when the pipe is no longer empty
};

Here is what the new write function looks like:
void write(struct pipe *p, char c) {
   acquire(&p->b);
   // recheck the condition after every wakeup: a notification is only a hint
   while (p->w - p->r == N)
      wait(&p->nonfull, &p->b);
   p->buf[p->w++ % N] = c;
   notify(&p->nonempty);   // the pipe is certainly nonempty now
   release(&p->b);
}
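
The matching read function is not written out in the notes, but it is symmetric
to write:

char readC(struct pipe *p) {
   acquire(&p->b);
   while (p->w - p->r == 0)       // wait while the pipe is empty
      wait(&p->nonempty, &p->b);
   char c = p->buf[p->r++ % N];
   notify(&p->nonfull);           // there is certainly room now
   release(&p->b);
   return c;
}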

Intro to Deadlocks

Motivation: We want to be able to read and write more than one character at a
time in order to make our OS more efficient.
One way to do this would be to grab locks on both the read and write portions
of a pipe at the same time to streamline the process.
EX: ... | cat | ...

Replace
   for(;;) {
      c = readC(in);
      write(out, c);
   }
with
   transfer(in,out);

void transfer(struct pipe *in, struct pipe *out) {
   acquire(&in->b);
   acquire(&out->b);
   if ((in->w != in->r) && (out->w - out->r != N))   // input nonempty, output nonfull
      out->buf[out->w++ % N] = in->buf[in->r++ % N];
   release(&out->b);
   release(&in->b);
}

There is a very serious problem with this code. Consider this example:

Thread 1 calls transfer(a,b)
Thread 2 calls transfer(b,a)

What if thread 1 acquires a's lock while thread 2 acquires b's lock?
Each will then wait forever for the lock that the other already holds.

This is known as deadlock.