Lecture 9: Synchronization Primitives & Deadlock

by Sung Joon Baek, Jessica Pham, Joshua St Clair, and Christina Yang

Critical Sections


We first try to create a critical section by adding a global variable, watchout, that we check every time we want to execute the critical code. Setting it to one signals that the critical section is in progress; setting it back to zero signals the end.


		int watchout;	// global
		watchout = 1;	// entering the critical section
		acct1 += amt;
		acct2 -= amt;
		watchout = 0;	// leaving the critical section

But this implementation isn't good: nothing ever checks watchout, so threads can still interfere with each other. Instead we'll try the following code, in which we keep checking the variable until it is set to zero by another thread and only then run the critical section code.


		int watchout;
		while (watchout)
			continue;
		watchout = 1;
		/* CS code */
		watchout = 0;

But this is still bad because of race conditions: if two threads both see watchout at 0, both will enter the critical section. Thus, we need help from the hardware.


Hardware representation

[CPU diagram]

We can snoop on the bus: whenever one CPU stores a value, the write appears on the bus, and the other CPUs' caches snoop it and update (or invalidate) their own copies of that location.

x86

For machines with multiple CPUs, we need a way of locking in which only one thread at a time writes to a shared variable. The x86 lock incl instruction atomically adds one to a memory location; while it executes, no other CPU can access that location.

		  asm ("lock incl x");		// atomically add one to the memory location x
		  void lock_incl (int *p);	// C wrapper that performs this instruction on *p
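
For concreteness, lock_incl could be written with inline assembly roughly as follows; the GCC-style constraint syntax here is our assumption, not something given in lecture:

		// Minimal sketch, assuming GCC/Clang inline assembly on x86.
		void lock_incl (int *p) {
			asm volatile ("lock; incl %0"
			              : "+m" (*p)	// *p is read and modified in memory
			              :			// no other inputs
			              : "cc");		// the instruction clobbers the flags
		}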

Advantages and disadvantages:
+ atomic increment of memory (no need to worry about other threads)
- slow (can't use the cache; the CPUs must communicate directly over the bus, which is locked while the instruction runs)

Other functions similar to this:

1) int test_and_set(int *p, int val);
2) int exchange(int *p, int newval);
3) dozens more functions...


		int exchange (int *p, int newVal) {	// implemented by the atomic instruction: asm ("lock xchgl")
			int oldVal = *p;
			*p = newVal;
			return oldVal;
		}

The exchange function stores the new value and returns the old one. Written as ordinary C, this sequence is not protected from race conditions when running multiple threads; the hardware xchgl instruction, however, performs the swap atomically, and we can build a locking system on top of it.


		typedef int volatile lock_t;

		void lock (lock_t *p) {
			while (exchange (p, 1))		// old value 1 means someone else already holds the lock
				continue;		// old value 0 means we just grabbed it
		}

		void unlock (lock_t *p) {
			*p = 0;
		}

Precondition for lock: You do not own the lock
Precondition for unlock: You own the lock
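
For example, a minimal sketch of guarding the earlier account-transfer code (acct_lock is a hypothetical lock_t shared by every thread that touches the accounts):

		long acct1, acct2;		// the two balances from the first example (types assumed)
		lock_t acct_lock;		// protects both accounts

		void transfer (long amt) {
			lock (&acct_lock);	// precondition: we do not already hold it
			acct1 += amt;		// the critical section from the first example
			acct2 -= amt;
			unlock (&acct_lock);	// precondition: we hold it
		}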

Two basic properties for critical code:
  1. mutual exclusion
  2. bounded wait:
    if you are waiting, there is a known upper bound on the wait time; this is why critical sections need to be short

One CPU inside its critical section can cause the other n-1 CPUs to busy wait ("spinning").

Can we avoid these locks?
Yes: make the critical section so simple that the operation is already atomic.

EXAMPLE:

		int x = 192736;		// shared

		Thread A		Thread B
		{ int i = x; }		{ x = 27; }

We know that the variable i must be either 27 or 192736 depending on which thread runs first.

HOWEVER...
Suppose we have a large data type.

		 long long x;

The previous reasoning doesn't work when the variable is too big: on a 32-bit machine a long long is accessed as two separate 32-bit transfers, so a reader can see a value that is half old and half new (it looks truncated or garbled).
Therefore, your objects can't be too large.
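
For instance, a minimal sketch of the hazard (hypothetical values, assuming a 32-bit machine where each 64-bit access takes two bus transfers):

		long long x = 0x00000001FFFFFFFFLL;	// shared between the two threads

		void thread_B (void) {
			x = 0;			// two 32-bit stores: one half, then the other
		}

		long long thread_A (void) {
			return x;		// two 32-bit loads: may combine an old half with a
						// new half, giving a value neither thread ever wrote
		}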

Suppose we create a struct whose bit fields are packed into a single byte.


		struct {
			unsigned int a : 7;	// declaring bit fields
			unsigned int b : 1;
		} x;

		Thread A			Thread B
		x.a = 1;			x.b = 1;

Now the two threads are reading and writing the same byte: each assignment must load the byte, change its own bits, and store the whole byte back, so one thread can wipe out the other's update.
Therefore, your objects can't be too small.

Suppose we have an array.


		int a[2];
		char *p = (char *) a;
		p += sizeof (int) / 2;		// now points into the middle of a[0]
		int *ip = (int *) p;		// a misaligned int pointer
		waitpid (27, ip, 0);		// the status is stored across a[0] and a[1]

This declares an array of two integers, eight bytes total (four bytes per integer). The code then builds a pointer into the middle of the first integer. If a thread now reads or writes through this pointer, it touches half of each integer, so two accesses that look independent can overlap.

[Misaligned access diagram]

Therefore, your objects cannot be misaligned.

Locks & Pipes


First, we declare a pipe structure, as well as its primitives.

	    
		#define N 1024
		struct pipe {
			char buf [N];
			size_t r, w;
		};

		char read_pipe (struct pipe *p) {
			while (p->r == p->w)		// spin while the pipe is empty
				continue;
			return p->buf[p->r++ % N];
		}

		void write_pipe (struct pipe *p, char c) {
			while (p->w - p->r == N)	// spin while the pipe is full
				continue;
			p->buf[p->w++ % N] = c;		// write at the write index, not the read index
		}

However, this does not account for race conditions...

Solution: declare a global lock_t l

	    
		#define N 1024
		struct pipe {
			char buf [N];
			size_t r, w;
		};
		lock_t l;		// one global lock protecting every pipe

		char read_pipe (struct pipe *p) {
			lock (&l);
			while (p->r == p->w)			// pipe is empty
				{ unlock (&l); lock (&l); }	// briefly release so a writer can run
			char c = p->buf[p->r++ % N];
			unlock (&l);
			return c;
		}

		void write_pipe (struct pipe *p, char c) {
			lock (&l);
			while (p->w - p->r == N)		// pipe is full
				{ unlock (&l); lock (&l); }
			p->buf[p->w++ % N] = c;
			unlock (&l);
		}

Performance concerns:

  • We read one char at a time => make a struct that holds multiple chars and use a bigger buffer
  • If we hold the lock while waiting for the pipe to fill, it will stay empty, because the writer can never get the lock => keep the waiting loop out of the critical section, so waits stay bounded
  • Many lock/unlock pairs in the loop => insert a yield: unlock(&l); yield(); lock(&l); (still a busy wait; see the sketch below)
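
A minimal sketch of that last variant (same global lock l and pipe layout as above):

		char read_pipe (struct pipe *p) {
			lock (&l);
			while (p->r == p->w) {		// pipe still empty
				unlock (&l);
				yield ();		// give the CPU to a writer instead of spinning
				lock (&l);
			}
			char c = p->buf[p->r++ % N];
			unlock (&l);
			return c;
		}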

However, this method does not allow parallelism => bottleneck
Coarse-grained lock: taking the single lock l every time prevents two threads from writing into two different pipes at once; it's too conservative

Solution: add lock_t to pipe struct

lock(&p->l); unlock(&p->l);
+ finer-grained locking
- more locks increase complexity (in what order do we grab them?)
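
A minimal sketch of the finer-grained version (the same read_pipe, but locking the pipe's own lock instead of the global one):

		struct pipe {
			char buf [N];
			size_t r, w;
			lock_t l;			// one lock per pipe
		};

		char read_pipe (struct pipe *p) {
			lock (&p->l);			// only this pipe is locked
			while (p->r == p->w)
				{ unlock (&p->l); lock (&p->l); }
			char c = p->buf[p->r++ % N];
			unlock (&p->l);
			return c;
		}
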
We need something better than yield(), to make sure we don't wake up until the resource we need is actually available.

Solution: blocking mutex
A blocking mutex is a data type that records whether the mutex is held and keeps a list of the other threads that want it.


		typedef struct {
			lock_t l;		// spin lock protecting the fields below
			bool locked;		// is the mutex currently held?
			thread_t *blocked_list;	// threads waiting for the mutex
		} bmutex_t;

		void acquire (bmutex_t *b) {
			lock (&b->l);
			while (b->locked) {
				// add ourselves to the blocked list and go to sleep
				self->next = b->blocked_list;
				b->blocked_list = self;
				self->runnable = false;
				unlock (&b->l);
				yield ();
				lock (&b->l);	// reacquire the spin lock before retesting
			}
			b->locked = true;
			unlock (&b->l);
		}

		void release (bmutex_t *b) {
			lock (&b->l);
			b->locked = false;
			thread_t *next = b->blocked_list;
			if (next) {
				next->runnable = true;	// wake up one waiter
				b->blocked_list = next->next;
			}
			unlock (&b->l);
		}

Semaphores


Semaphores: a blocking mutex controlled by an int instead of a bool, so its arity can be greater than 1. The arity is the limit on how many threads can hold the resource at the same time.

How it's implemented:


		typedef struct {
			lock_t l;		// spin lock: spin until you get control of the structure
			int count;		// replaces the bool 'locked'; starts at the arity
			thread_t *blocked_list;
		} semaphore_t;
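
A minimal sketch of the two semaphore operations (the names sem_acquire/sem_release are ours; the classic names are P and V), reusing the same self, runnable, and next thread fields as the blocking mutex above:

		void sem_acquire (semaphore_t *s) {	// P: wait until a unit is free, then take it
			lock (&s->l);
			while (s->count == 0) {		// nothing left: block, as acquire() did
				self->next = s->blocked_list;
				s->blocked_list = self;
				self->runnable = false;
				unlock (&s->l);
				yield ();
				lock (&s->l);
			}
			s->count--;
			unlock (&s->l);
		}

		void sem_release (semaphore_t *s) {	// V: give a unit back and wake one waiter
			lock (&s->l);
			s->count++;
			thread_t *next = s->blocked_list;
			if (next) {
				next->runnable = true;
				s->blocked_list = next->next;
			}
			unlock (&s->l);
		}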

"Binary semaphore" = blocking mutex (int equal 0 or 1 vs. bool)
We can use build higher level blocks, which will affect the efficiency. This method still causes some spinning. If we had a scenario where we had 499 readers, 1 writer and 1 pipe, the CPU must wake up all processes trying to write and read from it. This chews up CPU time since all blocked processes must wake up.

Analogy between blocking mutex and semaphore:
unlock == release; lock == acquire

Condition Variables

Back to Top

Condition variables represent a condition the application is interested in. A condition variable stands for a boolean expression over shared state; threads can wait on it and be notified when the condition becomes true.

EXAMPLE: "is pipe full?", "is pipe empty?"

How it's implemented:
We use a condition variable, condvar_t, together with a blocking mutex: the blocking mutex protects the shared state, and the condition variable represents the condition over that state. Two primitives let a thread wait for the condition and let other threads announce that it has become true:

		wait_for (&c, &b)	// precondition: you hold b; releases b, sleeps until notified, then reacquires b
		notify (&c)		// call once the condition has become true, to wake a waiter

Using the previous code with the pipe struct, we modify it to include this new method of locking:


		#define N 1024
		struct pipe {
			char buf [N];
			size_t r, w;
			condvar_t nonempty;	// "the pipe is not empty"
			condvar_t nonfull;	// "the pipe is not full"
			bmutex_t b;		// protects all of the above
		};

		char read_pipe (struct pipe *p) {
			acquire (&p->b);
			while (p->r == p->w)			// pipe is empty
				wait_for (&p->nonempty, &p->b);	// sleep until a writer notifies us
			char c = p->buf[p->r++ % N];
			notify (&p->nonfull);			// we just made room
			release (&p->b);
			return c;
		}

		void write_pipe (struct pipe *p, char c) {
			acquire (&p->b);
			while (p->w - p->r == N)		// pipe is full
				wait_for (&p->nonfull, &p->b);
			p->buf[p->w++ % N] = c;
			notify (&p->nonempty);			// the pipe now has data
			release (&p->b);
		}

Synchronization Problems

  • race conditions
  • performance
  • deadlock (two processes each waiting for the other; see the sketch below)
    EXAMPLE: tr a-z A-Z | cat | sort
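
A minimal sketch of the simplest deadlock (hypothetical locks a and b, grabbed in opposite order, as can happen once each pipe has its own lock):

		lock_t a, b;

		void thread_1 (void) {
			lock (&a);
			lock (&b);	// blocks forever if thread_2 already holds b
			/* ... */
			unlock (&b);
			unlock (&a);
		}

		void thread_2 (void) {
			lock (&b);
			lock (&a);	// blocks forever if thread_1 already holds a
			/* ... */
			unlock (&a);
			unlock (&b);
		}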

Next lecture, we will discuss how to resolve deadlock issues...