CS 111
Lecture 9 Scribe Notes
Synchorinzation Primitives; Deadlock

Notes By Muhammad Shabbir

Synchronization Mechanisms

Simple Case: One CPU, critical section

//Start Critical Section
  balance+=amt;
//End Critical Section

Given cooperative multithreading, we simply choose whether to enable interrupt or disable interrupt. On a singe CPU / small embedded system, this works well. This was the original UNIX.

BUT we often have other, more likely cases:

More Realistic Case: We can have Multiple CPU's, perhaps even Preemeptive Multithreading.

1 typedef ______ spinlock_t;
2 
3 void lock(spinlock_t *l)
4 void unlock(spinlock_t *l);

Read / Write Coherence

--Won't work for large objects: you will read or write one piece at a time. For example:

1. typedef char[1024] spinlock;
2. strcpy(l, "locked")
3. // Does not work atomically

--Won't work for really small objects either

1. unsigned int a=1;
2. unsigned int b=7;
3. spinlock t;
4.
5. *spinlock_t l; 
6. l+a=1;

The program loads surrounding word. Sets bit. Then stores surrounding word. So if an object is smaller than the "word" size, it will not be atomic, because the program will load not only that bit but the surrounding bits which may belong to other objects and data. However if we can ensure that our data is the size of a word, there is an instruction that may be of use to us:

1. movl %eax, x   #This will replace x all at once.

Unfortunately, this is still not a complete solution. This is atomic on x86 only if x is aligned. This brings us to our third case:

--Misaligned objects can also cause problems: For example say we have a four byte object. In memory it starts from location 6 and goes to location 10. Even though it is the size of a word, to overwrite or read this object, the machine may access memory addresses from 4 to 8 and then from 8 to 12. Thus, it will not be an atomic instruction.

xchg Example

We need more help from the hardware guys. There is an an instruction, xchg, that will swap data atomically. It takes two registers as arguments and swaps their contents. The downside? It is slow. It sends out a msg on the bus to tell others not to use the memory or RAM at at that location. It performs the swap. Then it sends an all clear message on the bus when it is done.

1. xchg %eax(%ebx)  #Swaps atomically, but is slow.

What is the C equivalent of xchg?

1. int xchg(int new, int *p){
2. 	int old = p;
3. 	*p = new;
4. 	return old;
5. }
6.
7.	//We can use xchg for our spinlock.
8.
9. while(xchg(1,l)){ continue;}

Adding Atomically

Adding 1 atomically example: lock incl counter

1. int counter;
2. spinlock_t l;
3.
4.
5. lock(&l);
6. counter++;
7. unlock(&l);
8.

Adding 4 atomically example: Two different instructions.

test-and-set

compare-and-swap. To understand this properly, we need to write c equivalent

1. bool compare-and-swap(int *p, int old, int new){
2.	if (*p == old){
3.		*p=new; return 1;
4.	}
5.	else{ 
6.		return 0;
7.	}
8. }
9.
10. for(;;){
11.	int c = counter;
12. 	if (compare-and-swap(&curr, c, c+4)) break;
13. }
14.		//You can actually replace c+4 with any function.  (side effect free)

Blocking Mutex Implementation

1. typedef struct{
2.	splinlock_t l;
3.	proc_t *blocked_list;
4.	bool acquired;
5. } bmutex_t;
6. 
7. typedef struct{
8. 	proc_t *next_blocked;
9. } proc_t;
10. 
11. void acquire(bmutex_t *b){
12. 	for(;;) {
13. 		lock(&b->l);
14.		current_process->blocked;	//current_process is a global variable that points to the current process
15.		current_process->next_blocked = p->blocked_list;
16.		p->blocked_list = current_process;
17.		if(!b->acquired) break;
18.		unlock(&b->l);
19.		schedule();
20.	}
21.	b->acquired = 1;
22.
23.
24.	
25. void release(bmutex_b *b){
26.	lock(&b->l);
27.	for(proc=b->blocked_list;*proc; proc=proc->next_blocked){
28.		proc->blocked=0;
29.	}
30.	b->acquired=0;
31.	unlock(&b->l);
32.  }

Semaphores

A semaphore is a blocking mutex controlled by int, not bool acquired. So it is locked when int == 0, and unlocked when int > 0. It is used when you have N instances of a resource and so you want at most N processes to run simultaneously. So a blocking mutex is in reality a binary semaphore where N = 1.

Implementation Example: Use a semaphore to control access to pipes

1. struct pipe{
2. 	char buf[N];
3.	unsigned r,w;
4.	bmutex_t b;
5. }
6. 
7. char readc(struct pipe *p){
8.	acquire(&p->b);
9.	//this is where we would have our read c code
10.	release(&p->b);
11.	return c;
12. }

However, we still have a problem! Imagine the scenario where we have a pipe and a 1000 processes trying to either write or read from it. As soon as anything is free, all readers wake up, 1 wins, 999 lose. This chews up CPU time as ALL readers wake up. We must implement a condition variable.

Condition Variable

It can stand for any boolean condition you like. In our example: pipe is not empty. Checking p->r!=p->w.

We have implemented a
--blocking mutex that protects the condition
--condition variable that represents the condition
API below:

1.  wait(condvar_t *c, bmutex_t *b)
2. 	//precondition: b is acquired
3.	//does: - releases b
4.	//	- then asks until some other process notifes that condition may be true
5.	//	- reacquires b
6. 
7.  notify(condvar_t *c)
8.	//wakes up some thread waiting for condition to be true
9. 
10. notifyALL(. . . )
11. broadcast(. . . )
12. 	//The above lets us broadcast or contact all threads
13.

Now, we finish our earlier implementation of read c with our condition variable implemented.

1. char readc(struct pipe *p){
2.	acquire(&p->b);
3.	while(p->r == p->w)
4.		wait(&p->nonempty, &p->b);
5.	char c = p->buf[p->r++%N];
6.	notify (&p->nonfull);
7.	release(&p->b);
8.	return c;
9. }
10.

Issues with Implementing Locks in Code

Default: Dangerous: threads = unsynchronized computation! To resolve this, we add spinlocks, mutexs, etc to code. Two issues arise
1. miss a race.
2. Oversynchronization:
   --make code slow -> introduce bottlenecks
   --deadlock

Deadlock Example:
a | cat | b

1. while((n=read(0,buf,sizeof buf))>0)
2.	write(1,buf,n);
3.	//2 copies of data
4.	while(xfer(0,1,n)>0)
5.		continue;
6.	//1 copy of data
7.	//grab a block of input, grab a block of output, copy data, reasses both locks;
8.	return c;
9. }
10.

process 1, or pipe 1 will be waiting on process 2 and pipe 2. However process 2 will be waiting for process 1. Thus there is a a cylce of dependency. This situation is known as deadlock.

Ways to Avoid Deadlock:

1. Hold at most 1 lock or merge your locks.
2. Grab locks away from other people
3. OS detect dependency loops, so acquire(. . . ) can fail.
4. Lock ordering (always grab locks in order).