
Synchronization: critical sections and mutexes


Implementing pipes so they work

 

First attempt:

#include <stddef.h>

enum { N = 8*1024 };  // 8-kilobyte pipe

struct pipe {
   size_t r, w;   // read, write cursors;
                  // r is at offset 0, w at offset 4
   char buf[N];   // starts at offset 8; must hold N bytes
                  // for the modular indexing below to work
   // assume we read and write 1 byte at a time
};

 

void writec(struct pipe *p, char c){

   p->buf[p->w++ %N] = c;

}

 

char readc(struct pipe *p){

   return p->buf[p->r++ %N];

}

 

Things that can go wrong

        1) Read end can pass write end

        2) Write end could wrap, pass read end

        3) integer overflow in ++

        4) multiple processes accessing the pipe simultaneously.

 

Overflow is solved by size_t being unsigned: on overflow it wraps around to 0, i.e. it overflows "nicely", so the difference w - r stays correct, provided N is a power of 2.

 

First rewrite:

void writec(struct pipe *p, char c) {
   while (p->w - p->r == N)  // pipe is full
      continue;              // you wait
   p->buf[p->w++ % N] = c;
}

 

char readc(struct pipe *p) {
   while (p->w - p->r == 0)  // wait while the pipe is empty
      continue;
   return p->buf[p->r++ % N];
   // still doesn't handle overflow
}

 

Second rewrite:

void writec(struct pipe *p, char c) {
   while (p->w - p->r == N)  // pipe is full
      continue;              // you wait
   p->buf[p->w++ % N] = c;
   if (p->w == 2*N) {        // shift both cursors back by N;
      p->w -= N;             // w - r is unchanged
      p->r -= N;
   }
}

 

Except for efficiency, problems 1) and 2) are handled.

 

Making things work with multiple processes accessing pipe simultaneously.

        -the situation with no readers works

 

Problem 1: the increment is OK if it is a single instruction, but typically it takes several instructions, which can be interleaved with another thread's.

 

Possible actual machine code:

load 4(r1), r2       % get p->w from memory
add $1, r2, r3       % increment
store r3, 4(r1)      % store new value of p->w
and $8191, r3, r3    % mask with N-1, i.e. take mod N
store r0, 8(r1,r3)   % store the character into buf

 

The way this might work:

Thread 1                      Thread 2
r2 = 100
r3 = 101                      r2 = 100
p->w = 101                    r3 = 101
                              p->w = 101
(and: no effect)
                              (and: no effect)
p->buf[101] = c1              p->buf[101] = c2

 

One of the characters is lost.

The core of the problem: the line

p->buf[p->w++ % N] = c;

performs several separate operations (load w, increment, store w, store the byte), and another thread can run between them.

 

Rewrite 2:

void writec(struct pipe *p, char c) {
   while (p->w - p->r == N)
      continue;
   size_t w = p->w;
   size_t new_w = w + 1;
   p->buf[w % N] = c;   // tell the compiler "don't optimize this!"
                        // (e.g., gcc -O0)
   p->w = new_w;
   // overflow check code goes here!
}

Make analogous changes to readc.

If we assume that:

        1) loads and stores are carefully ordered

        2) loads and stores are atomic (READ-WRITE COHERENCE)

        3) it's OK to busy-wait

 

Then the situation with 1 reader and 1 writer works.

 

Attacking the problem of multiple readers and writers.

A critical section:

        -a series of instructions that at most 1 thread should be executing at any given time.

 

We need to enforce this somehow.

This is a problem if

1)      there's a single processor, but preemptive multitasking

2)      there are multiple processors sharing the pipe

 

If neither applies (one processor, non-preemptive multitasking), there's no problem.

 

2 subproblems:

        A)MUTUAL EXCLUSION

        B)BOUNDED WAIT

                if a thread wants in, it should get in quickly (<= 5 sec)

                avoids starvation

 

Solution: make whole readc and writec a critical section.

Will prevent incorrect data (but it can loop forever if there's nothing in the pipe)

 

But... you don't want critical sections to be too large

-         other threads can't do useful work

-         you might even starve them.

You don't want critical sections to be too small

-         you get races

 

MINIMAL CRITICAL SECTIONS:

-         avoid races

-         if you make them any smaller, you no longer avoid races

 

Make the following a critical section:

size_t w = p->w;

size_t new_w = w + 1;

p->buf[w % N] = c;   

p->w = new_w;

 

There's still a problem:

while (p->w - p->r == N)

   continue;

Another thread may intervene after the check, so there's no more room to write, but the function has already made the check, so it writes anyway!

 

Rewrite 3:

void writec(struct pipe *p, char c) {
   for (;;) {
      disable_interrupts();      // critical section starts
      // critical section: must be fast
      if (p->w - p->r != N) {
         p->buf[p->w++ % N] = c;
         enable_interrupts();    // critical section ends
         return;
      }
      enable_interrupts();
   }
}

This works for a single CPU case, with preemptive multitasking.

 

For multiple CPUs, use a MUTEX:

typedef ? mutex_t;      // representation to be determined

void lock(mutex_t *);   // grab control of the mutex,
                        // twiddling thumbs while waiting

void unlock(mutex_t *);

 

Implementing locks:

typedef int mutex_t;

void unlock(mutex_t* m) { *m = 0;}

void lock (mutex_t * m) {

   while (*m)

      continue;

   *m = 1;

}

 

However, this can get interrupted between the test of *m and the store *m = 1, so two threads can both acquire the lock!

 

The x86 processor has the instruction:

        xchg %ebx, (%eax)          atomic (but slow)

 

the test_and_set instruction. It behaves as if the following ran atomically:

int test_and_set(int *m, int n) {

   int old = *m;

   *m = n;

   return old;

}

 

Re-implement lock:

void lock(mutex_t *m) {
   while (test_and_set(m, 1) == 1)
      continue;
   // no extra *m = 1 needed: test_and_set already stored the 1
}

 

Replace disable_interrupts() in readc and writec with

lock(&m);

and enable_interrupts() with

unlock(&m);

 

This locks at the wrong level (a single, global lock).

COARSE-GRAINED LOCK

        Single lock that governs many resources;

        Simpler, easier to program

FINE-GRAINED LOCKS

        govern few resources

        + better utilization

 

Add to pipe structure a new field:

mutex_t m;

 

in readc, writec:

lock(&p->m);

unlock(&p->m);

 

For even finer-grained locks, we can have separate locks for reading and writing.

Add to pipe structure a new field:

mutex_t rm, wm;

 

in readc, writec:

lock(&p->wm);      //in writec

lock(&p->rm);      //in readc