When a computer runs multiple threads, the synchronization of is critical. When multiple threads try to access and modify memory at the same time, you must in some way ensure that the threads do not run into each other. As was described in the previous lecture, one way to control the synchronization of threads is the use of locks. One type of lock that is commonly used is a spin lock. When using a spin lock, if a thread wants to acquire a lock, it will loop (spin) until the lock is available. In other words, a spin lock keeps the CPU busy until it receives the lock. Here is an initial attempt at the implementation of a spin lock:
typedef int mutex_t;
void lock(mutext_t *m) {
while(*m) continue;
*m = 1;
}
However, this above implementation is not quite right. Between the time that the value of the mutex is checked and the time that it is assigned a new value, the value in memory could have changed. This section needs to be done atomically, which is accomplished using the atomic x86 xchg instruction:
typedef int mutex_t
void lock( mutex_t *m ) {
while( xchg(m, 1) == 1 ) continue;
}
void unlock( mutex_t *m) {
*m = 0;
}
Preconditions: It is important to remember the preconditions for calling lock and unlock. Lock may only be called when you do not own the lock. Unlock can only be called when you own the lock.
Are there other atomic x86 instructions we could use to implement a spin lock? Actually, there are several x86 instructions that are atomic:
lock incl x
incl
is not atomic, but you can make an increment long instruction atomic by placing lock in front of it.lock incl
is atomic, but it is very slow.xchg
xchg
works and is more general in its use than lock incl
Using compare and swap we can compute any function atomically
//compare and swap implementation in C
bool cas( int *p, int old, int new ) {
if( *p == old ) {
*p = new;
return true;
} else {
return false;
}
}
Here is an example of how we could compute a factorial atomically:
void factorial_transaction (int *p) {
do {
int old = *p;
int new = fact(old);
} while( !cas(*p, old, new) );
}
Instead of the factorial function in the above example, we can compute any function there! Compare and swap allows us to implement any function atomically.
So does compare and swap solve all of our problems? Look at the following example of a bank account that is shared by Dr. Eggert and his brother where they both share an account that currently has $100. What happens if at the same time, Paul tries to withdraw $80 and Kurt tries to withdraw $40?
How can we solve the problem of both people withdrawing money from the account at the same time?
struct account {
long long balance;
...
volatile mutex_t lock;
}
Withdraw money:
lock( &p -> lock );
p->balance -= 8000;
unlock( &p -> lock );
Now, let's try to apply what we learned from giving the bank account its own lock to an implementation of pipes in linux:
struct pipe {
char buf [N];
size_t r, w;
mutex_t m;
}
There are two possible ways to use locks with a pipe - either each pipe has its own lock or there is a global variable lock for all of the pipes:
mutex_t
)
void writec( struct pipe *p, char c) {
lock( &m );
...
unlock(&m);
}
mutex_t
lock)
void writec( struct pipe *p, char c ) {
lock( &p -> m );
...
unlock(&p -> m);
}
Is it possible to make a finer grained locking system? Right now each pipe has its own lock. However, a reader should not exclude others from being able to read as well. We can create greater parallelism and create a finer grained locking system by changing our single lock for the pipe to have two locks, one for reading and one for writing.
struct pipe {
char buf[N];
size_t r, w;
mutex_t rm, wm;
}
Now when we write, we get a write lock and when we read, we get a read lock.
bool writec( struct pipe *p, char c ) {
lock( &p->wm );
if( p->w - p->r == N ) {
unlock( p->wm);
return false;
}
p->buf[p->w++ % N] = c;
unlock( &p->wm );
return true;
}
If you want writec
to be a void
function instead, you can change it to:
void writec( struct pipe *p, char c ) {
lock( &p->wm );
if( p->w - p->r == N ) {
continue;
}
p->buf[p->w++ % N] = c;
unlock( &p->wm );
}
This function will spin until a reader helps us out and releases the lock, allowing us to claim the write lock until we finish writing.
We have now discussed the design decision of where to place locks within a data structure. However, equally important is the decision of where to call the lock()
and unlock()
functions within code.
How big should a critical section be?
How do we find the minimal critical section?
lock()
/ unlock()
around smallest region containing these.
What if some threads want read-only access to the whole object?
If they're only reading, we don't want to block all other readers. We want to allow N > 1 simultaneous readers OR 1 writer.
Our previous solution blocks readers unnecessarily. We need a new type of mutex that supports both readers and writers.
typedef {
mutext_t m;
int readers;
} rwmutex_t;
If you want a read lock, grab the mutex, add 1 to readers, and return. Decrement readers when unlock.
If you want a write lock, grab the mutex, check if readers == 0, don't release the mutex until done writing.
Problem with spinlocks:
Possible solution:
while( xchg( ... ) )
yield();
However, this stills up some CPU since it hurts context switching time.
Blocking mutexes
typedef struct {
mutex_t m; //controls access to this lock
bool locked; //someone owns this lock
proc_t *blocked_list;
} bmutex_t;
void acquire( bmutex_t *b ) {
for( ; ; ) {
lock( &b -> m );
//make self runnable
//add self to b-> blocked list
if( !b-> locked )
break;
unlock( &b -> m );
schedule();
}
b-> locked = true
//mark self runnable
//remove self from b->blocked list
unlock( &b->m );
}
void release( bmutex_t *b ) {
lock( &b -> m);
b->locked = false;
//mark 1st process in p->blocked list as runnable
unlock(&b->m);
}
Note: two possible methods for which process on the blocked list to mark as runnable:
Acquire : lock--; //p - prolaag
Release: lock++; //v - verhoog
struct pipe {
bmutex_t b;
char buf[N];
size_t r, w;
}
void writec( struct pipe *p, char c ) {
for( ; ; ) {
acquire( &p->b );
if( p->w - p->b != N)
break;
release( &p -> b );
}
p->buf[...]--;
release(&p->buf);
}
However, this still chews up CPU time trying to acquire and release over and over again.
We need:
wait( condvar_t *c, bmutex_t *b )
notify( condvar_t *c )
notify_all( condvar_t *c )
struct pipe {
bmutex_t b;
char buf[N];
size_t r,w;
condvar_t notfull, notempty;
}
void writec( struct pipe *p, char c ) {
acquire( &p-> b);
for( ; ; ) {
if( p-> w - p->r != N ) {
break;
}
wait( &p -> nonfull, &p->b );
}
p->buf[...]--;
release( &p->b );
notify( &p->nonempty );
}