CS111 Scribe Notes
Synchronization Primitives - Lecture 4/28
By: Andrew Look, Andrew Castner, Moises Lee
acastner@cs.ucla.edu
alook@ucla.edu
Synchronization Mechanism
- Race Conditions
- Read-Write coherence
- Large Objects
- T1: strcpy (buf, “Hello)
- T2: strcpy (buf, “world”)
With threads:
- T1 can start writing Hel and then stop
- Then T2 will start writing wo which will over write He and make it wol
- In the end we might end up with something like worlo
- -->multiple stores to set object
- Small Objects
Struct{ unsigned int b1:1; unsigned int b2:1; char buf [10]; } s; T1: s.b1= 0. T2: s.b2 = 0;
Assume that beforehand the bits in the word containing s are all 1's.
Because each assignment is carried out as a [load word], [assign value], [store word] series of instructions (in spite of the fact that the assignments affect individual bits), it's possible that one thread will load the word holding s (currently 0xFFFF) into a register, will be preempted by the second thread which will also load the word into memory (also 0xFFFF), perform its assignment (s.b2 = 0) and store the word (now 0xFF7F, for instance) back into memory.
Thread 1 will then continue, performing its assignment (s.b1 = 0), and then store the result (0xFFDF in this case) at the same spot that thread 2 stored its result. Thus we're left with only thread 1's assignment reflected in memory, which may not have been the desired result.
- On a multiple core machine only one of the bits will be cleared, since the machine instruction operates at only 1 bit at a time
- The actual load, stores, are on word boundaries
- From this point we assume read-write coherence for 32-bit aligned loads and stores.
-->This will perform an unaligned load in which the system will fetch 3 bytes from one word and one bit from the word at 0x1004. Since these two loads aren't simultaneous it's possible that another thread could write to one of the words before we've read it and thus leave us with a value that's incoherent.
We can accomplish synchronization with just this, but it's extremely difficult, so we employ a special machine instruction to help us:
xchg %eax, (%ebx)
The 'xchg' assembly instruction exchanges the values the value in eax with the value at the memory location contained in ebx atomically (i.e. other instructions are executed either before these values have been touched or after they've been swapped. The instruction is known as “test-and-set” in IBM mainframes.
To model this instruction in C:
int tas (int *p, int new){ int old = *p; *p = new; return old; }
All of this is done atomically with just one xchg instruction.
We can use this synchronizing mechanism to construct spinlocks with the following code:
We can see the use of these locks by developing an implementation of Unix's pipe:void lock (mutex_t *m) { // M needs to be a multiple of 4 while (tas (m,1) == 1) // Spin lock, ties the cpu and shouldnt be done on something continue; // Causes a "busy wait" } void unlock (mutex_t *m){ *m=0; } // As a sidenote, m should point to an aligned address // to insure that loads and stores to it are atomic.
enum {N = 1024}; typedef struct { char buf[N]; size_t r,w; } pipe_t;
The next two functions assume there are no synchronization issues:
void write(pipe_t *p, char c) { p-> buf [p->w++%N] = c; } // If w++ overflows the program will still work // because size_t is unsigned and can hold a power of two // larger than 1024. Thus it will roll over to 0 when w % N would. char read (pipe_t *p) { return p->buf[p->r++ %N]; }
Here we attempt to fix the race condition the wrong way:
void write(pipe_t *p, char c) { // While we're loading, someone else is writing while(p->w – p->r == N) continue; p-> buf [p->w++%N] = c; } char read (pipe_t *p) { while(p->w – p->r == 0) continue; return p->buf[p->r++ %N]; }
Use one global mutex for pipes. This is a coarse-grain lock.
mutex_t m; void write (pipe_t *p, char c){ lock(&m); while (p->w – p->r != N){ unlock (&m); lock (&m); } p->buf [p->w++%N]=c; unlock (&m); }
In order to avoid both locking and unlocking twice we can place the above code in a do-while loop:
bool nonfull; void write (pipe_t *p, char c){ do{ lock(&m); nonfull = p->w – p->r != N if (nonfull) p->buf [p->w++%N]=c; unlock (&m); }while (!nonfull); } // The do-while read() primitive is left as an exercise to the reader
- (spinlock)*: we're spinlocking over and over, thus wasting CPU cycles busywaiting.
- Using one global mutex doesn't scale well, as it limits access to all pipes (even if we have 100,000 pipes we can only access one at a time)
To fix this we can put the variable mutex_t in the pipe struct, thus allowing each pipe to be accessed by only one thread at a time. While more complex than the coarse-grained lock we employed earlier, this fine-grained lock will scale much better.
We can make an even more fine-grain lock by adding a separate read lock and a write lock:
mutex_t rm; // Read mutex_t wm; // Write
Can set up access such that any number of threads can read at once, but only one thread may write at any given time and will block readers out until that thread is done writing:
typedef struct { mutex_t m; uint32_t readers; } rwmutex_t; void lock_for_read (rwmutex_t *p) { lock (&p->m); p-> readers ++; //assumes no overflow unlock (&p->m); } void lock_for_write (rwmutex_t *p){ for (;;){ lock (&p->m); if (!p-> readers) return; unlock (&p->m); } } } void unlock_for_read (rwmutex_t *p){ lock (&p->m); p->readers--; unlock (&p->m); } void unlock_for_write (rwmutex_t *p){ unlock (&p->m); }General rule of thumb for finding critical sections
- Writes to shared state
- Reads of nonword-objects (nonatomic reads)
- Dependent reads (using the value of a read to determine a later write)
Spinlocks don't work well if we're going to be doing a lot of waiting, as they tie up the CPU.
- Controls a bounded number of resources (20 buffers, 1 keyboard, etc.)
- Maintain a queue of threads that are waiting for the given resource(s) to become available.
- An int that tells how many of the resource are available
- A pointer to a linked list of process descriptors that are waiting
- A spin lock
Operations on a semaphore: prolaag – try to decrease n. If n is positive we get one. (also called P, acquire, grab, down) verhoog – increase (V, release, up) // We can add a semaphore to our pipe implementation like so: char readc(pipe_t *p){ P( &p->s ); char c = p->buf[p->r++ % N]; V(&p->s); return c; } // This only works if the pipe is nonempty, and thus should probably go in a loop // What would be more helpful here would be something along the lines of: // wait_for(p->w – p->r != 0);
To achieve something like this we'll need a new synchronization mechanism: condition variables
Condition variables serve as representatives of boolean conditions.- A boolean condition
- A blocking mutex to protect the condition
- A condition variable to represent the condition
API for condition variables:
wait(condvar_t *c, bmutex_t *b) // Precondition: B must be acquired. // Action: Releases b then blocks until // another thread notifies us of the condition. // Reacquires b, then returns notify(condvar_t *c) // Action: Tells the OS to wake up one of the threads waiting on this condtion