struct pipe { mutex_t l; ... };

void writec(struct pipe *p, char c) {
    lock(&p->l);
    /* write a byte */
    unlock(&p->l);
}
Suppose it takes a long time to write a byte; then every other process that wants the lock is stuck busy-waiting for it to be released, burning CPU time.
To keep processes from busy-waiting on a lock, let's try blocking mutexes...
struct pipe { bmutex_t b; ... };

void writec(struct pipe *p, char c) {
    acquire(&p->b);
    /* write a byte */
    release(&p->b);
}
Internally, acquiring a bmutex_t yields the CPU if someone else holds the lock. Even so, this is still a form of busy waiting: the waiting process keeps getting scheduled just to check the lock and yield again, which chews up the CPU.
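For intuition, an acquire that yields might look roughly like the sketch below. This is an assumption for illustration: it supposes bmutex_t contains a low-level spinlock l and a locked flag, and that lock, unlock, and yield exist as named here.

void acquire(bmutex_t *b) {
    for (;;) {
        lock(&b->l);
        if (!b->locked) {          /* free: grab it */
            b->locked = 1;
            unlock(&b->l);
            return;
        }
        unlock(&b->l);
        yield();                   /* give up the CPU, but the scheduler will keep
                                      running us again just to check and yield */
    }
}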
We need a way to tell the scheduler that a process is waiting, and what it is waiting for, so that the process isn't constantly rescheduled just to check...
We need to tell the kernel what a process is waiting for; this is what condition variables provide. For them to work you need: the condition itself (a boolean expression over your data, known only to the programmer), a blocking mutex protecting the data the condition depends on, and a condvar_t representing the condition.
An example pipe struct implemented with condition variables:
struct pipe {
    bmutex_t b;
    condvar_t nonfull;    /* condition: w - r < N, i.e. the pipe is not full */
    condvar_t nonempty;   /* condition: the pipe is not empty */
    size_t r, w;
    char buf[N];
};
wait(condvar_t *c, bmutex_t *b)
Precondition: the blocking mutex b is acquired.
Internally, this function releases b, waits, and then reacquires b.
notify(condvar_t *c)
Notifies one process waiting on condition c that the condition is now true.
notifyAll(condvar_t *c)
Similar to notify, except that it awakens all waiting processes instead of just one.
Example write function implemented using condition variables:
void write(struct pipe *p, char c) {
    acquire(&p->b);
    while (p->w - p->r == N)          /* pipe is full */
        wait(&p->nonfull, &p->b);
    /* do the write */
    notify(&p->nonempty);
    release(&p->b);
}
If the pipe is full, the process will wait until the pipe is not full (the condition behind wait(&p->nonfull, &p->b)).
After the write is made, the function notifies a waiting process that the pipe is not empty (notify(&p->nonempty)).
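The matching read side is not given in the notes, but it would follow the same pattern; the name readc and the emptiness test p->w == p->r below are assumptions:

char readc(struct pipe *p) {
    acquire(&p->b);
    while (p->w == p->r)              /* pipe is empty: wait for a writer */
        wait(&p->nonempty, &p->b);
    char c = p->buf[p->r++ % N];      /* do the read */
    notify(&p->nonfull);              /* a waiting writer now has room */
    release(&p->b);
    return c;
}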
A semaphore implements a blocking mutex with an int counter instead of a bool.
Locked when the counter is 0; unlocked when the counter is > 0.
P = down (acquire)
"prolaag" = "probeer te verlagen = try and decrease
V = up (release)
"verhoog" = increase
Deadlock is a race condition!
Hard to debug
Rare
Can be disastrous (we all want to keep our jobs, right?)
4 conditions that can cause deadlock:
Circular wait
There is a cycle of waiting: each process holds a resource that the next process in the cycle is waiting for.
Mutual exclusion
The objects involved allow only one process to access them at a time.
No preemption of locks
Nothing can force a process to give up a lock it holds.
Hold and wait
A process can hold one object while it tries to get another.
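As a concrete illustration (not from the notes; the account_t type and transfer function are made up), here is the classic way all four conditions show up at once:

typedef struct { bmutex_t lock; long balance; } account_t;

/* If one thread runs transfer(a, b, ...) while another runs transfer(b, a, ...),
   each can acquire its first lock and then wait forever for the other's:
   mutual exclusion, hold and wait, no preemption, circular wait -> deadlock. */
void transfer(account_t *from, account_t *to, long amount) {
    acquire(&from->lock);
    acquire(&to->lock);        /* possible deadlock right here */
    from->balance -= amount;
    to->balance += amount;
    release(&to->lock);
    release(&from->lock);
}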
Strategies for avoiding deadlock:
Write perfect code (a bit impractical!)
Threads hold at most one lock
Downside: you can't perform operations that need two locks at once (e.g., no balance transfers between two accounts)!
Lock preemption
Downside: complicates the code for the process whose lock was taken away.
Acquire function checks for circular wait, fails if found
Downside: slows down acquire function, complicates user code.
Lock ordering
If you need to acquire more than one lock, always acquire the locks in a fixed numeric order (the order could, for example, be determined by comparing the locks' addresses).
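A minimal sketch of this idea, ordering two distinct bmutex_t locks by address; the helper name acquire_both is made up for illustration:

void acquire_both(bmutex_t *x, bmutex_t *y) {
    /* Assumes x != y. Every caller grabs the lower-addressed lock first,
       so no two callers can ever wait for each other in a cycle. */
    if (x < y) {
        acquire(x);
        acquire(y);
    } else {
        acquire(y);
        acquire(x);
    }
}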
Example of acquiring multiple locks using try_acquire:
for (i = 0; i < 5; i++) {
    /* try_acquire fails (returns a negative value) if it couldn't get the lock */
    if (try_acquire(&l[i]) < 0) {
        for (j = 0; j < i; j++)        /* release everything acquired so far */
            release(&l[j]);
        i = -1;                        /* start over from l[0] on the next iteration */
    }
}
There was a time when disks were faster than CPUs, which forced file systems to be implemented with simple code. Nowadays, with CPU speeds far greater than disk speeds, we can have tricky file systems, because CPUs can handle non-trivial code.
| Barracuda XT 3TB ($200) | | Corsair Force 60GB SSD ($150) | |
| Cache | 64 MB | Data transfer rate | 3 Gb/s |
| # of sectors | 3,907,029,168 | Average operating power | 2.0 W |
| Spin speed | 7200 rpm = 120 Hz | Average idle power | 0.5 W |
| Data transfer rate | 6 Gb/s | Sustained data transfer rate (write) | 270 MB/s |
| Sustained data transfer rate | 138 MB/s | Sustained data transfer rate (read) | 280 MB/s |
| I/O data transfer rate | 600 Mb/s | IOPS (I/Os per second) | 50K |
| Average latency | 4.16 ms | | |
| Random read seek time | 8.5 ms | | |
| Random write seek time | 9.5 ms | | |
| Average operating power | 9.23 W | | |
| Average idle power | 6.39 W | | |
*Data transfer rate = how fast data can be transferred out of the cache.
*Average latency = rotational delay, assuming the arm is already over the correct track.
*Random read seek time = assumes the arm has to move to the correct track first.
*Random write seek time = a little longer than a read seek, because the head must settle precisely enough to avoid overwriting neighboring data.
**Compare the SSD's 50K IOPS with the hard drive's roughly 70 IOPS.
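(Each random I/O on the hard drive pays roughly an 8.5 ms seek plus 4.16 ms of rotational latency, i.e. on the order of 13 ms per operation, which works out to only about 70-80 I/Os per second.)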
Summary: SSD is fast, but small and pricey. Hard disk may be slower but way larger and cheaper per GB.
Strategies to get I/O from the disk faster:
Reading a single character from the disk
read(fd, &c, 1);
| Average read seek    | 8.5 ms   |
| Latency              | 4.16 ms  |
| Out of cache         | ~0.01 ms |
| Into cache off disk  | ~0.04 ms |
| CPU overhead         | ~0.01 ms |
| TOTAL                | 13.72 ms |
**This translates to reading characters at about 70 bytes/s. Not very impressive.
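To see where that comes from: each 1-byte read costs about 13.72 ms, so the rate is roughly 1 byte / 0.01372 s ≈ 73 bytes per second.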
Reading using a buffer (1KB)
char buf[1024]; read(fd,buf,1024);
**This turns out to read characters at about 70 KB/s, an improvement of about 1000 times!
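The per-read overhead (seek, rotational latency, CPU) is essentially unchanged, so each read still costs about 13.7 ms but now returns 1024 bytes: 1024 bytes / 0.0137 s ≈ 73 KB/s, roughly 1000 times the single-byte rate for the same number of disk operations.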
Reading using an even larger buffer (1GB)
char buf[1024*1024*1024]; read(fd,buf,1024*1024*1024);
**This turns out to read characters at about 130 MB/s, roughly another 2000-fold improvement over the 1 KB buffer!
Why not always just read with 1 GB buffer?
| Average read seek    | 8.5 ms   |
| Latency              | 4.16 ms  |
| Out of cache         | ~0.01 ms |
| Into cache off disk  | ~7000 ms |
| CPU overhead         | ~5 ms    |
| TOTAL                | 7017 ms  |
The compromise: the application can keep issuing small requests such as read(fd, &c, 1); while underneath, the kernel issues one large read, e.g. sys_read(disk, buf, 1024*1024*1024); and caches what it read, so later small reads are satisfied from the cache instead of the disk. An important thing to note is that as throughput increases, latency also increases:
| Throughput | Latency  |
| 70 bytes/s | 13.72 ms |
| 70 KB/s    | 13.72 ms |
| 130 MB/s   | 7017 ms  |
So there must be a balance between the size of the read and the length of time it takes to read.
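As a sketch of that caching compromise at user level (the helper name getbyte and the 8 KB buffer size are illustrative assumptions, and this version only handles a single file descriptor):

#include <unistd.h>

/* Pay the seek + rotational latency once per 8 KB block, then serve
   single-byte requests out of memory. Returns -1 on end of file or error. */
int getbyte(int fd) {
    static char buf[8192];
    static ssize_t filled = 0;    /* bytes currently in buf */
    static ssize_t next = 0;      /* index of the next unread byte */

    if (next >= filled) {         /* buffer exhausted: refill it with one big read */
        filled = read(fd, buf, sizeof buf);
        next = 0;
        if (filled <= 0)
            return -1;
    }
    return (unsigned char) buf[next++];
}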
We can model disks as a linear array:
Cost of accessing block b, when the read/write head is currently at position h, is |b - h|.
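For example, if the head is at h = 100 and we access block b = 250, the cost is |250 - 100| = 150.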