Scribes: Ken Ohhashi, Colin Fong, Stanway Liau
May 5, 2014
Circular waiting: A process is waiting for a resource held by another process, which is in turn waiting for a resource held by the first process.
Priority inversion: a higher-priority process ends up waiting on a lower-priority process.
EXAMPLE - There are 3 levels of priority for threads.
*Time runs from left to right and downward.

     | Tlow (runnable)  | Tmed (waiting)    | Thigh (waiting)
  1. | acquire(&r) ---- | -context switch-> | (runnable)
  2. |                  |                   | acquire(&r)
  3. |                  | (runnable)        | (waiting)

Tlow holds r when Thigh tries to acquire it, so Thigh blocks. The scheduler then runs Tmed, the highest-priority runnable thread, and Thigh stays stuck behind the medium-priority thread indefinitely.
                   BUFFER
          INPUT                          OUTPUT
requests ---->[=======================]----> CPU
   100 Gb/s                            10 Gb/s
The basic problem we have is that there is more than one thread running at one time. What if we use just one thread? This will solve our problem of data races, but we still want to be able to wait for events. We can use event-driven programming to do this. In this type of programming, there is usually a main loop that listens for events. Below is a simple illustration of the concept:
main program {
    while (live) {
        wait for some event e;   // the only code that waits for an event
        e->handler(e);           // handlers never wait and must complete quickly
    }
}
Upsides:
No locks!
No deadlock!
No bugs due to forgetting to lock!
Downsides:
Handlers are restricted: they may need to be broken up
Cannot scale via multithreading
Although multithreading is out of the question, we can still scale via multiple processes or multiple systems instead. Some supporters of event-driven programming claim that the scaling provided by multiple processes makes the loss of multithreaded scaling negligible, thus making that downside irrelevant.
In multithreaded programming, too much CPU time is spent waiting for locks to be released. To improve the wait time, we can get hardware support through Hardware Lock Elision (HLE).
Below are example Haswell instructions for HLE. Haswell is a codename for a processor microarchitecture, although this information is not critical to understanding the concept.
lock:
    movl $1, %eax                     # value to swap into the lock word
try:
    xacquire lock xchgl %eax, lock    # swap; xacquire lets us proceed optimistically
    cmp $0, %eax                      # old value 0 means we got the lock
    jnz try                           # nonzero: someone else holds it, retry
    ret
unlock:
    xrelease movl $0, lock            # commit the elided region and release
    ret
HLE adds two instruction prefixes, xacquire and xrelease, to the architecture. xacquire allows the thread to move ahead "optimistically", executing subsequent instructions as if it had acquired the lock. The thread modifies its own cache but doesn't flush the changes out to memory. When execution reaches the xrelease-prefixed store, the hardware checks the value that was traded into the lock: if the thread didn't actually get the lock, it discards all the buffered changes in its cache and acts as if none of those instructions happened. If it did get the lock, the cached changes are flushed to RAM, since the instructions are valid.
Say we have a General Parallel File System (GPFS) with 120PB (~200,000 hard drives, each 600GB). Here are some performance ideas for such a file system:
Usually, processing long files requires enormous buffers. By distributing segments of the file across multiple devices and accessing the segments concurrently, we can increase total throughput and process the file more quickly.
Metadata contains information about a file such as its location, owner, timestamps, etc. Each node in the file system can act as a data node that holds file data or as a meta node that holds metadata. There are no master/slave nodes; it is just a cloud of data and meta nodes.
Suppose a directory in the file system has 10,000,000 files. If the directory is implemented with a well-chosen internal data structure, lookups can remain fast. One possible data structure is the B+ tree, an n-ary tree with an often large number of children per node. Because the large fan-out limits the depth of the tree, stored data/files can be retrieved efficiently with minimal I/O operations.
This is when a file system is partitioned: the larger partition stays live (files can be read, written, added, and removed), and the smaller partition is frozen, or read-only.
Sometimes one can't connect to a server, like the SEASNet Linux server or MyUCLA, because maintenance is being done on it. It would be more convenient for users if the server could stay live during maintenance, but this is hard to implement, since users' actions could interfere with some part of the maintenance in a way that is dangerous for the server or file system.
These are all features an ideal system would provide, but alas, they are difficult to implement.
We will analyze these three performance metrics for file systems:
Assume a system with the following characteristics:
/* Busy wait + polling code */
for (;;) {
    char buf[40];
    read-40-bytes-from-SSD-into-buf;
    compute(buf);
}
/* Batching code */
for (;;) {
    char buf[840];    // larger batch: 21 records of 40 bytes
    read 840 bytes;
    for (i = 0; i < 21; i++)
        compute(buf + i*40);
}
/* Avoid busy waiting code */
for (;;) {
    do {
        block-until-interrupt;
        handle-interrupt;
    } while (!ready);
}
/* DMA code */
while (1) {
    write-command-to-disk;
    block-until-interrupt;
    check-disk-is-ready;
    compute;
}
/* Polling + DMA code */
while (DMA-slots-not-ready)
    schedule();