CS 111 Spring 2007 - Scribe Notes - Neal Boroumand & Brandy

 

RAID and Reliability

Typical computers exhibit the following behavior

----------up---------^^^Crash^^^--------------up-----------^^^Crash^^^-------------up----------

We can calculate several parameters from the preceding behavior:

Mean Time to Failure : Avg "up" time

Mean Time to Repair : Avg "crash" time

Mean Time between Failures = MTTF + MTTR

The availability of the system is MTTF/MTBF; an availability of .99999, for example, is called a "five nines" system.
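
As a quick worked example (the numbers below are made up purely for illustration): if a machine stays up for 999.99 hours on average and takes 0.01 hours to repair, availability = 999.99 / 1000 = .99999. The same arithmetic in C:

    #include <stdio.h>

    int main(void) {
        /* Hypothetical numbers, chosen only to illustrate the formula. */
        double mttf = 999.99;                 /* average hours up per cycle   */
        double mttr = 0.01;                   /* average hours down per cycle */
        double mtbf = mttf + mttr;            /* MTBF = MTTF + MTTR           */
        printf("availability = %.5f\n", mttf / mtbf);   /* prints 0.99999     */
        return 0;
    }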

 

Reliability of Disks:

Typical disk: P(Fail) vs. time (years)

[Figure: a "bathtub" curve. Failure probability is elevated when the drive is new, drops to a low plateau through roughly year 5, then climbs steeply as the drive wears out. Time axis runs from year 1 to year 7.]

 

The elevated region at the beginning is known as the "burn-in" period. After about 5 years, the odds of failure rise very quickly, and failure becomes a likely event.

 

There's really no way to predict when a hard drive will fail, so how do we avoid "crashes"?

The key is redundancy. Different RAID configurations offer different levels of redundancy, and thus different levels of reliability.

RAID 1: Mirroring

This configuration keeps a duplicate copy of the hard drive at all times, allowing the system to function normally should one of its hard drives fail. If both fail, the system will of course crash. However, the assumption is that the first failed hard drive will be replaced quickly enough to avoid experiencing a second, overlapping failure.

The reliability of RAID 1 therefore depends heavily on how quickly failed drives are replaced. If the drives aren't replaced, the reliability curve looks similar to that of a single hard drive, except with a sharper slope at the end:

P(fail) vs. time (years) for a RAID 1 pair whose failed drives are never replaced

[Figure: the same "bathtub" shape as a single disk, but with a smaller bump at the start and a steeper climb at the end of the drives' lives.]
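
In code, mirroring amounts to sending every write to both drives and serving reads from whichever copy is still healthy. A minimal sketch, using hypothetical single-drive helpers disk_read() and disk_write() (not a real driver API):

    /* Assumed single-drive primitives; return 0 on success, -1 on failure. */
    int disk_read(int drive, int sector, char *buf);
    int disk_write(int drive, int sector, const char *buf);

    /* RAID 1: every logical write goes to both drives. */
    int mirror_write(int sector, const char *buf) {
        int a = disk_write(0, sector, buf);    /* copy on drive 0 */
        int b = disk_write(1, sector, buf);    /* copy on drive 1 */
        return (a == 0 || b == 0) ? 0 : -1;    /* OK if at least one copy landed */
    }

    /* RAID 1: a read can be served by whichever drive still works. */
    int mirror_read(int sector, char *buf) {
        if (disk_read(0, sector, buf) == 0)
            return 0;
        return disk_read(1, sector, buf);      /* fall back to the mirror */
    }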

 

RAID 4: Multiple Unique Hard drives with 1 Parity Hard drive

This implementation uses N-1 hard drives as regular standalone drives, each with its own data. The redundancy comes from the Nth drive, which is a parity drive. It stores no useful user data, but instead guards the contents of the remaining N-1 drives. In the case of a single drive failure, the parity drive can be used to calculate the contents of the failed drive and restore it. The downside is that there is still some "downtime" involved in this configuration, because for a (hopefully) short period the system will be missing one of its hard drives. The benefit is that no data is lost in the case of a single failure, and only one hard drive of overhead is needed, instead of the 1:1 ratio of RAID 1.
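
The parity drive's contents are just the bytewise XOR of the corresponding blocks on the data drives, so any one missing block can be recomputed by XOR-ing everything that survives. A minimal sketch (the drive count and block size are assumptions for illustration):

    #define NDATA 4          /* number of data drives (assumed for the sketch) */
    #define BLKSZ 4096       /* bytes per block */

    /* Compute the parity block: XOR of the same block on every data drive. */
    void make_parity(char data[NDATA][BLKSZ], char parity[BLKSZ]) {
        for (int i = 0; i < BLKSZ; i++) {
            char p = 0;
            for (int d = 0; d < NDATA; d++)
                p ^= data[d][i];
            parity[i] = p;
        }
    }

    /* Rebuild drive `dead` from the surviving drives plus the parity drive. */
    void rebuild(char data[NDATA][BLKSZ], char parity[BLKSZ], int dead) {
        for (int i = 0; i < BLKSZ; i++) {
            char p = parity[i];
            for (int d = 0; d < NDATA; d++)
                if (d != dead)
                    p ^= data[d][i];
            data[dead][i] = p;
        }
    }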

 

RAID 0 : Data Striping across multiple drives

While not technically a RAID configuration (it provides no redundancy at all), RAID 0 offers increased performance over a single-drive setup. It spreads contiguous blocks of data across multiple hard drives, which allows it to use multiple read/write heads simultaneously and thus increases the throughput of the system. The downside is that reliability drops sharply: the system now relies on ALL of its drives at once, and a single failure renders the whole array unusable.
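
The striping itself is simple arithmetic: with N drives and a stripe unit of one block, logical block b lives on drive b mod N at offset b / N. A small sketch (the names are made up):

    /* RAID 0 address arithmetic, assuming N drives and a one-block stripe unit. */
    struct stripe_loc { int drive; int offset; };

    struct stripe_loc stripe_map(int logical_block, int ndrives) {
        struct stripe_loc loc;
        loc.drive  = logical_block % ndrives;   /* which drive holds the block   */
        loc.offset = logical_block / ndrives;   /* block index within that drive */
        return loc;                             /* e.g. 3 drives, block 7 -> drive 1, offset 2 */
    }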

 

 

Virtual Memory - How does it work?

All processes have an address space visible to them that we call the "virtual address space". Each process can treat these addresses as its own private block of contiguous memory.

In reality, these memory addresses have no physical meaning; they are simply a layer of abstraction between the processes and physical memory. The OS has something called a VMM (virtual memory manager), which translates addresses across this layer of abstraction.

So how is the VMM implemented? The simplest way is to keep a linear page table, which maps virtual pages of memory to physical pages of memory. The "page", however, is not identified by the entire address, but by a pre-specified subset of its bits whose size depends on the page size. Say we declare each page to be 4 KiB (the size of our OSP hard drive block); the page offset then takes 12 bits. In order to cover the entire 32-bit address space, we then need 2^(32 - 12) = 2^20 pages. Our virtual addresses can then be organized into the following form:

| Page Number | Page Offset |

|---20 bits---|---12 bits---|
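
Pulling these two fields out of a 32-bit virtual address is just a shift and a mask; a minimal sketch:

    #include <stdint.h>

    #define OFFSET_BITS 12   /* 4 KiB pages */

    uint32_t page_number(uint32_t vaddr) {
        return vaddr >> OFFSET_BITS;                 /* top 20 bits */
    }

    uint32_t page_offset(uint32_t vaddr) {
        return vaddr & ((1u << OFFSET_BITS) - 1);    /* low 12 bits */
    }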

 

While this sounds reasonable, it requires that our page table have 2^20 entries, each one being 4 bytes, so the page table occupies 2^22 bytes = 4 MiB. This may not sound too large, but each process needs its own page table regardless of its size, which produces a lot of unneeded overhead. (A 64-bit machine would need 2^54 bytes for such a flat page table!)

Most programs don't even need the entire 4 GiB of virtual address space anyway, so to save space we can split the page number and create 2 "levels" of page table. The new address format looks like:

|---hi (10 bits)---|---low (10 bits)---|---offset (12 bits)---|

So now we first use the "hi" field to index the top-level page table and find the corresponding second-level table in memory. Then we use the "low" field to index that second-level table and find the actual page's address in memory. Finally, we use the "offset" within that page to reach the data.

Here is a function that could do this sort of thing:

    /* PAGE_TABLE is the top-level table: an array of pointers to second-level
       tables, with a null pointer marking a region that has no second-level
       table yet.  FAULT is the sentinel value pmap returns in that case. */
    int pmap(int vpn) {
        int hi = vpn >> 10;                /* top 10 bits of the page number */
        int lo = vpn & ((1 << 10) - 1);    /* low 10 bits of the page number */
        int *level2 = PAGE_TABLE[hi];      /* second-level table, if any     */
        if (level2 == NULL)
            return FAULT;
        else
            return level2[lo];             /* physical page number           */
    }

We will define %cr3 as the register that holds the address of the page table. In order for this to work, we must ensure that %cr3 is invisible to ordinary applications. We must also protect the page table's contents and make sure no process can write to them. If a process ever asks for an address whose lookup returns FAULT, an exception is raised and the kernel must allocate memory for the new page (and its page-table entry). This lets us save space by allocating only what each program actually needs, rather than assuming every process needs the entire 4 MiB page table.

 

Potential Problem with Virtual Memory

By giving each process up to 4 GiB of memory addresses, we run the risk that processes will attempt to use more memory than the system actually has! To avoid this, we use the hard drive to reserve a "swap" space for memory that cannot fit into RAM.

When a page fault occurs, we take the following steps (a rough code sketch follows the list):

1) Pick a victim page in RAM

2) Write the victim page out to disk, where it "belongs"

3) Read the desired page from disk into the freed frame of RAM

4) Restart the faulting instruction
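
Put together, the fault path looks roughly like the sketch below; pick_victim(), owner_of(), swap_out(), swap_in(), map(), and unmap() are hypothetical kernel helpers, not a real API:

    /* Hypothetical kernel helpers -- stand-ins for real paging machinery. */
    int  pick_victim(void);                 /* choose a physical frame to evict    */
    int  owner_of(int ppn);                 /* which virtual page occupies it      */
    void swap_out(int ppn, int vpn);        /* write the frame to its swap slot    */
    void swap_in(int vpn, int ppn);         /* read the wanted page into the frame */
    void map(int vpn, int ppn);             /* update the page table               */
    void unmap(int vpn);                    /* mark a virtual page not-present     */

    void handle_page_fault(int faulting_vpn) {
        int ppn = pick_victim();            /* 1) pick a victim page in RAM        */
        int victim_vpn = owner_of(ppn);
        swap_out(ppn, victim_vpn);          /* 2) write the victim out to disk     */
        unmap(victim_vpn);
        swap_in(faulting_vpn, ppn);         /* 3) read the desired page into RAM   */
        map(faulting_vpn, ppn);             /* 4) returning from the trap restarts */
                                            /*    the faulting instruction         */
    }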

 

Other Problems

-Slow: we don't want to thrash (constantly go out to the hard drive)

Even though virtual memory is potentially unlimited, the set of pages we actively use must still stay close to the amount of physical memory we have

-Where do you store the page table and the memory manager itself?

-How do you decide which page to swap out?

-Worst victim: the page that will be accessed next (soonest)

-Best victim: the page that will be accessed furthest in the future (which we cannot know in advance)

-Other options: the oldest page (FIFO), or the Least Recently Used page (a simple LRU sketch follows this list)

-Which answer gives the best performance/complexity tradeoff? Is it even significant?
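
For instance, a least-recently-used policy can be approximated by time-stamping every frame on access and evicting the stalest one. A minimal sketch (the frame count and bookkeeping array are assumptions for illustration):

    #define NFRAMES 64                      /* number of physical frames (assumed) */

    unsigned long last_used[NFRAMES];       /* updated on every access to a frame  */

    /* LRU victim selection: evict the frame that was touched longest ago. */
    int pick_victim_lru(void) {
        int victim = 0;
        for (int f = 1; f < NFRAMES; f++)
            if (last_used[f] < last_used[victim])
                victim = f;
        return victim;
    }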