CS111 Lecture 16 Scribe Notes

By: Rishabh Hatgadkar


One more thing that can go wrong with file systems:


Media faults:

They can happen at the sector/block level, at the drive level, or at the server level.


Log-based file systems


Log-based file systems use logs to keep track of changes. They can keep each of these logs on a separate drive.


RAID: Redundant Array of Inexpensive Disks


RAID is used because: cost(1TB drive * 10) < cost(10TB drive)


RAID 0:


Uses concatenations of drives.

Example: Ten 1TB drives:


[Diagram: ten 1TB drives placed side by side, concatenated into a single 10TB virtual drive.]



RAID 1:


Mirroring – Uses two 1TB physical drives to represent 1TB of a virtual drive.

[Diagram: two 1TB drives, each holding the same data xyz.]


Pros: no single point of failure, faster reads.

Cons: costs twice as much, slower writes.


RAID 4:


Concatenation plus a parity drive: a middle ground between concatenation (RAID 0) and mirroring (RAID 1).


[Diagram: data drives A, B, C, D, and E side by side, followed by one parity drive.]


Parity drive = A^B^C^D^E

If any one of A, B, C, D, or E is lost, the parity drive can be used to recover it.

For example: C is lost → C = A^B^D^E^(A^B^C^D^E)
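The recovery rule above can be checked with a small sketch. It assumes each drive is reduced to a single 4-byte block, and the byte values and the helper name xor_blocks are made up for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

# Five hypothetical data drives, each reduced to one 4-byte block.
drives = {
    "A": b"\x01\x02\x03\x04",
    "B": b"\x10\x20\x30\x40",
    "C": b"\xde\xad\xbe\xef",
    "D": b"\x00\xff\x00\xff",
    "E": b"\x55\xaa\x55\xaa",
}

# Parity drive = A^B^C^D^E
parity = xor_blocks(list(drives.values()))

# Simulate losing C: XOR the four survivors with the parity block.
survivors = [block for name, block in drives.items() if name != "C"]
recovered_c = xor_blocks(survivors + [parity])
```

This works because XOR is its own inverse: every surviving block appears twice in A^B^D^E^(A^B^C^D^E) and cancels, leaving only C.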


Reads are as fast as RAID 0, but writes are slow because every write must also update the parity drive.


Pros: more space efficient than RAID 1

Cons: not striped (can't read blocks of a file in parallel); the parity drive can become a bottleneck.


RAID 5:


Striping with parity drives.


[Diagram: a grid of drives and stripes in which the parity block sits on a different drive in each stripe.]


Pros: parity is not a bottleneck

Cons: more complex


RAID 4 is easier to maintain than RAID 5.


Mars mission scenario:


RAID 1 is more reliable than RAID 4, because the parity drive is a bottleneck for writes.


Performance and scheduling


Disk scheduling


Want: high throughput

no starvation (low latency) – want data to come quickly


Simple model for hard disks


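[Diagram: disk blocks numbered 0 to N-1 along a line, with the head at block h and the requested block at r.]


Cost: |h – r|, the distance the head has to travel.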


First come first serve (FCFS)


No starvation. Average cost is 1/3 of the full seek distance, assuming h and r are each uniformly random over the disk.
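The 1/3 figure can be checked numerically. This sketch assumes the head position h and the requested block r are independent and uniformly distributed over N blocks, and it divides by N so the cost is a fraction of the whole disk. The function name mean_seek_fraction is made up for illustration.

```python
def mean_seek_fraction(n_blocks):
    """Average of |h - r| over every (h, r) pair, as a fraction of the disk size."""
    total = sum(abs(h - r) for h in range(n_blocks) for r in range(n_blocks))
    pairs = n_blocks * n_blocks
    return total / pairs / n_blocks

# Approaches 1/3 as the number of blocks grows.
average = mean_seek_fraction(300)
```

For N blocks the exact value is (N^2 - 1) / (3 N^2), which tends to 1/3 for large N.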



Shortest seek time first (SSTF)


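Maximizes throughput and minimizes average seek time. But there is starvation: a request far from the head can wait indefinitely while closer requests keep arriving.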
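A minimal sketch of SSTF on a fixed batch of requests, using the |h – r| cost model. The block numbers and the names sstf_order and total_cost are hypothetical, and requests that arrive while the batch is being served are not modeled.

```python
def sstf_order(head, requests):
    """Serve the pending request closest to the head, over and over."""
    pending = list(requests)
    order = []
    while pending:
        nearest = min(pending, key=lambda r: abs(r - head))
        pending.remove(nearest)
        order.append(nearest)
        head = nearest  # the head now sits on the block just served
    return order

def total_cost(head, order):
    """Sum of |h - r| over a service order."""
    cost = 0
    for r in order:
        cost += abs(r - head)
        head = r
    return cost

requests = [90, 10, 12, 11, 95]  # arrival order, which is also the FCFS order
sstf = sstf_order(50, requests)
fcfs_cost = total_cost(50, requests)
sstf_cost = total_cost(50, sstf)
```

Here the head starts at block 50. SSTF serves 12, 11, 10, 90, 95 for a total cost of 125, while FCFS costs 207. Starvation only appears when requests keep arriving: a steady stream near block 10 would keep pushing back the requests at 90 and 95.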


SSTF + FCFS


Break up the trace (sequence of I/O requests) into chunks. Chunks are FCFS. Within a chunk you use SSTF.


[Diagram: a trace of requests (1, 3, 5, 2, 4, ...) divided into three consecutive chunks.]
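The chunking idea can be sketched as follows, assuming fixed-size chunks. The notes do not say how the chunk size is chosen, and the names chunked_order and sstf_order and the block numbers are hypothetical.

```python
def sstf_order(head, requests):
    """Shortest seek time first over one fixed set of requests."""
    pending = list(requests)
    order = []
    while pending:
        nearest = min(pending, key=lambda r: abs(r - head))
        pending.remove(nearest)
        order.append(nearest)
        head = nearest
    return order

def chunked_order(head, trace, chunk_size):
    """Chunks are taken in arrival order (FCFS); inside a chunk, SSTF."""
    order = []
    for start in range(0, len(trace), chunk_size):
        chunk = trace[start:start + chunk_size]
        served = sstf_order(head, chunk)
        order.extend(served)
        head = served[-1]  # the next chunk starts where this one ended
    return order

trace = [90, 10, 12, 95, 11, 13]
order = chunked_order(50, trace, 3)
```

With the head at 50 and chunks of three, the order is 12, 10, 90, then 95, 13, 11. No request waits longer than the chunks ahead of it plus its own, which bounds the starvation that plain SSTF allows.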


Elevator scheduling


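Like SSTF, but only in the direction the head is already moving. No starvation. Lower throughput than SSTF.

Circular elevator scheduling is fairer: the head serves requests in one direction only and then wraps around. Example: One Boelter Hall elevator goes up. Another elevator goes down.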
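The two variants can be compared on a fixed batch of requests. This is a sketch that assumes the head starts out moving toward higher block numbers; the function names and block numbers are hypothetical.

```python
def elevator_order(head, requests):
    """Sweep up from the head, then reverse and sweep back down."""
    going_up = sorted(r for r in requests if r >= head)
    coming_down = sorted((r for r in requests if r < head), reverse=True)
    return going_up + coming_down

def circular_elevator_order(head, requests):
    """Sweep up from the head, jump back to the lowest block, sweep up again."""
    ahead = sorted(r for r in requests if r >= head)
    wrapped = sorted(r for r in requests if r < head)
    return ahead + wrapped

requests = [90, 10, 12, 95, 60]
plain = elevator_order(50, requests)
circular = circular_elevator_order(50, requests)
```

With the head at 50, the plain elevator serves 60, 90, 95, 12, 10, and the circular one serves 60, 90, 95, 10, 12. In the plain version, blocks near the middle of the disk are passed twice per round trip while blocks at the ends are passed once. The circular version passes every block once per sweep, which is why it is fairer.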


Hard modularity


Virtualization – relies on secure transfer of control.

Distributed system – relies on secure communications.


Procedure calls (soft modularity):

1. system calls with virtualization

2. RPC (remote procedure calls)


RPCs differ from ordinary system calls


Caller and callee don't share same address space (different machines).

Pros: hard modularity

Cons: no call by reference (passing large objects can be slow)


The caller could be x86 and the callee x86-64, so the caller needs to marshal the data: convert it to a standard form so that different machine architectures can interpret it.
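A minimal sketch of marshaling with Python's struct module. The wire format here (a 4-byte integer followed by an 8-byte integer, both in network byte order) is an assumption made up for illustration, not a format from the lecture.

```python
import struct

# "!" selects network (big-endian) byte order with fixed sizes and no padding,
# so the layout does not depend on the sender's word size or endianness.
WIRE_FORMAT = "!iq"  # hypothetical request: 4-byte descriptor, 8-byte offset

def marshal(fd, offset):
    """Caller side: turn native values into the agreed byte layout."""
    return struct.pack(WIRE_FORMAT, fd, offset)

def unmarshal(payload):
    """Callee side: turn the byte layout back into native values."""
    return struct.unpack(WIRE_FORMAT, payload)

payload = marshal(3, 4096)
fd, offset = unmarshal(payload)
```

The payload is always 12 bytes, whichever machine produced it. Both sides agree on the format string rather than on each other's in-memory representation.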


Failure modes are different for RPC


Possible failures: trashing of data, lost messages, corrupted messages, slow or busy network, slow or busy client or server.


Possible solutions: Add your own checksum to detect corrupted messages. Resend the message after a timeout or failure.
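Both remedies can be sketched together. CRC-32 from zlib stands in for "your own checksum", a reply of None stands in for a timeout, and FlakyServer is a made-up stand-in for an unreliable network and server. None of these names come from the lecture.

```python
import zlib

def wrap(payload):
    """Append a CRC-32 of the payload as 4 big-endian bytes."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def unwrap(message):
    """Return the payload, or None if the checksum does not match."""
    payload, received = message[:-4], message[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != received:
        return None
    return payload

def call_with_retry(send, request, max_attempts=3):
    """Resend until a reply with a valid checksum arrives."""
    for attempt in range(1, max_attempts + 1):
        reply = send(wrap(request))  # None models a timeout
        if reply is not None:
            payload = unwrap(reply)
            if payload is not None:
                return payload, attempt
    raise TimeoutError("no valid reply")

class FlakyServer:
    """Hypothetical server: loses call 1, corrupts reply 2, then behaves."""
    def __init__(self):
        self.calls = 0

    def __call__(self, message):
        self.calls += 1
        if self.calls == 1:
            return None  # message lost
        if self.calls == 2:
            good = wrap(b"ok")
            return bytes([good[0] ^ 0xFF]) + good[1:]  # first byte flipped
        return wrap(b"reply:" + unwrap(message))

server = FlakyServer()
result, attempts = call_with_retry(server, b"read block 7")
```

The third attempt succeeds. Resending like this is only safe when repeating the request is harmless, because the server may already have carried out a request whose reply was lost.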


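For a read() system call, resending makes sense (at-least-once RPC), because repeating the call is harmless. For write()-related system calls, failing makes sense (at-most-once RPC), because repeating the call could apply the change twice.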


Exactly-once RPC is what we would like, but it needs coordination between client and server, and that coordination itself depends on messages that can be lost, so the argument becomes circular. In some realistic cases it is impossible.


Performance


X Protocol – Run the client separately from the server. It is slow. A program that draws, for instance, would need a separate request for each pixel. Batching helps to improve performance by allowing for asynchronous calls. The client can send one command to the server to draw multiple pixels at once.
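The effect of batching can be seen by counting requests. This is a toy model, not the real X protocol or any X library API: FakeDisplayServer and both draw functions are hypothetical, and each call to handle stands for one message sent to the server.

```python
class FakeDisplayServer:
    """Toy stand-in for a display server; counts the requests it receives."""
    def __init__(self):
        self.pixels = set()
        self.requests = 0

    def handle(self, commands):
        self.requests += 1  # one message, however many pixels it carries
        for x, y in commands:
            self.pixels.add((x, y))

def draw_unbatched(server, points):
    """One request per pixel."""
    for point in points:
        server.handle([point])

def draw_batched(server, points, batch_size):
    """Group pixels so that each request carries up to batch_size of them."""
    for start in range(0, len(points), batch_size):
        server.handle(points[start:start + batch_size])

line = [(x, 5) for x in range(100)]  # a 100-pixel horizontal line

slow = FakeDisplayServer()
draw_unbatched(slow, line)

fast = FakeDisplayServer()
draw_batched(fast, line, 25)
```

Both servers end up with the same 100 pixels, but the unbatched client sends 100 requests and the batched one sends 4. When each request pays a network round trip, that difference dominates the drawing time.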