CS111 Lecture 16 Scribe Notes
By: Rishabh Hatgadkar
One more thing that can go wrong with file systems:
Media faults:
They can happen at the sector/block level, at the drive level, or at the server level.
Log-based file systems
Log-based file systems use logs to keep track of changes, keeping each log on a separate drive from the data.
RAID: Redundant Array of Inexpensive Disks
RAID is used because: cost(1TB drive * 10) < cost(10TB drive)
RAID 0:
Uses concatenations of drives.
Example: Ten 1TB drives:
<1TB> <1TB> <1TB> <1TB> <1TB> <1TB> <1TB> <1TB> <1TB> <1TB>   (ten physical drives)
<--------------------------- 10TB --------------------------->   (one virtual drive)
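Under concatenation, the virtual drive's block space is just the physical drives laid end to end. A minimal sketch of the block-address mapping (the per-drive block count is a made-up toy value):

```python
# Map a virtual block number to (drive index, block offset) under pure
# concatenation: drive 0 holds blocks 0..B-1, drive 1 holds B..2B-1, etc.
BLOCKS_PER_DRIVE = 1_000_000  # toy value standing in for a 1TB drive

def concat_map(vblock):
    """Return (drive, offset) for a virtual block on the concatenated array."""
    return vblock // BLOCKS_PER_DRIVE, vblock % BLOCKS_PER_DRIVE

print(concat_map(0))          # (0, 0): first block of the first drive
print(concat_map(2_500_000))  # (2, 500000): halfway into the third drive
```

Note there is no redundancy here: losing any one drive loses that slice of the virtual drive.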
RAID 1:
Mirroring – uses two 1TB physical drives to represent one 1TB virtual drive; both drives hold identical contents.
  xyz        xyz
<--1TB-->  <--1TB-->
Pros: no single point of failure; faster reads (either copy can serve a read).
Cons: costs twice as much; slower writes (every write must go to both drives).
RAID 4:
Concatenation of data drives plus one dedicated parity drive.
A |
B |
C |
D |
E |
Parity drive |
Parity drive = A^B^C^D^E
If A, B, C, D, or E is lost, XOR-ing the surviving drives with the parity drive recovers it (since X^X = 0).
For example, if C is lost: C = A^B^D^E^(A^B^C^D^E)
Reads are as fast as RAID 0, but writes are slow: every write must also update the parity drive.
Pros: more space efficient than RAID 1 (one extra drive instead of doubling everything).
Cons: not striped (can't read blocks of a file in parallel); the parity drive can become a bottleneck.
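The recovery formula can be checked directly: XOR-ing the surviving blocks with the parity block reproduces the lost one. A small sketch with made-up 16-byte blocks:

```python
import secrets

def parity(blocks):
    """XOR a list of equal-sized byte blocks together."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, byte in enumerate(blk):
            out[i] ^= byte
    return bytes(out)

# Five data blocks A..E plus a parity block P = A^B^C^D^E.
data = [secrets.token_bytes(16) for _ in range(5)]
p = parity(data)

# Lose block C (index 2): XOR the four survivors with P to get it back,
# because every other block cancels itself out (X^X = 0).
survivors = data[:2] + data[3:]
recovered = parity(survivors + [p])
assert recovered == data[2]
```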
RAID 5:
Striping, with parity distributed across all the drives: each stripe's parity block lives on a different drive, rotating through the array instead of sitting on one dedicated parity drive.
Drive:    0  1  2  3  4  5
Stripe 0: P  D  D  D  D  D
Stripe 1: D  P  D  D  D  D
Stripe 2: D  D  P  D  D  D
Stripe 3: D  D  D  P  D  D
...
(D = data block, P = parity block)
Pros: parity is not a bottleneck
Cons: more complex
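One common way to rotate the parity (actual layouts vary by implementation; this round-robin placement is an assumption for illustration) is to put stripe i's parity on drive i mod N:

```python
NUM_DRIVES = 6  # assumed array size for this sketch

def parity_drive(stripe):
    """RAID 5 rotates parity: stripe i keeps its parity on a different drive."""
    return stripe % NUM_DRIVES

# Parity for consecutive stripes lands on drives 0,1,2,... so no single
# drive absorbs every parity write.
print([parity_drive(s) for s in range(8)])  # [0, 1, 2, 3, 4, 5, 0, 1]
```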
RAID 4 is easier to maintain than RAID 5: for example, a new drive full of zeros can be added without recomputing any parity (XOR with zero changes nothing).
Mars mission scenario:
RAID 1 is more reliable than RAID 4 here, because RAID 4's parity drive must be updated on every write, making it both a bottleneck and the most heavily used drive.
Performance and scheduling
Disk scheduling
Want: high throughput
no starvation (low latency) – want data to come quickly
Simple model for hard disks
0
|
h    <- current head position
|       (disk blocks 0 .. N-1)
r    <- requested block
|
N-1
Cost of a seek from h to r: |h - r|
First come first serve (FCFS)
No starvation. Average seek cost is N/3 (one third of the disk), assuming the head and request positions are independent and uniformly random.
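The one-third figure is just the expected distance between two independent uniform positions; a quick Monte Carlo sanity check (toy block count, not real disk geometry):

```python
import random

random.seed(0)
N = 1000            # toy disk: blocks 0 .. N-1
TRIALS = 200_000

# Draw a random head position and a random request position, average |h - r|.
total = sum(abs(random.randrange(N) - random.randrange(N)) for _ in range(TRIALS))
avg = total / TRIALS
print(avg / N)      # close to 1/3 of the disk
```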
Shortest seek time first (SSTF)
Always serves the closest pending request, so it maximizes throughput and keeps average latency low. But requests far from the head can starve.
SSTF + FCFS
Break up the trace (sequence of I/O requests) into chunks. Chunks are FCFS. Within a chunk you use SSTF.
Trace:  1 3 5 2 4  |  . . . . .  |  . . . . .
        <--chunk-->  <--chunk-->   <--chunk-->
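The hybrid can be sketched directly: chunks are taken in arrival order, and within a chunk the closest request wins (chunk size and trace below are toy values):

```python
def sstf_chunked(head, trace, chunk_size):
    """Serve the trace in FCFS chunks; within each chunk, pick the
    pending request closest to the current head position (SSTF)."""
    order, cost = [], 0
    for i in range(0, len(trace), chunk_size):
        chunk = list(trace[i:i + chunk_size])
        while chunk:
            nxt = min(chunk, key=lambda r: abs(r - head))
            chunk.remove(nxt)
            cost += abs(nxt - head)
            head = nxt
            order.append(nxt)
    return order, cost

# A request far from the head can't starve forever: it is guaranteed
# service once its chunk comes up.
order, cost = sstf_chunked(0, [1, 3, 5, 2, 4], chunk_size=5)
print(order, cost)  # [1, 2, 3, 4, 5] 5
```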
Elevator scheduling
Like SSTF, but the head keeps moving in one direction until no requests remain ahead of it, then reverses. No starvation, but throughput is lower than pure SSTF, and it is unfair: blocks near the middle of the disk get served sooner on average than blocks near the edges.
Circular elevator scheduling is fairer: the head always sweeps in the same direction, then seeks back to the start. Example: one Boelter Hall elevator only goes up while another only goes down.
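A circular elevator's service order can be sketched in a few lines (head position and request numbers below are made up):

```python
def circular_elevator(head, pending):
    """Serve requests in one direction only; after the highest request,
    wrap around to the lowest (an elevator that only goes up)."""
    ahead = sorted(r for r in pending if r >= head)
    behind = sorted(r for r in pending if r < head)
    return ahead + behind  # sweep forward, then wrap to the start

# From block 50: serve 55, 60, 90 on the way up, then wrap to 10, 20.
print(circular_elevator(50, [10, 60, 90, 20, 55]))  # [55, 60, 90, 10, 20]
```

Because every sweep covers the whole disk in the same direction, each block sees the head pass at the same average rate, which is what makes this fairer than the plain elevator.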
Hard modularity
Virtualization – relies on secure transfer of control.
Distributed system – relies on secure communications.
Ways to get hard modularity out of procedure calls (which by themselves give only soft modularity):
1. system calls, with virtualization
2. RPC (remote procedure calls)
RPCs differ from ordinary system calls: the caller and callee don't share the same address space (they may be on different machines).
Pros: hard modularity
Cons: no call by reference (passing large objects can be slow)
The caller could be x86 while the callee is x86-64. The caller must marshal the data: convert it to a standard form so that machines with different architectures can interpret it.
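Marshaling can be as simple as packing fields into a fixed, agreed-upon byte order. A sketch using Python's struct module with a made-up three-field message (the field layout is an assumption for illustration):

```python
import struct

# Marshal a (request id, x, y) triple into a fixed big-endian wire format
# so that, say, an x86 caller and an x86-64 callee agree on the byte layout.
WIRE_FORMAT = "!IHH"  # network byte order: one u32 followed by two u16s

def marshal(req_id, x, y):
    return struct.pack(WIRE_FORMAT, req_id, x, y)

def unmarshal(buf):
    return struct.unpack(WIRE_FORMAT, buf)

msg = marshal(7, 320, 200)
assert len(msg) == 8                    # same size on every machine
assert unmarshal(msg) == (7, 320, 200)  # round-trips exactly
```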
Failure modes are different for RPC
Possible failures: trashing of data, lost messages, corrupted messages, slow or busy network, slow or busy client or server.
Possible solutions: use your own checksum to detect corrupted messages; resend the message after a timeout or failure.
For the read() system call, resending makes sense (repeating a read is harmless). For write()-related system calls, failing makes sense (a resent write could apply twice).
Exactly-once RPC is impossible in general: it would need coordination between client and server, but the coordinating messages can themselves be lost, so you run into a circular dependency. In some realistic failure cases this cannot be resolved.
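The resend-on-timeout idea gives at-least-once semantics. A toy sketch where the "network" is just a function that may return nothing (the drop pattern is made up):

```python
def call_with_retry(send, max_tries=3):
    """At-least-once RPC: resend after a failure. Safe only for idempotent
    calls like read(); a duplicated write() could apply twice."""
    for attempt in range(max_tries):
        reply = send()        # returns None to model a lost message
        if reply is not None:
            return reply
    raise TimeoutError("no reply after %d tries" % max_tries)

# Model a flaky network that drops the first two messages.
drops = iter([None, None, "ok"])
print(call_with_retry(lambda: next(drops)))  # ok
```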
Performance
X Protocol – the client runs separately from the server, which is slow: a drawing program, for instance, would have to issue a separate request for each individual pixel. Batching improves performance by allowing asynchronous calls: you can send the server one command that draws many pixels at once.