File System

Boyang Cai, Haoyuan Wu, Wenhao Yu, My Phuong Mac

Table of Contents

 

Key Performance Metrics for File Systems:

 

Common disk specifications:

                               Seagate HDD               Corsair SSD
Capacity                       1 TB                      60 GB
Cache                          16 MB                     —
RPM                            7200                      —
Internal transfer rate         1.29 Gb/s                 —
External transfer rate         3 Gb/s                    3 Gb/s
Sustained transfer rate        0.93 Gb/s                 280 MB/s (read) / 280 MB/s (write)
Rotational latency             4.166 ms                  —
Seek time (read)               8.5 ms                    —
Seek time (write)              9.5 ms                    —
Track-to-track time            0.8 ms                    —
Power (idle/typical)           9.5 W / 12.5 W            2.0 W
MTBF                           1 million hours           1 million hours
Annual failure rate            0.73%                     —
Non-recoverable read errors    1 per 10^15 bits read     —
Shock resistance               —                         1000 G

Procedure to read/write data from a disk:

(* marks the main latency source.)

 

Example of different I/O strategies:

Assume: 1 CPU cycle time = 1 ns; one programmed I/O operation takes 1000 CPU cycles = 1 µs; device latency = 50 µs; computation time per buffer = 5 µs.

Polling

Straightforward approach.

Latency

5 µs + 40 µs + 50 µs + 5 µs = 100 µs

for (;;) {
  char buf[40];
  read 40 bytes into buf;   /* programmed I/O: poll until each byte is ready */
  compute(buf);
}

request + device latency + read + compute

Throughput

1 / 100 µs = 10,000/s

1 request / time on process

Utilization

5 µs / 100 µs = 5%

Compute time / overall time

 

Batching

Read more data per request.

Latency

5 µs + 40 µs × 21 + 50 µs + 5 µs × 21 = 1000 µs

int batchNum = 21;
for (;;) {
  char buf[40 * batchNum];
  read 40 * batchNum bytes into buf;
  compute(buf);
}

request + device latency + read + compute

Throughput

21 / 1000 µs = 21,000/s

21 requests / time on process

Utilization

105 µs / 1000 µs = 10.5%

Compute time / overall time

 

Interrupts

Let the CPU do other work while waiting for the device.

Latency

5 µs + 50 µs + 5 µs + 1 µs + 40 µs + 5 µs = 106 µs

for (;;) {
  send request;
  do {
    wait for interrupt;
    handle interrupt;
  } while (!ready);
  read;
  compute;
}

request + wait for interrupt + overhead of interrupt + check ready + read + compute

Throughput

1 / 56 µs ≈ 17,857/s

1 request / time on process

Utilization

5 µs / 56 µs ≈ 8.9%

Compute time / time on process

 

DMA

Direct Memory Access: the controller transfers data directly to memory, without the CPU copying each byte.

Latency

50 µs + 5 µs + 1 µs + 5 µs = 61 µs

for (;;) {
  issue a command to the device;
  block until interrupt;
  handle interrupt;
  check if device is ready;
  compute;
}

block + interrupt handler + check + compute

Throughput

1 / 11 µs ≈ 91,000/s

1 request / time on process

Utilization

5 µs / 11 µs ≈ 45%

Compute time / overall time

 

DMA + Polling

Let the CPU do other work while waiting for the DMA transfer to complete.

Latency

50 µs + 1 µs + 5 µs = 56 µs

for (;;) {
  issue a command to the device;
  yield();   /* run other work until the DMA completes */
  check if device is ready;
  compute;
}

block + check + compute

Throughput

1 / 6 µs ≈ 167,000/s

1 request / time on process

Utilization

5 µs / 6 µs ≈ 83%

Compute time / overall time

Structure Example:

120 PB of data is stored on 200,000 hard drives, so each disk holds 600 GB of data.

[Figure: disk controllers attached to a shared bus, each controlling several disks]

Each disk controller (DC) is connected to the bus, and each DC controls a set of disks.
This structure is too slow, so what we need is a specialized network connecting multiple CPUs, disk controllers, and disks.

[Figure: specialized network connecting multiple CPUs, disk controllers, and disks]

General Parallel File System (GPFS)

  1. Stripes
     One big file is split into blocks distributed round-robin across disks: Disk 1, Disk 2, Disk 3, Disk 4, then Disk 1, Disk 2 again, and so on.
  2. Distributed metadata
     No central catalog.
  3. Efficient directory indexing
     A directory may contain millions of files, so lookups must be efficient.
  4. Distributed locking (harder)
  5. Partition awareness
     Race conditions arise when the system is partitioned.
  6. The file system stays live during maintenance.