Notes prepared by In Ho Kim, Ho Ching Lam, Davide Lau
Flash ( vs Disk )
Properties about flash
1. Random access is fast : no seek time is involved with flash
2. Blocks must be erased before writing : Slows down performance
3. Writing wears them out
eg. for a typical NOR drive : a block can be written 100,000 times
for a typical NAND drive : a block can be written 1,000,000 times
wear leveling is used
don't always write to the same block; spread writes out
* can be done in firmware or OS
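Wear leveling can be sketched in a few lines. This toy allocator (the class name and policy are illustrative, not any real firmware's) always picks the least-erased block, so writes are spread across the device instead of hammering one block:

```python
# Toy wear-leveling allocator (illustrative sketch, not real firmware logic).
# Tracks per-block erase counts and always picks the least-worn block.

class WearLeveler:
    def __init__(self, nblocks):
        self.erase_counts = [0] * nblocks

    def pick_block(self):
        # choose the block erased the fewest times so far
        b = min(range(len(self.erase_counts)),
                key=self.erase_counts.__getitem__)
        self.erase_counts[b] += 1   # erasing before writing wears the block
        return b

wl = WearLeveler(4)
blocks = [wl.pick_block() for _ in range(8)]
# eight writes land on each of the four blocks twice,
# instead of wearing out one block eight times
```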
File Systems
Persistent data
structure that organizes & represents files
persistent - survives exits/crashes/power failures
Simplest approach: Represent each file as a contiguous region of disk
Eg.
A | B | file X | ... | file Y | ... |
A -> file X
B -> file Y
A , B : directory entries
(fixed-size information)
+
: high performance and predictable
-
: external fragmentation; fixed allocation (leads to internal fragmentation)
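The contiguous scheme above can be sketched as a directory mapping each name to a fixed-size (start, length) record; block access is pure arithmetic, which is why performance is high and predictable. The names and numbers here are made up for illustration:

```python
# Sketch of contiguous allocation: directory maps a file name to a
# fixed-size record (start block, number of blocks). Illustrative only.

directory = {
    "fileX": {"start": 2, "blocks": 5},
    "fileY": {"start": 10, "blocks": 3},
}

def read_block(name, i):
    # the ith block of a file is just start + i: no per-block lookup needed
    ent = directory[name]
    if i >= ent["blocks"]:
        raise ValueError("past end of file")
    return ent["start"] + i

assert read_block("fileY", 2) == 12
```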
Additional Specification:
Fixed block size on disk : files can be divided
into blocks instead of contiguous region
+ : fixes external fragmentation, fixes some of internal fragmentation
- : directory/data structure is complex
FAT (File Allocation Table) file system
boot sector | Super Block | FAT | ... |
Superblock : holds file-system META-DATA
(version, size, blocks used, root dir location, etc.)
FAT : array of block numbers (the ith
element of the array holds information about the ith block)
each element => information about a block :
-1 free
0 EOF
N next block in this file
+ : No external fragmentation
- : Files become disorganized (defragmentation helps, but it is slow and
unreliable)
Requires lots of seeks to read
lseek is slow
finding a free block can be slow (if the disk is close to full)
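Following a file through the FAT uses the entry convention above (-1 free, 0 EOF, N means "block N is next"). A minimal sketch, with a made-up table (note that with this convention, block 0 itself can never be a data block):

```python
# FAT entry convention from these notes: -1 = free, 0 = EOF, N = next block.
FAT = [-1, 4, 0, -1, 2, -1]   # example: a file occupying blocks 1 -> 4 -> 2

def file_blocks(start):
    """Return the chain of block numbers for a file starting at `start`."""
    blocks = []
    b = start
    while True:
        blocks.append(b)
        nxt = FAT[b]
        if nxt == 0:          # EOF marker: chain ends here
            return blocks
        b = nxt               # otherwise, jump to the next block (a seek!)

assert file_blocks(1) == [1, 4, 2]

# Finding a free block is a linear scan, hence slow when the disk is nearly full:
def find_free():
    return FAT.index(-1)
```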
Directory : a file that contains directory entries (eg. name, size, first block number)
Unix file system (1975 ~ 1980): it consists of several levels; here is the rough hierarchy, from top to bottom:
    Symbolic link         eg. foo_link -> foo
    Absolute path         eg. /tmp/foo (starts with "/")
    File name
    File name component
    Inode                 object that represents a file ( in RAM (cached, for extra performance) and on disk )
    File system           eg. root, swap, usr, student, faculty
    Block on disk         8192 bytes
    Sector on disk        512 bytes
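The block and sector sizes above imply some simple arithmetic for locating data on disk; a quick check (sizes taken from the notes, the helper name is illustrative):

```python
BLOCK = 8192    # bytes per file-system block
SECTOR = 512    # bytes per disk sector

sectors_per_block = BLOCK // SECTOR   # 16 sectors make up one block
assert sectors_per_block == 16

def locate(offset):
    """Map a byte offset to (block number, sector within that block)."""
    return offset // BLOCK, (offset % BLOCK) // SECTOR

# byte 1024 of block 1 lives in sector 2 of that block
assert locate(8192 + 1024) == (1, 2)
```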
Sample application concerning the Unix file system:
Shred: a Unix program that overwrites a file with random data, then deletes it.
“[shred] overwrites devices or files, to help prevent even very expensive hardware from recovering the data. Ordinarily when you remove a file (*note rm invocation::), the data is not actually destroyed. Only the index listing where the file is stored is destroyed, and the storage is made available for reuse. There are undelete utilities that will attempt to reconstruct the index and can bring the file back if the parts were not reused. On a busy system with a nearly-full drive, space can get reused in a few seconds. But there is no way to know for sure. If you have sensitive data, you may want to be sure that recovery is not possible by actually overwriting the file with non-sensitive data. However, even after doing that, it is possible to take the disk back to a laboratory and use a lot of sensitive (and expensive) equipment to look for the faint "echoes" of the original data underneath the overwritten data. If the data has only been overwritten once, it's not even that hard.” Quoted from: http://www.phpman.info/index.php/info/shred
One simple implementation of shred can be:
open(), write(), write(), write(), close(), unlink().
We overwrite the data three times with random data. This helps ensure that the data cannot be easily recovered.
However, this approach might not be "safe": if the file system's "write" goes to a different block and then unlinks the original block, then no matter how many times you "overwrite" the data, you are just writing somewhere else, NOT over the data you want to erase.
One example is flash.
Typically, if one wants to "shred" data and make sure the "data to be erased" is actually overwritten, one should shred the whole drive or the partition that contains that data.
Encryption:
Encrypt data on the disk at all times. Erasing is then fast: just erase the password.
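A toy illustration of "erase by forgetting the key": if the data on disk is always stored encrypted, destroying the key alone makes it unreadable. The XOR keystream cipher here is a stand-in for a real cipher, purely for illustration:

```python
# Toy stream cipher (sha256-based keystream, XOR). Illustrative only:
# real systems use a vetted cipher, but the erase-the-key idea is the same.

import hashlib

def keystream(key, n):
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def encrypt(key, data):
    # XOR is symmetric: the same call decrypts
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

ct = encrypt(b"password", b"sensitive file contents")
assert encrypt(b"password", ct) == b"sensitive file contents"
# "erasing" the file = discarding b"password"; the ciphertext on disk
# alone reveals nothing useful
```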
Things we’ve left out (we will come back to these in a later lecture):
1. RAID: lash together N disks with no single point of failure.
2. File systems that span multiple disks.
3. Network file systems.
Example layout of a Unix file system:
inodes (index nodes) contain the size of the file (from the end user's viewpoint).
This can underestimate actual disk usage (due to internal fragmentation) or
overestimate it (in the case of a file with holes).
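Both directions of this size mismatch can be checked with a little arithmetic (the numbers are made up for illustration; the block size is taken from the table above):

```python
BLOCK = 8192   # bytes per file-system block

# Internal fragmentation: a 100-byte file still occupies a whole block,
# so the inode's size field (100) underestimates the space actually used.
st_size = 100
space_used = ((st_size + BLOCK - 1) // BLOCK) * BLOCK   # round up to a block
assert st_size < space_used == 8192

# Holes: a sparse file seeked out to ~1 MiB with only one block of real
# data reports a large size, so the size field overestimates space used.
st_size_sparse = 1 << 20     # ~1 MiB apparent size
allocated = 1 * BLOCK        # but only one block actually allocated
assert st_size_sparse > allocated
```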