CS111 - Winter 2008
Lecture 14 (2/20/2008)

Notes prepared by In Ho Kim, Ho Ching Lam, Davide Lau

 


Flash ( vs Disk )

Properties about flash

                    1. Random access is fast : no seek time is involved with flash

                    2. Blocks must be erased before writing : slows down performance

                    3. Writing wears them out

                    eg. for a typical NOR drive : a block can be written 100,000 times
                          for a typical NAND drive : a block can be written 1,000,000 times

                        wear leveling is used
                                         you don't always write to the same block; spread the writes out
                                        * can be done in firmware or in the OS (a rough sketch follows)
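A rough sketch (not from the lecture) of wear leveling done in software: keep a logical-to-physical block map and direct every write to the least-worn free block. All names here (flash_erase, flash_program, the counters) are hypothetical stand-ins:

#include <limits.h>

#define NBLOCKS 1024

static int map[NBLOCKS];          /* logical block -> physical block (-1 = unmapped) */
static unsigned erases[NBLOCKS];  /* erase count per physical block */
static int in_use[NBLOCKS];       /* 1 if a physical block holds live data */

/* Stand-ins for the real flash driver operations. */
static void flash_erase(int p)                  { (void)p; }
static void flash_program(int p, const char *d) { (void)p; (void)d; }

void wear_level_init(void)
{
    for (int i = 0; i < NBLOCKS; i++)
        map[i] = -1;
}

/* Pick the free physical block with the fewest erases so far. */
static int least_worn_free_block(void)
{
    int best = -1;
    unsigned best_erases = UINT_MAX;
    for (int p = 0; p < NBLOCKS; p++)
        if (!in_use[p] && erases[p] < best_erases) {
            best = p;
            best_erases = erases[p];
        }
    return best;   /* -1 if the device is full */
}

/* Write a logical block: rather than re-erasing the same physical block
   every time, move the data to the least-worn free block. */
int write_block(int logical, const char *data)
{
    int p = least_worn_free_block();
    if (p < 0)
        return -1;
    flash_erase(p);            /* blocks must be erased before writing (property 2) */
    flash_program(p, data);
    erases[p]++;
    if (map[logical] >= 0)
        in_use[map[logical]] = 0;   /* the old copy's block becomes free */
    in_use[p] = 1;
    map[logical] = p;
    return 0;
}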


File Systems

Persistent data structure that organizes & represents files
     persistent - survives exits/crashes/power failures

Simplest approach: Represent each file as a contiguous region of disk

Eg.

A B file X ... file Y ...

A -> file X
B -> file Y

     A, B : directory information - fixed size
               + : high performance and predictable (any offset is one addition away - see the sketch below)
               - : external fragmentation; fixed allocation leads to internal fragmentation
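A hypothetical sketch of why contiguous allocation is fast and predictable: a directory entry only needs a start sector and a length, and any byte offset maps to a sector with one addition. The struct fields and sizes below are made up for illustration:

#include <stdint.h>

#define SECTOR_SIZE 512

/* Hypothetical fixed-size directory entry for contiguous allocation:
   each file is just a (start, length) pair on disk. */
struct dir_entry {
    char     name[16];      /* file name */
    uint32_t start_sector;  /* first sector of the file's contiguous region */
    uint32_t nbytes;        /* file length in bytes */
};

/* Find which sector holds a given byte offset: one addition,
   so random access costs a single seek. */
uint32_t sector_of(const struct dir_entry *e, uint32_t offset)
{
    return e->start_sector + offset / SECTOR_SIZE;
}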

 Additional specification:
      Fixed block size on disk : files are divided into blocks instead of one contiguous region
               + : fixes external fragmentation; reduces internal fragmentation (at most one partially filled block per file)
               - : the directory/data structure becomes more complex

FAT (File Allocation Table) file system

boot sector | superblock | FAT | data blocks ...

Superblock : holds metadata about the file system
                  (version, size, blocks used, root directory location, etc.)

FAT : array of block numbers; the ith entry holds information about the ith data block :
                                      -1  block is free
                                       0  EOF (last block of its file)
                                       N  next block of this file
 + : no external fragmentation

                    - : files become disorganized over time (defragmentation helps, but it is slow and unreliable)
                        reading a file requires lots of seeks
                        lseek is slow : finding the block at a given offset means following the chain from the first block (see the sketch below)
                        finding a free block can be slow : the FAT must be scanned (bad if the disk is close to full)
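A minimal sketch of why a large lseek is slow under FAT, assuming an in-memory copy of the table with the conventions above (-1 free, 0 EOF, N next). The block size and array bound are made-up values:

#define FAT_BLOCKS 65536          /* hypothetical number of data blocks */
#define BLOCK_SIZE 4096           /* hypothetical block size */

static int fat[FAT_BLOCKS];       /* in-memory copy, loaded at mount time:
                                     -1 = free, 0 = EOF, N = next block */

/* Return the block holding byte offset `off` of the file whose first
   block is `first`, or -1 if off is past EOF.  Cost is O(off / BLOCK_SIZE):
   the chain must be walked from the beginning every time. */
int block_at_offset(int first, long off)
{
    int b = first;
    for (long n = off / BLOCK_SIZE; n > 0; n--) {
        if (fat[b] == 0)          /* reached the last block too early */
            return -1;
        b = fat[b];
    }
    return b;
}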

Directory : a file that contains directory entries, each mapping a file name to the file's first block and size, along these lines:
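The entry diagram from the lecture did not survive in these notes; as a hedged sketch, a FAT-style entry records at least a name, the first block, and the size. The exact fields and widths below are assumptions:

#include <stdint.h>

/* Hypothetical FAT-style directory entry: a directory is an ordinary
   file made up of fixed-size records like this one. */
struct fat_dirent {
    char     name[12];      /* file name component */
    uint16_t first_block;   /* index of the file's first block in the FAT */
    uint32_t size;          /* file size in bytes */
};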

Unix file system (1975 ~ 1980) : it consists of several levels; the rough hierarchy, from top to bottom:

     Symbolic link            e.g. foo_link -> foo
     Absolute path            e.g. /tmp/foo (starts with "/")
     File name
     File name component
     Inode                    object that represents a file
                              (cached in RAM for performance; stored on disk)
     File system
     Partition                e.g. root, swap, usr, student, faculty
     Block on disk            8192 bytes
     Sector on disk           512 bytes

Sample applications concerned with the Unix file system:

Shred : a Unix program that overwrites a file with random data, then deletes it.

“[shred] overwrites devices or files, to help prevent even very expensive hardware from recovering the data.

  Ordinarily when you remove a file (*note rm invocation::), the data is not actually destroyed.  Only the index listing where the file is stored is destroyed, and the storage is made available for reuse. There are undelete utilities that will attempt to reconstruct the index and can bring the file back if the parts were not reused.

  On a busy system with a nearly-full drive, space can get reused in a few seconds.  But there is no way to know for sure.  If you have sensitive data, you may want to be sure that recovery is not possible by actually overwriting the file with non-sensitive data.

  However, even after doing that, it is possible to take the disk back to a laboratory and use a lot of sensitive (and expensive) equipment to look for the faint "echoes" of the original data underneath the overwritten data.  If the data has only been overwritten once, it's not even that hard.”

 Quoted from: http://www.phpman.info/index.php/info/shred

 

One simple implementation of shred can be:

            open(), write(), write(), write(), close(), unlink().

We overwrite the data three times with random data. This makes sure that the data cannot be easily recovered.
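A minimal sketch of this open/write/write/write/close/unlink scheme (error handling kept short; the three-pass count and buffer size are just illustrative, and real shred uses better randomness than rand()):

#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Naive shred: overwrite a file with random data three times, then
   unlink it.  Assumes writes land on the file's original blocks. */
int naive_shred(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    char buf[4096];
    for (int pass = 0; pass < 3; pass++) {
        lseek(fd, 0, SEEK_SET);                  /* rewind for each pass */
        for (off_t done = 0; done < st.st_size; ) {
            size_t n = sizeof buf;
            if ((off_t)n > st.st_size - done)
                n = (size_t)(st.st_size - done);
            for (size_t i = 0; i < n; i++)
                buf[i] = rand();                 /* fill with "random" bytes */
            if (write(fd, buf, n) != (ssize_t)n) {
                close(fd);
                return -1;
            }
            done += n;
        }
        fsync(fd);                               /* push this pass to the device */
    }
    close(fd);
    return unlink(path);
}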

 However, this approach might not be “safe” if the file system’s “write” goes to a different block and then unlinks the original block. In that case, no matter how many times you “overwrite” the data, you are just writing somewhere else, NOT over the data that you want to erase.

            One example of this is flash (wear leveling redirects writes to other physical blocks).

Typically, if one wants to “shred” data and make sure the writes actually land on the data to be erased, one should shred the whole drive or the partition that contains that data.

Encryption:

Encrypt the data on the disk at all times. Erasing is then fast : you only need to erase the encryption key.

Things we’ve left out (we will come back to these in a later lecture):

1.      RAID : lash together N disks so that there is no single point of failure.

2.      File systems that span multiple disks.

3.      Network file systems.


Example layout of a Unix file system:

inodes (index nodes)

contain the size of the file
	(from the end user's viewpoint)
	underestimate : the reported size can underestimate the space actually used, due to internal fragmentation
	overestimate : it can overestimate the space actually used, in the case of a file with holes (see the sketch below)
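A small sketch illustrating the overestimate case: create a file with a hole by seeking past the end before writing, then compare st_size (the logical size the user sees) with st_blocks * 512 (the space actually allocated). The file name "holey" is just an example:

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    /* Create a sparse file: seek 1 MiB past the start, then write 1 byte.
       The skipped range is a hole and consumes no data blocks. */
    int fd = open("holey", O_CREAT | O_TRUNC | O_WRONLY, 0644);
    lseek(fd, 1024 * 1024, SEEK_SET);
    write(fd, "x", 1);
    close(fd);

    struct stat st;
    stat("holey", &st);
    /* st_size overestimates the space actually used, which
       st_blocks reports in 512-byte sectors. */
    printf("st_size   = %lld bytes\n", (long long)st.st_size);
    printf("allocated = %lld bytes\n", (long long)st.st_blocks * 512);
    return 0;
}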