CS 111 Operating Systems Principles, Fall 2008

Lecture 6 Notes (Tuesday, October 14, 2008)

prepared by Kiana Pourjanfeshan & Mohammad Shahangian

 

File Permissions

To open a file in Unix, you use the following function:

int open(char const *name, int flags, mode_t mode);

The mode parameter has to do with permissions: if open creates the file, mode gives the new file's initial permissions. In Unix there are 12 bits that represent the permissions of a file. Generally we ignore the first three bits and are only concerned with the last nine. When you use ls with the -l option, you will get something that looks like this:

drwx------ 2 user1 staff  2048 Jan  2 2008  private
drwxrws--- 2 user1 staff  2048 Jan  2 2008  admin
-rw-rw---- 2 user1 staff 12040 Aug 20 2008  admin/userinfo
drwxr-xr-x 3 user1 user   2048 May 13 2008 public

The first field represents the file permissions. A leading d indicates that the file is a directory (a leading - indicates a regular file). The next three characters represent the permissions of the file's owner, the three after that the permissions of users in the file's group, and the final three the permissions of everyone else (others). A file with no permission protection at all would show the following:

-rwxrwxrwx 3 user1 user   2048 May 13 2008 file

Each character in this field represents the presence, or absence (indicated by a '-'), of a bit. The actual letter does not matter*; it just makes the field more readable and reminds the user that the first bit of each 3-tuple is the read permission bit, the second is the write permission bit, and the last is the execute permission bit.

 

Since each group of 3 bits can be written as a single octal digit (r = 4, w = 2, x = 1), permissions are usually written as a three-digit octal number, such as 777 for the last example, where all bits are set.
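As a small illustration of that correspondence, here is a sketch that turns the nine permission characters into the octal number. The name perm_string_to_mode is our own, not a standard API, and it assumes the input has at least 9 characters. It only checks whether each position is '-', which matches the point above that the particular letter does not matter for the nine basic bits.

```c
/* Convert a 9-character permission string such as "rwxr-x---" into its
 * numeric mode. Each position contributes one bit, most significant
 * first, so each group of three characters becomes one octal digit
 * (r = 4, w = 2, x = 1). Any character other than '-' counts as "set".
 * Assumes s points to at least 9 characters. */
unsigned perm_string_to_mode(const char *s)
{
    unsigned mode = 0;
    for (int i = 0; i < 9; i++) {
        mode <<= 1;          /* make room for the next bit */
        if (s[i] != '-')     /* the letter itself is irrelevant */
            mode |= 1;
    }
    return mode;
}
```

For example, perm_string_to_mode("rwxr-xr-x") returns 0755, the mode of the public directory in the listing above.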

In order to be POSIX-compatible, and to support future implementations that might change the ordering of the permission bits, code that calls functions such as open should build the mode from predefined masks that are bitwise-ORed together, rather than from raw numbers. For example, the following 4 masks represent read for owner, write for owner, read for group, and write for group:

0660 = S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP
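A minimal sketch of passing those masks to open. The macro SHARED_RW and the function open_shared are hypothetical names chosen for illustration. On POSIX systems the masks have the traditional octal values, so SHARED_RW equals 0660, but spelling it with the names keeps the intent readable. The permissions the file actually receives are further reduced by the process's umask.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations */
#include <fcntl.h>
#include <sys/stat.h>

/* Owner and group may read and write; others get nothing (0660). */
#define SHARED_RW (S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP)

/* Open `name` for reading and writing, creating it with SHARED_RW
 * permissions if it does not exist yet. Returns a file descriptor,
 * or -1 on error with errno set. */
int open_shared(const char *name)
{
    return open(name, O_RDWR | O_CREAT, SHARED_RW);
}
```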

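*There are 3 more bits in front of the 9 bits discussed above. These bits are a bit confusing because, as we will see, they don't all follow the orthogonality convention we've discussed in the past, and their meaning varies with the type of file they are applied to. They are also where the letters in the ls output start to matter, since ls shows them as s or t in the execute positions. A file permission such as 777 is typically written as 0777; the leading '0' is the octal digit for none of these three bits being set. (In C source code, a leading 0 also marks an integer literal as octal, so 0777 is the correct way to write this mode there.)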

The first bit represents setuid. If this bit is set on an executable file, the process that runs it gets the effective permissions of the file's owner, not those of the person who runs it. ls -l shows it as an 's' in place of the owner's execute 'x'. For example, if a file's permission mode is set to 4770, ls -l will show:

-rwsrwx--- 2 user1 staff  2048 Jan  2 2008  File

The second bit represents setgid. If this bit is set on a directory, all files created inside that directory inherit the group of that directory. If it is set on an executable file, it has the same effect as the setuid bit, except that the program runs with the effective permissions of the group the file belongs to. ls -l shows it as an 's' in place of the group's execute 'x'; the admin directory above (drwxrws---) has mode 2770.

The third bit is referred to as the sticky bit. Its original purpose was to "hint" to the OS to keep a program in memory as long as possible after it is used, which could increase efficiency if you know ahead of time that it will be run repeatedly. Nowadays the bit has been given other meanings. The most common is on directories: only a file's owner, the directory's owner, or root may delete or rename a file in a sticky directory, which is why /tmp has the bit set. The ls -l command shows a set sticky bit as a 't' in place of the others' execute 'x'; for example, mode 1755 shows as:

-rwxr-xr-t 2 user1 staff  2048 Jan  2 2008  File
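To make the display rules concrete, here is a sketch of how a listing program could render all 12 bits. mode_to_string is a hypothetical helper, not part of any library, and it omits the leading file-type character. The special bits borrow the execute columns: mode 4770 comes out as rwsrwx---, 2770 (the admin directory above) as rwxrws---, and 1755 as rwxr-xr-t. Following the usual ls convention, an upper-case S or T means the special bit is set but the underlying execute bit is not.

```c
#define _XOPEN_SOURCE 700   /* expose S_ISVTX and other XSI mode bits */
#include <sys/stat.h>

/* Write the ls-style permission string for `mode` into out[0..9]
 * (9 characters plus a terminating NUL). */
void mode_to_string(mode_t mode, char out[10])
{
    const char *rwx = "rwxrwxrwx";
    for (int i = 0; i < 9; i++)                  /* 0400 >> i walks owner r .. others x */
        out[i] = (mode & (0400 >> i)) ? rwx[i] : '-';

    /* The special bits have no columns of their own. */
    if (mode & S_ISUID) out[2] = (mode & S_IXUSR) ? 's' : 'S';
    if (mode & S_ISGID) out[5] = (mode & S_IXGRP) ? 's' : 'S';
    if (mode & S_ISVTX) out[8] = (mode & S_IXOTH) ? 't' : 'T';
    out[9] = '\0';
}
```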

 

The default permissions of files a user creates can be controlled with the umask command. The umask is a per-process setting that restricts the permissions of any file created by that process, and child processes inherit it. The user types umask followed by the mask they wish to apply to any file they create. The mask is written like the 9 permission bits, but it specifies which bits will be turned off: the umask is negated and bitwise-ANDed with the requested permissions, so the file gets (requested & ~umask). The following are some common settings:

umask 077 - "Paranoidish": Only the owner gets permissions.

umask 000 - "Trusting": Trust anyone to read, write and execute.

umask 007 - Trust only your group to read, write and execute.
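The shell's umask command sets the shell's own mask through the umask system call, and programs started from the shell inherit it. The sketch below (mode_after_umask is a hypothetical helper) sets a mask temporarily, creates a probe file requesting 0666, and reports the permission bits the file actually received. Assuming the directory has no default ACL (which on some systems overrides the umask), the three masks above turn a 0666 request into 0600, 0666, and 0660 respectively.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations */
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create `path` asking for mode 0666 while the umask is `mask`, and
 * return the permission bits it ended up with (requested & ~mask),
 * or -1 on error. The caller's umask is restored and the probe file
 * is removed afterwards. */
int mode_after_umask(const char *path, mode_t mask)
{
    mode_t old = umask(mask);                     /* umask() returns the previous mask */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);
    umask(old);                                   /* restore the caller's mask */
    if (fd < 0)
        return -1;

    struct stat st;
    int rc = fstat(fd, &st);
    close(fd);
    unlink(path);                                 /* the file was only a probe */
    return rc == 0 ? (int)(st.st_mode & 0777) : -1;
}
```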

Race Conditions

 

If we run the command "mkdir a/b/c", should the sticky bit be set on the new directory? Suppose the convention on our system is that the sticky bit is propagated from the parent. If we do not want the sticky bit, we run the following back-to-back commands:

mkdir a/b/c

chmod -t a/b/c

But a race condition is possible between these two commands. The user intends them to act atomically, creating a directory with the sticky bit disabled, but another process may create a subdirectory inside a/b/c before chmod runs. As a result, the sticky bit will have been unintentionally propagated to this new subdirectory. This can lead to security issues or other inconsistencies.

 

Race conditions can get very hairy. We can see this become a problem when creating a temporary file for a process: a program such as sort may need a file in which to store intermediate results during execution. These temporary files are stored in the /tmp directory. Let's say the naming convention is st followed by a number starting at 001, so st001 is the first temp file. Each process that needs temp space scans /tmp for the first number that isn't used by any other temporary file and creates a file with that number. The program looks something like this:

struct stat st;        // stat() fills this with the file's metadata if the file exists
char buf[MAX_LENGTH];  // declared outside the loop so the chosen name is still in scope after it

for (int i = 1; ; i++) {
    snprintf(buf, sizeof buf, "/tmp/st%03d", i);
    if (stat(buf, &st) != 0)   // stat failed: no file by this name, so it looks unused
        break;
}

// We assume that between the stat() above and this open(), no other process has claimed this filename
int fd = open(buf, O_RDWR | O_CREAT, 0666);

 

Per the last comment, there is a potential race condition: if we run the above program at the same time as another program that wants a temp file, both may find that /tmp/st001 does not exist and both open it, so they end up sharing, and clobbering, the same file. We could try using creat instead:

int creat(const char *path, mode_t mode)
 

Example:

fd = creat("/tmp/st001",0666);

But creat doesn't solve our problem: if the file already exists, creat truncates it, destroying whatever another process stored there. Additionally, creat does nothing that open can't do; creat(path, mode) is equivalent to open(path, O_WRONLY | O_CREAT | O_TRUNC, mode). What we can do instead is pass O_CREAT together with O_EXCL to open, which makes checking for the file and creating it a single atomic step. The following are commonly used flags with open:

O_CREAT = Create the file if it doesn't exist.

O_TRUNC = Truncate the file if it exists.

O_EXCL = Used with O_CREAT: fail (with errno set to EEXIST) if the file already exists. The existence check and the creation happen atomically, as if the directory containing the file were briefly locked.
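A quick way to see what O_EXCL adds: the sketch below (create_exclusive is a hypothetical name) succeeds the first time it is called on a path and reports "already exists" the second time, whereas creat would succeed both times and truncate the file.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Try to create `path` exclusively. Returns 1 if this call created
 * the file, 0 if it already existed, and -1 on any other error. */
int create_exclusive(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0666);
    if (fd >= 0) {
        close(fd);
        return 1;                        /* we won: the file is brand new */
    }
    return errno == EEXIST ? 0 : -1;     /* someone else created it first */
}
```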

So our solution to the file-creation problem would be:

 

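char buf[MAX_LENGTH];
int fd;

for (int i = 1; ; i++) {
    snprintf(buf, sizeof buf, "/tmp/st%03d", i);
    fd = open(buf, O_RDWR | O_CREAT | O_EXCL, 0666);  // check and create in one atomic step
    if (fd >= 0)
        break;          // we created a brand-new file named buf
    if (errno != EEXIST)
        break;          // a real error (e.g. EACCES): give up, fd is -1
}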

 

 

One alternative approach is a hypothetical opentemp(mode_t mode) function. It avoids all the difficulties of naming a file by not naming it at all: it simply gives the process that needs temporary space a file descriptor for some anonymous storage. As we will see, it has one big shortcoming that makes it an unpopular approach.

Pros:

OS can decide the location of the file, so that the user app doesn't have to worry about it.

RAMdisk can be used rather than a file.

The space will automatically be freed once the process that created it exits

The OS does not have to guarantee that the file persists, which can give a large performance boost.

CON (BIG minus): We are breaking the big rule that "everything is a file." Now we have 2 different types of files (second-class citizen files), and a program like du may no longer be able to tell us why we are low on storage.
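The opentemp function above is hypothetical. For comparison, standard C already provides tmpfile(), which returns a stream to a temporary file whose name the caller never chooses; the file is removed automatically when it is closed or the program exits. On many Unix systems it is an ordinary file that the library creates in a temporary directory and immediately unlinks, so the naming problem is handled inside the library. The helper roundtrip_via_tmpfile below is our own illustration, not a library function.

```c
#include <stdio.h>

/* Write `msg` to an anonymous temporary file, read it back into `out`
 * (at most outlen - 1 bytes, NUL-terminated), and return the number of
 * bytes read, or -1 on error. */
long roundtrip_via_tmpfile(const char *msg, char *out, size_t outlen)
{
    if (outlen == 0)
        return -1;
    FILE *f = tmpfile();          /* the library picks the name and location */
    if (f == NULL)
        return -1;

    fputs(msg, f);
    rewind(f);                    /* back to the start before reading */
    size_t n = fread(out, 1, outlen - 1, f);
    out[n] = '\0';

    fclose(f);                    /* the temporary file disappears here */
    return (long)n;
}
```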

 

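We can lock files (or byte ranges within them) in Unix using the fcntl system call:

fcntl(fd, cmd, &fl)     // fl is a struct flock giving the lock type and byte range

where cmd is one of:

        F_SETLK     // grab the lock; fail immediately if another process holds it

        F_SETLKW    // grab the lock, and wait if necessary

        F_GETLK     // find out whether a conflicting lock exists on fd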
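A minimal sketch of filling in struct flock to lock a whole file. lock_whole_file is a hypothetical helper; the struct fields and commands are the standard POSIX ones. Setting l_len to 0 means the lock extends to the end of the file, however large it grows.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations */
#include <fcntl.h>
#include <unistd.h>

/* Apply a lock of `type` (F_RDLCK, F_WRLCK, or F_UNLCK to release) to
 * the whole file open as `fd`. With `wait` nonzero this uses F_SETLKW
 * and blocks until the lock is available; otherwise it uses F_SETLK and
 * fails at once (errno EACCES or EAGAIN) if a conflicting lock is held.
 * Returns 0 on success, -1 on error. */
int lock_whole_file(int fd, short type, int wait)
{
    struct flock fl = {0};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;   /* l_start is measured from the start of the file */
    fl.l_start = 0;
    fl.l_len = 0;             /* 0 = through end of file */
    return fcntl(fd, wait ? F_SETLKW : F_SETLK, &fl);
}
```

While one process holds the write lock, another process calling lock_whole_file(fd2, F_WRLCK, 0) on the same file gets -1. A process that simply calls write() without asking for a lock is not stopped, though; that is what "advisory" means.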

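Potential problems with locks:

    A process can go into an infinite loop while holding a lock, so every process waiting for that lock waits forever.

    Lock leaks can occur if a program crashes or forgets to unlock a file.

    A process gets a lock and then forks: does the child hold the lock too?

    ** They're purely advisory: a process that never asks for a lock is not stopped by one. This prevents lockout of unsuspecting processes.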

  

Different flavors of Unix handle these cases differently. For example, Unix frees all of a process's locks when it exits, and POSIX doesn't give children the locks their parents hold. But the bottom line is that we cannot trust other applications to cooperate in sharing locks, and therefore locks are purely advisory.

 

Interrupts and Signals (short introduction):

Applications typically execute one instruction after another, and the operating system is responsible for allocating CPU time among them so that they appear to run in parallel. But sometimes it is necessary to interrupt this flow. For example, in the case of a power failure, the OS may have only a few milliseconds (from capacitor power) to prepare for shutdown.

The operating system can be busy scheduling tasks when it suddenly learns that power is failing, and it must act accordingly.

Approach A)

One popular method is to save all of the volatile memory (contents that are lost when power is lost) to disk, shut off, and, on reboot, resume from this stored image of RAM. This approach can be reasonable in some cases, but not every application can resume based only on what was in memory at shutdown. For example, consider an application that has established a connection to another computer on the network and is writing files to it: the connection does not survive the reboot, even though the saved memory says it is still open.

Approach B)

The computer needs a mechanism to change the flow of each application and warn it of the power failure. This is where signals and interrupts come into play (we will discuss these in the next lecture). Some applications need not change course when power is failing, and the hibernation approach above may be enough for them, but other applications can have a handler that reacts appropriately. For example, a program that is writing a file over the network may have a handler that deletes the bits already written to the remote PC and logs an error message stating that the copy was unsuccessful.

 

An example of a program that can really make use of handlers would be gzip. Running the command gzip foo creates a compressed file called foo.gz in the following manner:

    1) creates an empty file called foo.gz

    2) reads foo, compressing its contents and writing them to foo.gz as it goes

    3) closes foo.gz

    4) unlinks foo

Again, we are dealing with a non-atomic task that can be interrupted between (and even during) steps. If we experience a power failure or a Ctrl-C (interrupt) before step 1 or after step 4, we don't need to do anything special. But between any other two steps we may be left with an empty file, a partially compressed file, a file that is never closed, or a lock leak!
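As a preview of the handler approach (signals are covered properly in the next lecture), here is a sketch of how a gzip-like program could avoid leaving a half-written foo.gz behind on Ctrl-C. The names install_sigint_flag and abandon_if_interrupted are hypothetical. The handler itself only records that SIGINT arrived, because only a small set of async-signal-safe functions may be called inside a handler; the compression loop checks the flag between chunks and does the cleanup at a safe point.

```c
#define _XOPEN_SOURCE 700   /* expose sigaction and friends */
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t interrupted = 0;

static void on_sigint(int sig)
{
    (void)sig;
    interrupted = 1;          /* just record it; cleanup happens in the main loop */
}

/* Route SIGINT (Ctrl-C) to on_sigint. Returns 0 on success, -1 on error. */
int install_sigint_flag(void)
{
    struct sigaction sa = {0};
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/* Called between chunks of step 2. If Ctrl-C arrived, remove the
 * partial output (e.g. "foo.gz") and return 1 so the caller stops
 * before step 4 unlinks the original; otherwise return 0. */
int abandon_if_interrupted(const char *partial_output)
{
    if (!interrupted)
        return 0;
    unlink(partial_output);   /* don't leave a half-compressed file behind */
    return 1;
}
```

Checking an in-memory flag is far cheaper than polling a file, and the signal itself is still delivered asynchronously, so the program does not have to ask the OS whether anything happened.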

So how can we notify the processes?

Method 1) Whenever the OS wants to signal a process, it can write to a file in /dev/[Signal Name], and each application that cares about interrupts can check that file between each of its instructions to see whether it needs to change the course of its execution.

    Pros: It works

    Cons: It is inefficient: the application must ask "constantly" whether an event has occurred.

Method 2) Break our applications' virtual machine abstraction and allow a "stop hammer" to interrupt the application at any point.

    Pros: It works efficiently (especially when implemented in hardware)  

    Cons:

            System calls and memory instructions are no longer the only inter-process communication portals.

            A new process model: between any 2 machine instructions, a signal handling function might run and change the state and control flow of the application.

           And last but not least, we've opened up a portal to even more race conditions, including race conditions within a single program.