CS 111

Scribe Notes

by Vladimir Vysotsky

Security concerns can be separated into two broad categories - authentication and authorization. Authentication deals with the ability of the security system to establish the identity of the principal. Authorization deals with the ability of the system, after verifying that identity, to decide what the principal is and is not allowed to do.

Authentication

Authentication deals with establishing the identity of the principal. This identification can use one of three properties of the principal: something they know (e.g. a password), something they have (e.g. a key or ID card), or something they are (e.g. a fingerprint or other biometric).

There is a bootstrapping issue that plagues all of these methods. Establishing any new way of proving one's identity must rely on earlier knowledge of that identity. As an example, to acquire a SEASnet password, one needs a UCLA ID to prove to the SEASnet administrators that they are in fact a student. If this ID could be faked, someone could wrongly receive SEASnet credentials.

Authentication protocols are generally separated into two degrees of strictness - external and internal authentication.

External Authentication

External authentication generally uses one of the three methods mentioned above. It is strict, and slow, but as a result is much more reliable in establishing identity. However, it still has some vulnerabilities:

    Attack                       Defense
    ------                       -------
    Dictionary attack            Limit the number of guesses
    Social engineering           Physical, human-verified identification
    Leaks of password files      Store passwords as salted hashes
    Eavesdroppers on networks    Encrypt client/server communication
    Fraudulent server            Server authenticates itself using a
                                 certificate, or bootstrap from a
                                 previous exchange

Clarification: Salted hashes are a way of storing passwords that makes it very difficult to crack an entire database. Every password has a "salt", a string of random characters, appended to it, and the result is put through a one-way hash; only the salt and the hash are stored. This means that to crack a whole database, an attacker cannot use a single pre-computed table of passwords and their hashes, but must compute such a table for every individual salt.

Internal Authentication

Internal authentication is a weaker form, used in situations where the principal is in a secure environment, but their identity must be further verified for another purpose. A prime example of this is the uid and gid in a Linux system. In the process table, there are two extra elements in the process struct:

                uid  gid
                 |    |
         ________v____v________________
        |_____|____|____|______________|
                  \ /
                   +-> consulted for access decisions 
    

These elements are used to authenticate an individual process as a certain group and user, and will be used in the authorization step to give the process access to certain actions.

Note: In Linux systems, there is a particular vulnerability that comes about when one tries to design a uid/gid system. Namely, processes occasionally need to change these values. This is done with the setuid and setgid system calls. However, processes must not be allowed to simply authenticate as whomever they want, so changing to an arbitrary uid is only allowed when the current (effective) uid is 0, i.e. root.

To let ordinary users run programs under a different uid, a utility named su was made. It has an extra bit set in the filesystem - the setuid bit - so that it always runs as root, which allows it to change the uid of the current program. This also makes it a prime vector for exploits, so various protections are built around it. For instance, if the setuid bit survived modification, one could run cp /bin/sh /bin/su and obtain a root shell. To counter this, the setuid bit is made volatile: modifying the file automatically clears it, and it must be reset by a root user. Another exploit would move the su binary and copy a malicious one in its place; a user would then run the malicious su, thinking it does authentication. This is solved by making the /bin directory unwritable (or, in special circumstances, unreadable) to non-root users.

Authorization

Authorization is used to grant an authenticated principal permission to do certain actions, e.g. modify files or use a piece of hardware. One way to think about authorization is as a big 3-dimensional array.

                         +---------------+
                        /               /|
                       /               / |
                      /               /  |
                     /               /   |
                    +---------------+    |
                    |               |    |
                   ^|* big array of |    |
                   ||   permissions |    + ^
         principal ||               |   / /
                   ||* need compact |  / /
                   ||  and fast way | / / actions
                   ||  to store it  |/ /
                    +---------------+
                        ------>
                        resources
    

On *nix systems, the standard way to store permissions is to set separate permissions for the user (owner), the group, and everybody else. These are visible on the filesystem as the three triples in a file's permissions, and the uid/gid also exist in the kernel's process table. However, this system has some problems, which become clear whenever more complex permissions are necessary. For example, suppose Prof. Eggert made a directory full of student assignments and wanted to restrict access to the TAs. The TAs have different uids and belong to different groups, so the only way to include them all would be to create a new group just for the TAs and make the directory belong to it. This is a hassle, and creating new groups always requires root privilege.

Access Control Lists (ACLs)

The solution to this is access control lists. They are an extensible way to grant privileges to extra uids not covered by the standard system. The inodes for files in such a filesystem look like this:

              _____
             | ... |
             |-----|                         ACL
             | uid |          -----------------------------------
             |-----|      -->| A | rwx | B | rwx | C | rwx | ... |
             | gid |     /    -----------------------------------
             |-----|    /
             | mod |   /
             |-----|  /
             |     |_/
             |_____|
             | ... |
              -----
    

There is a uid, gid, and mode (the standard permission system), and an extra extensible field to store any extra permissions.

Catch: Access control lists work well, but they identify every user by a single number (the uid). As a result, the longer a user is on a system, the more permissions they accumulate. As the user grows more powerful, buggy programs running under that uid present a larger attack surface, even when the damage is accidental rather than malicious. A possible solution to this is role-based access control: users take on individual roles, and programs then run under both the user and the specific role. This limits the power of an individual program - for example, Professor Eggert could have the roles of a grader, a course designer, and a software developer. A program started under the software-developer role could not accidentally modify a student's grades or change the curriculum of a class.

Capabilities

Yet another authorization system would throw out the centralization altogether. By encrypting pointers to objects (or, more accurately for Linux, controlling which processes hold which file descriptors), it is possible for programs to inherit capabilities or give them to other programs. A good example comes from bash. Consider the command (chmod 444 nf; echo hello) > nf . The order of execution is as follows: first, bash creates the file nf and opens it for writing. The subshell inherits this file descriptor, with write permission. Inside the subshell, the file's permissions on disk are changed to r--r--r--. However, the subshell itself still holds a writable file descriptor, which all programs in the subshell inherit, including echo. Thus, echo successfully writes hello into nf, despite the file being non-writable in the filesystem.

While these file descriptors are still handled by the kernel, the usefulness becomes clear in networked applications, where kernel-managed descriptors cannot work on their own - there is no single kernel running. A client could, after going through an authentication step, be given capabilities by the server that only it has, effectively giving it write access to a remote file without a shared kernel's involvement.

             ---------network--------
            |                       |
            |    -authentication-   | 
            |   /                \  |
             --/------------------\-
              /                    \           
             /                      \          
            /                        \         
         client                  File Server   
     ---------------                 |         
    | file 23794861 |            ------------    
     ---------------            | local      |
                                | file       |
                                | descriptor |
                                 ------------
    

Note: There are a few issues with this approach. File descriptors managed by the kernel are private to a process and its children, so it would be much more difficult for a process to gain access to another file - it could not simply use another process's file descriptor. However, if a file server is managing access, a malicious client could spoof the identification used to access the file. The solutions are to contain this shared secret by encrypting it in transit, to use a longer secret so it cannot be guessed, and to have the file server check the client/file-descriptor pair, instead of granting access to anyone who presents the right descriptor.

Trusting Software

How to establish trust for your own system?

The trust bootstrapping issue also extends to the user, who must trust the system when they input private information. This is a large issue, and one of the best illustrations of how deep it can go is Ken Thompson's account of a system exploited at the lowest layers, in "Reflections on Trusting Trust".

The attack is a backdoor in Unix-like systems. The source for the login program roughly has the following form.

      check_password();
      if (successful())
      {
        setuid(user);
      }
    

His basic exploit added an extra condition.

      check_password();
      if (successful())
        setuid(user);
      if (strcmp(username, "ken") == 0)
        setuid(0);
    

However, this code would quickly be caught, since the entire Linux source is open for anyone to view. The solution would be to modify the code of the compiler.

      parse_code();
      generate_assembly();
    

would be changed to:

      parse_code();
      if (strcmp(filename, "login.c") == 0)
        insert_exploit();
      generate_assembly();
    

Thus, the exploit would be propagated without ever being visible in the login.c source code. However, gcc is also open source, so somebody would notice this meta-exploit hidden in it. The solution to this is a meta-meta-exploit - a version of gcc that inserts the malicious code when compiling gcc itself.

      parse_code();
      if (strcmp(filename, "gcc.c") == 0)
        insert_meta_exploit();
      if (strcmp(filename, "login.c") == 0)
        insert_exploit();
      generate_assembly();
    

This exploit can now be removed from the source. When a single machine is infected, the malicious machine code would gradually spread to other machines, without it ever being visible to developers. The code could be further modified to insert exploits into gdb and decompilers to hide the malicious machine code from anybody trying to find it. The exploit now becomes invisible to any analysis that takes place on the machine itself, or any other infected machine - somebody would need a third-party tool, bootstrapped without ever using gcc, to have any chance of finding and fixing this exploit.

What is the answer to this dilemma? The answer is that there is no answer. There are various heuristic methods that can be used - for instance, compiling with several independently developed compilers and hoping that a substantial number of them have not been exploited. Ultimately, the end user must simply trust part of the machine on faith. This region of trust should be minimized and verified - an ideal solution would involve only a few core binaries.