CS 111

Scribe Notes for 12/4/13

by Victor Sia and Gary Chang

Security (continued)

Cryptographic checksums

Now we have SHA1 and the SHA2 family, which runs all the way up to SHA512 (a 512-bit checksum)

Authentication:

Now, suppose you want to authenticate over a network:
    -We must assume that there may be snooping or man-in-the-middle (packet injection) attacks
We have Eggert's encrypted password on the server. How can the client authenticate itself to the server using a cryptographic checksum?
    -Possible Solutions:
        -Compute the cryptographic checksum and send it to the server
            -In this case, an attacker can snoop the checksum and replay it to authenticate as the client. This does not work, so we need new tools.
        -Private key cryptosystems:
            -Secret key "K" known to both sender A and recipient B, and message M, encoded message E
            -Crypt function, crypt(K, M) -> E
            -Decrypt function, decrypt(K, E) -> M
        With this approach, no single piece of the puzzle reveals the others: holding only the encrypted message E, you cannot recover M; holding only the message M, you cannot produce E. Both directions require the key K.
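
        One concrete way to realize crypt and decrypt is a standard library cipher. Here is a minimal sketch using OpenSSL's EVP interface with AES-128 in CTR mode (the key, IV, and message are made up for illustration; compile with -lcrypto):

            #include <openssl/evp.h>
            #include <stdio.h>

            int main(void) {
                unsigned char key[16] = "0123456789abcdef"; /* K, the shared secret */
                unsigned char iv[16]  = {0};                /* per-message IV; must not repeat for a given K */
                unsigned char msg[]   = "attack at dawn";   /* M */
                unsigned char enc[sizeof msg], dec[sizeof msg];
                int n, m, elen;

                /* crypt(K, M) -> E */
                EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
                EVP_EncryptInit_ex(ctx, EVP_aes_128_ctr(), NULL, key, iv);
                EVP_EncryptUpdate(ctx, enc, &n, msg, sizeof msg);
                EVP_EncryptFinal_ex(ctx, enc + n, &m);
                elen = n + m;
                EVP_CIPHER_CTX_free(ctx);

                /* decrypt(K, E) -> M: same key, same IV */
                ctx = EVP_CIPHER_CTX_new();
                EVP_DecryptInit_ex(ctx, EVP_aes_128_ctr(), NULL, key, iv);
                EVP_DecryptUpdate(ctx, dec, &n, enc, elen);
                EVP_DecryptFinal_ex(ctx, dec + n, &m);
                EVP_CIPHER_CTX_free(ctx);

                printf("%s\n", dec);                        /* prints the original message */
                return 0;
            }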

        Problem: How do we set up this secret key, K, and get it to both client and server?
        Standard technique to handle this problem is the public key system
            -Every key comes in pairs: one is public, Pub, and one is private, Pri
            -You can send Pub to outside sources, and they can use encrypt(Pub, M) -> E
            -To use decrypt, you need Pri. decrypt(Pri, E) -> M

        Problem: You can still be the victim of a replay attack. An attacker can grab E and resend it.
        Last piece of the puzzle: Nonce
            A nonce is a random bit string; you need a reliable source of randomness to generate it.
            You need to set up a protocol between A and B such that:
            A generates a nonce, nonce(a)
            A->B:    crypt(Pub(b), nonce(a) . M) -> E(a), where "." means concatenate
            B->A:    crypt(Pub(a), nonce(a) . "new message" . nonce(b)) -> E(b); now A and B have recognized each other and can exchange a session key
            A->B:    crypt(Pub(b), nonce(b) . Key(a)) -> E'(a)
        -The nonce will change every session because it is randomly generated.
        -Suppose an attacker intercepts E(b). They will not be able to continue the "conversation" because they do not have nonce(b).
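
        A toy, single-process simulation of this handshake follows. The XOR "cipher" here is only a placeholder so the control flow can run; real systems use a public-key cipher, and every name below is invented for this sketch:

            #include <stdio.h>
            #include <stdlib.h>
            #include <time.h>

            typedef struct { long nonce_a, nonce_b; char text[32]; } msg_t;

            /* stand-in for crypt(Pub(x), ...) and decrypt(Pri(x), ...):
               XOR every byte with a key byte -- NOT real cryptography */
            static void toy_crypt(unsigned char key, msg_t *m) {
                unsigned char *p = (unsigned char *)m;
                for (size_t i = 0; i < sizeof *m; i++)
                    p[i] ^= key;
            }

            int main(void) {
                unsigned char key_a = 0xA5, key_b = 0x5A;  /* stand-ins for the two key pairs */
                srand((unsigned)time(NULL));

                /* A->B: crypt(Pub(b), nonce(a) . M) */
                msg_t m1 = { rand(), 0, "hello" };
                long nonce_a = m1.nonce_a;
                toy_crypt(key_b, &m1);                     /* "encrypt" for B */

                /* B: decrypt, echo nonce(a), add its own nonce(b) */
                toy_crypt(key_b, &m1);                     /* B "decrypts" */
                msg_t m2 = { m1.nonce_a, rand(), "reply" };
                toy_crypt(key_a, &m2);                     /* "encrypt" for A */

                /* A: decrypt and check that its own nonce came back */
                toy_crypt(key_a, &m2);
                if (m2.nonce_a != nonce_a) {
                    puts("replay or forgery detected: wrong nonce");
                    return 1;
                }
                printf("handshake ok; B's nonce = %ld\n", m2.nonce_b);
                /* A would now send crypt(Pub(b), nonce(b) . session key) */
                return 0;
            }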

    This is the underlying concept of ssh. After a given amount of time passed or data sent, both parties change the session key.
        -This guards against the key leaking out and against replay attacks: if you change the key periodically, you make these attacks much less likely.

Example packet you can send using this protocol: a Message Authentication Code (MAC)
HMAC: assumes a shared key K (a private-key system)
    SHA1((K^pad1) . SHA1((K^pad2) . M)), where "^" is exclusive or
    pad1, pad2 are constants specified by the algorithm
    At the end of this, you have a checksum that you can append to the message.
    The recipient can recompute this checksum to make sure that it matches.
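    A minimal C sketch of this construction using OpenSSL's SHA1 primitives: in the real HMAC, pad2 is the byte 0x36 repeated and pad1 is 0x5c repeated, SHA1's block size is 64 bytes, and this sketch assumes the key is at most 64 bytes (longer keys get hashed first). Compile with -lcrypto:

        #include <openssl/sha.h>
        #include <string.h>

        /* HMAC-SHA1 per the formula above: SHA1((K^pad1) . SHA1((K^pad2) . M)) */
        void hmac_sha1(const unsigned char *key, size_t keylen,
                       const unsigned char *msg, size_t msglen,
                       unsigned char mac[SHA_DIGEST_LENGTH]) {
            unsigned char kpad[64], inner[SHA_DIGEST_LENGTH];
            SHA_CTX ctx;

            /* inner hash: SHA1((K ^ pad2) . M) */
            memset(kpad, 0, sizeof kpad);            /* zero-pad K to the block size */
            memcpy(kpad, key, keylen);               /* assumes keylen <= 64 */
            for (int i = 0; i < 64; i++) kpad[i] ^= 0x36;
            SHA1_Init(&ctx);
            SHA1_Update(&ctx, kpad, sizeof kpad);
            SHA1_Update(&ctx, msg, msglen);
            SHA1_Final(inner, &ctx);

            /* outer hash: SHA1((K ^ pad1) . inner) */
            memset(kpad, 0, sizeof kpad);
            memcpy(kpad, key, keylen);
            for (int i = 0; i < 64; i++) kpad[i] ^= 0x5c;
            SHA1_Init(&ctx);
            SHA1_Update(&ctx, kpad, sizeof kpad);
            SHA1_Update(&ctx, inner, sizeof inner);
            SHA1_Final(mac, &ctx);                   /* the checksum appended to M */
        }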
Why is this method so complicated? Why can't we just have:
    SHA1(M)
        -Anyone can compute this since SHA1 is a known algorithm.
    SHA1(K^M)
        -This would work if SHA1 were perfectly reliable and uncrackable; as it is, SHA1 is not strong enough. If K is not a strong key, patterns can be recognized and the message can be cracked.
    SHA1(K^M).M
        -Anyone can read this message. We have authentication, but no privacy.
    encrypt(Pub(b), SHA1(K^M) . M)
        -This is a good, working method.
You have a number of options for privacy and authentication. You don't want to use any more encryption than you need.

Food for thought: What % of power in this country is used for encryption?


Authorization:


Assume that you are who you say you are (authentication is successful)
What should you be authorized to do?
Access Control:
    -Figure out all the different ways you can access objects in the system
    -Figure out all the people who can access this system
    -List what accesses are allowed
    Simple version:
        -Subjects: people, users, independent agents allowed to run around and do stuff
        -Objects: files, data, pieces of data that you are trying to control access to
        -Access Types/Operations: things that subjects want to do to objects: read, write, execute, delete, etc...
        
        We have 3 types of things, so we can make a 3d space: (Objects are columns, Subjects are rows, Read/Write/Delete is the 3rd dimension in this table)

                        Grades        Salary        Medical Records
                        R  W  D       R  W  D       R  W  D
        Eggert          Y  Y  Y       Y  N  N       N  N  N
        Smallburg       Y  N  N       Y  N  N       N  N  N
        Block           Y  Y  Y       Y  Y  N       N  N  N

        When the system expands to proportions of, say, Facebook, the objects and subjects dimensions get huge.
        The problem is not just the storage to hold the table, but the amount of management needed to make sure each of those entries are exactly right.
        So a 3d array does not scale well.
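    A minimal sketch of that 3-D array in C, filled in from the table above. Each check is a single lookup, but the table needs |subjects| x |objects| x |operations| entries, all of which must be kept exactly right:

        #include <stdbool.h>

        enum subject { EGGERT, SMALLBURG, BLOCK, NSUBJECTS };
        enum object  { GRADES, SALARY, MEDICAL, NOBJECTS };
        enum op      { OP_READ, OP_WRITE, OP_DELETE, NOPS };

        static const bool allowed[NSUBJECTS][NOBJECTS][NOPS] = {
            [EGGERT]   [GRADES] = { true, true,  true  },
            [EGGERT]   [SALARY] = { true, false, false },
            [SMALLBURG][GRADES] = { true, false, false },
            [SMALLBURG][SALARY] = { true, false, false },
            [BLOCK]    [GRADES] = { true, true,  true  },
            [BLOCK]    [SALARY] = { true, true,  false },
            /* MEDICAL rows default to all false */
        };

        bool check_access(enum subject s, enum object o, enum op op) {
            return allowed[s][o][op];   /* one array lookup per access check */
        }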
    Want: a way of specifying access control without a 3d array
    How does Unix do this?
        Objects: files
        Subjects: users
        Access control: only 3 bits deep, doesn't support arbitrary bit patterns
        Unix only supports a subset of these bit patterns, so you need to encode what you want in terms of file ownership and what groups the files are in.
            This method takes advantage of common patterns
            For example, we can create a file where all users can access it (a common access pattern)
            We can also create a file where only one user can access it (another common access pattern)
        Is this system good enough? Generally, yes. For example, SEASnet:
        SEASnet starts the quarter with high security, then the TAs slowly relax security enough so that classes can work on their projects
            Some things are still annoying to do on this system:
            Suppose there is a group of people in the same user group (all students). We want to share a project between these students.
            So, the project will be visible only to these several students, and no one else.
            Doing this is possible if you have root access: root can create this user group, but regular users cannot.
            If you can get enough support from the operations staff, you can pull this off. However, this does not scale well.
        Advantage of the Unix system:
            Fixed size, 12 bits. What are these bits?
            setUID, setGID, sticky bit, then RWX for the owner (user), group, and others.
            What is setUID used for?
                Suppose we have a program foo, owned by Block, with mode bits 100111101101 (rwsr-xr-x)
                Eggert runs this program foo:
                    foo then runs with Block's privileges.
                    e.g. this program could read the # of unread emails, so Block can create this program, then give it to the secretary. The secretary can then run this program and give him the info.
            /bin/su, /bin/login have setUID bit set, and run as root.
                There is a limited amount of code to execute, and this code is audited and "safe"
            setUID: run as user
            setGID: run as group
            Sticky bit: originally a performance thing. It says to keep the program in main memory.
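        A minimal demo of what the setUID bit changes, using the real getuid/geteuid calls: compile it, have the owner (say, Block) run chmod u+s on the binary, and anyone who runs it will see Block's uid as the effective uid:

            #include <stdio.h>
            #include <unistd.h>

            int main(void) {
                printf("real uid:      %d\n", (int)getuid());   /* who ran the program */
                printf("effective uid: %d\n", (int)geteuid());  /* whose privileges it runs with */
                return 0;
            }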
        sudo is basically an admission of failure that the system is not doing what you want it to do, so you need a way to work around its limitations.

    Need a better scaling approach: Access control list model
        Assume that attached to every object in the system is an ACL
        An ACL is a list of (subject, access) pairs
        The owner of the file can change the ACL (helpful to have a reasonable default, like umask)
        As long as things are kept relatively small, ACLs can be efficient; they are used in many UNIX systems (a sketch follows the problem list below)
        getfacl (get file access control list)
        setfacl (set file access control list) lets you tune the ACL to what you want
        Problems:
            -Space concern: ACLs can be unbounded in size
            -Requires knowledge of who the users are: user names, etc., become public
            -Tends to give too much power to a user that has authenticated itself
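
        A minimal sketch of the ACL model in C (the types and names are invented for illustration):

            #include <stdbool.h>
            #include <string.h>

            enum { ACL_READ = 1, ACL_WRITE = 2, ACL_DELETE = 4 };

            struct acl_entry {                 /* one (subject, access) pair */
                const char *subject;           /* e.g. "eggert" */
                unsigned    access;            /* bitmask of ACL_* rights */
                struct acl_entry *next;
            };

            struct object {
                const char *name;
                struct acl_entry *acl;         /* the list attached to this object */
            };

            /* scan the object's ACL; cost (and storage) grow with its length */
            bool acl_allows(const struct object *o, const char *subject, unsigned want) {
                for (const struct acl_entry *e = o->acl; e; e = e->next)
                    if (strcmp(e->subject, subject) == 0)
                        return (e->access & want) == want;
                return false;                  /* no entry for this subject: deny */
            }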
        
        New approach concerned with user power and security: Role-Based Access (used for online transactions)
        Users assume roles, and these roles act as subjects, not the users themselves.
        e.g.
            Eggert assumes role of instructor, and thus can do instructor-privileged things
            Eggert then asks the OS for permission to change roles to payroll withholding
            Eggert can no longer do instructor things; he can only do payroll-withholding things
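
        A minimal sketch of the role-based idea (role names invented here); note that the check consults the process's current role, not the user's identity:

            enum role { ROLE_NONE, ROLE_INSTRUCTOR, ROLE_PAYROLL };

            struct process { int uid; enum role role; };   /* one role at a time */

            /* assuming a new role drops the old role's privileges entirely */
            void assume_role(struct process *p, enum role r) { p->role = r; }

            int can_change_grades(const struct process *p) {
                return p->role == ROLE_INSTRUCTOR;         /* uid alone is not enough */
            }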
        This type of system is becoming more popular in mobile devices
            For example, many apps ask for permission to look at physical location
        
New concept: capabilities
Capabilities are like ACLs, except that while ACLs are attached to objects, capabilities are attached to processes
Capability definition: an unforgeable token representing an object, together with access rights to it
        Only the person controlling the capability can use it.
        There are no subjects, only objects and rights.
        The subject is implicit: the person using it is the subject
    e.g. file descriptors in UNIX
        Process 57, fd 17
        dup2(17, 30) copies the capability in fd 17 to fd 30
    Can you send an fd to another process? In some variants of UNIX, yes.
        Why? If you're performing a service for someone.
        If you are a web server and you get a request, you can send the open socket to another process to do the work, as in the sketch below.
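
    Here is a sketch of that fd passing, using the real sendmsg/SCM_RIGHTS mechanism on a Unix-domain socket (error checking trimmed for space). The parent hands its stdout capability to a child, which then writes through it:

        #include <string.h>
        #include <sys/socket.h>
        #include <sys/uio.h>
        #include <unistd.h>

        /* send the capability 'fd' over the Unix-domain socket 'chan' */
        static void send_fd(int chan, int fd) {
            char byte = 0;
            struct iovec iov = { &byte, 1 };        /* must carry at least one data byte */
            char buf[CMSG_SPACE(sizeof fd)];
            struct msghdr msg = {0};
            msg.msg_iov = &iov;
            msg.msg_iovlen = 1;
            msg.msg_control = buf;
            msg.msg_controllen = sizeof buf;

            struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
            c->cmsg_level = SOL_SOCKET;
            c->cmsg_type  = SCM_RIGHTS;             /* "here is a file descriptor" */
            c->cmsg_len   = CMSG_LEN(sizeof fd);
            memcpy(CMSG_DATA(c), &fd, sizeof fd);
            sendmsg(chan, &msg, 0);
        }

        int main(void) {
            int sv[2];
            socketpair(AF_UNIX, SOCK_STREAM, 0, sv);   /* parent<->child channel */
            if (fork() == 0) {                         /* child: receive the fd */
                char byte;
                int fd = -1;
                struct iovec iov = { &byte, 1 };
                char buf[CMSG_SPACE(sizeof fd)];
                struct msghdr msg = {0};
                msg.msg_iov = &iov;
                msg.msg_iovlen = 1;
                msg.msg_control = buf;
                msg.msg_controllen = sizeof buf;
                recvmsg(sv[1], &msg, 0);
                memcpy(&fd, CMSG_DATA(CMSG_FIRSTHDR(&msg)), sizeof fd);
                write(fd, "child writes via passed fd\n", 27);
                return 0;
            }
            send_fd(sv[0], STDOUT_FILENO);             /* pass our stdout capability */
            return 0;
        }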

ACLs and capabilities are, to some extent, duals of each other
OSs using only ACLs suffer from some performance issues
OSs using only capabilities tend to be more complicated

Cloud Computing


    e.g. Amazon Web Services, Microsoft Azure, Google App Engine, OpenStack
    
    These cloud computing services work on several levels
    Top level: browsers
    Some other levels...
    APPS level in the cloud
    Compute engine, storage engine, etc. shared services that apps can depend on
    
If you are building a system based on this, how can you get it to run securely?
Security issues:
    Browser authenticates to apps (can solve this problem using key systems)
2013 survey:
    18% are using hosted private clouds
    8% hybrid approach
    19% public cloud (such as Amazon, Azure)
    55% on-premises private clouds (because they work in locked rooms, this reduces security concerns)
In practice, security concerns (and to some extent, costs) reduce the amount of public cloud based systems we see
    -You have to worry about the compute and storage engines: if an attacker gains control of some of the storage engines that contain your work, or even if the owners themselves attack you, they can see (and maybe tamper with) all your stored data.
        -You can encrypt your stored data and network data, but this is expensive
        -You can mirror/back up your data, but this is costly
Non-security problems:
    -Risk of overload: your cloud supplier happens to be busy servicing someone else
        -You can, in theory, use multiple suppliers, but generally this is not done, since overload should be THEIR problem, not yours
    -A DDoS attack on another service on the same server will affect you
    -Data access overload
    -Scalable storage access
    -Lock-in: if you are tired of using Amazon and want to switch to Google, it is very hard to switch because they use different APIs
    Bugs and debugging: many bugs appear when you scale up, and can be very difficult to solve

Back to Security:
    Suppose you log into a virtual machine to do work.
    How can you trust your virtual machine? Is there some way to inspect the virtual machine and see if it has been tampered with?
        Reflections on Trusting Trust - K. Thompson
        How to break into a Unix system without changing any of the source code
        Suppose we want to inject code into the login program, to make it always log in user "Ken" as root.
        To do this, we have to tamper with login.c
        If we modify login.c and ship it to all UNIX users, we will be able to break into all of their systems.
        However, other users will see this change and get suspicious. So, we can mask this change by modifying gcc instead.
        MODIFIED gcc.c:
            if (strcmp(name, "login.c") == 0) { /* generate code for the bad login.c */ }
        This makes it so you still have the unmodified login.c; instead you ship this version of gcc, which will compile in your modified login behavior.
        However, we can still see this suspicious change.
        We mask this gcc.c change, again, in gcc.c:
            if (strcmp(name, "gcc.c") == 0) { /* generate code for the bad gcc.c (from before) */ }
        You can compile this bad code once to get a malicious gcc executable. If you ship out only this gcc executable and not the source code, then everyone running it will keep compiling bad gcc's and bad logins.
        This is all done without shipping out bad login.c and bad gcc.c
        This way, you can create a perfectly working system with an undetected loophole.
        It is impossible to detect this loophole from within the system; you need help from outside in order to solve this problem.
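
        Putting the two checks together, the heart of the bad compiler might look like this sketch (compile() and the emit_* helpers are hypothetical placeholders for what Thompson describes):

            #include <string.h>

            void emit_backdoored_login(void);      /* payload: lets "Ken" log in as root */
            void emit_self_reproducing_gcc(void);  /* payload: re-inserts both checks */
            void emit_normal_code(const char *source);

            void compile(const char *filename, const char *source) {
                if (strcmp(filename, "login.c") == 0) {
                    emit_backdoored_login();
                    return;
                }
                if (strcmp(filename, "gcc.c") == 0) {
                    emit_self_reproducing_gcc();
                    return;
                }
                emit_normal_code(source);          /* everything else compiles honestly */
            }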