Security and Cloud Computing
John Gilik
Aditya Kaicker
Michael Schneider
In the 1960s, we had mainframes. They had a bus, to which a ton of I/O devices were attached, as well as one or two cards with processors attached. These systems were very data-intensive, and data optimization was the primary task of the day, with reliability being the big secondary focus. Nobody really worried all that much about processing, about algorithms, or similar problems outside of input and output: mainframes were all about accessing the data, and accessing it now.
In the late 1990s, the next step in high-end computing came along. It was the concept of clusters. They would save administrators money by taking some machine, finding a bunch of similar machines, tossing them all onto a fast network and allowing them all to be used for any task that the administrator required. This provided cheap parallel processing, and also made it possible to get at data (processed or otherwise) quickly due to the fast LAN. Once again, a fast bus was the center of the system, only it went by the name of local area networking.
Now, in the 2000s, we're seeing a new concept showing up: cloud computing. The major difference between cloud computing and clustering is ownership and physical proximity (or lack thereof). In clustering, a single administrator sets up a single physically contiguous cluster of machines. In cloud computing, many administrators get together and set up many physically contiguous clusters of machines linked by the Internet. Clouds can be thought of as clusters of clusters, and this brings in a cluster of new issues—mostly political. If we have a massive network consisting of the union of a cluster at UCLA, a cluster at UC Irvine, a cluster at Berkeley, and a cluster at MIT, then who owns that cloud? Who manages it? Who controls it? Who pays for it? With more than one stakeholder involved, political discussions become inevitable—but not without reward.
Clouds have one major advantage over clusters: they are a short-term commitment. To implement a cluster, I would have to go out and buy a whole bunch of mostly-similar machines. These machines depreciate incredibly fast, so I would be out a whole bunch of cash. If I don't use those machines for more than a short period of time, then I've wasted my investment by failing to get a lot of mileage out of it. Clouds, on the other hand, are a pay-as-you-go model and can grow as needed. If we pay for processing time on a cloud instead of a cluster of machines, we don't have to worry about the long-term nearly as much.
For example, if I run the Internet broadcast of the Super Bowl, then I need a lot of capacity for one or two days, and very little capacity for the other 363 days in the year. If I buy a cluster to meet those capacity needs, then I've massively over-provisioned for at least 363 days. If I instead buy two days of service from a high-end cloud, then my goals are met far more cheaply, and I don't have to worry about having wasted processing power, as someone else will be around to use that power the next day. As a result, more capacity is available for everyone overall. It's a beautiful sort of capitalistic computing communism.
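To put rough numbers on the Super Bowl argument, here is a minimal sketch of the arithmetic. Every figure in it (server counts, purchase price, daily rental rate) is a made-up assumption chosen only to show the shape of the comparison, and it ignores power, staffing, and depreciation.

```python
# Every number here is a made-up assumption for illustration, not real pricing.
PEAK_SERVERS = 1000   # servers needed on each of the two busy days
IDLE_SERVERS = 10     # servers needed on each of the other 363 days
SERVER_PRICE = 2000   # dollars to buy one server outright
CLOUD_RATE = 10       # dollars per server per day when renting

# Daily demand over one year: two peak days, then 363 quiet days.
demand = [PEAK_SERVERS] * 2 + [IDLE_SERVERS] * 363


def cluster_cost(demand, server_price):
    """Buying: we must own enough servers to cover the single busiest day."""
    return max(demand) * server_price


def cloud_cost(demand, daily_rate):
    """Renting: we pay only for the server-days we actually use."""
    return sum(demand) * daily_rate


def utilization(demand):
    """Fraction of an owned, peak-sized cluster that is busy over the year."""
    return sum(demand) / (max(demand) * len(demand))


owned = cluster_cost(demand, SERVER_PRICE)
rented = cloud_cost(demand, CLOUD_RATE)
print(f"own: ${owned:,}  rent: ${rented:,}  utilization: {utilization(demand):.1%}")
```

Under these assumed figures the owned cluster sits idle more than 98% of the time, which is the over-provisioning the example describes.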
Not everything about clouds is rainbows and sunshine, though. Cloud providers are inevitably making a profit, since they wouldn't do it otherwise. As a result, the overall price of computing time technically rises. Privacy is not guaranteed, making cloud computing a suboptimal solution for companies handling sensitive data, or for individuals who tend towards a healthy dose of paranoia. (Considering how often IT employees claim that they're ready to steal data for a profit if they feel their job security is already tenuous, this may turn out to be more of an issue than any of us currently expect.) Additionally, bugs are harder to debug in a networked environment, resulting in programs that programmers don't want to touch. This inevitably leads to developmental stagnation, as an established cloud application becomes “that thing that you don't want to touch, lest you anger the bug gods”.
Additionally, risk is consolidated in the cloud. Consolidation of risk is a good thing in that it's often possible to massively reduce the chance of failure. But what about the consequences of failure? A failure in the cloud doesn't just take down a single website or a single service; it could be downright catastrophic. What would happen if every single online store, every single news outlet, and every single search engine were consolidated in one cloud... and a router in that cloud started propagating bunk routing tables through BGP? Until an administrator steps in and takes the day or two it would take to resolve the entire mess, the entire cloud of apps would be offline, and every business that relies on any of those apps would be frozen. A failure in the cloud is less likely to occur, but when it does occur it is far more likely to be catastrophic.
Vendor lock-in is a major issue—without standards for cloud programming, programmers are nervous to start using one cloud provider due to a fear of being locked into that provider's programming standards.
Software licensing becomes insane—what happens with commercial licenses if you need to pay $500 a seat? Does that mean that every VM spawned in the cloud costs you another $500? What about GPL software—if you never publish it, you never have to distribute the source. Does using your software with a cloud provider count as publishing it? If not, then developers can freely use GPL apps in the cloud, making modifications and optimizations to them while violating the GPL's spirit of sharing. If it does count as publishing, then big corporations relying on the publishing clause to be able to use GPL programs optimized with trade secrets won't be able to use their existing apps in the cloud. With either scenario, someone loses.
We've now sufficiently covered the pros and cons of cloud computing—let's move on to computer security.
We left off in the previous lecture with authentication. We discussed internal and external authentication, but never dealt with what it's typically used for: authorization. We will now cover authorization.
We were previously considering a gigantic array of bits determining authorization, with resources, principals, and access types as the three keys for the three dimensions of the array. This array would be exceedingly large, and incredibly difficult for a human to manage. Instead, we should consider ways to represent this array through a series of easy-to-read and easy-to-understand rules.
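As a minimal sketch of why the raw array is unmanageable, the following builds the full three-dimensional table for a tiny system and then counts the bits a larger one would need. The principal, resource, and access-type names, and the sizes passed to matrix_bits, are illustrative assumptions.

```python
from itertools import product

# Hypothetical, tiny system.
principals = ["alice", "bob", "carol"]
resources = ["notes.txt", "grades.db"]
access_types = ["read", "write", "execute"]

# The explicit authorization array: one bit per (principal, resource, access).
matrix = {
    triple: False for triple in product(principals, resources, access_types)
}
matrix[("alice", "notes.txt", "read")] = True


def allowed(principal, resource, access):
    """Look up a single bit of the authorization array."""
    return matrix[(principal, resource, access)]


def matrix_bits(n_principals, n_resources, n_access_types):
    """Number of bits the full array needs for a system of the given size."""
    return n_principals * n_resources * n_access_types


print(len(matrix))                        # bits for the tiny system
print(matrix_bits(10_000, 1_000_000, 3))  # bits for a larger, assumed system
```

Even this small table has 18 entries to get right by hand; the rule-based schemes that follow are ways of describing the same bits more compactly.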
UNIX has a very simple model for representing authorization: every file has an owner ID and a group ID. Any given principal is either the owner, in the file's group, or neither of these. Each of these classes of principals (owner, group, and other) has a set of bits to specify the access types it is allowed to use: read, write, and execute. For example, if a file named a exists, owned by user jbruin, and the owner has the read and execute bits set, then the user jbruin can read or execute the file, but cannot write to it. This system is simple, but can often be too simple. For instance, only root can create groups, so it becomes difficult for a regular user to declare “I'd like for users a, b, and c to be able to access this directory I just created”. This results in all-too-many tutorials on Apache and PHP saying “just chmod it 777”, because this model for authorization doesn't scale well to many users.
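The owner/group/other check can be sketched in a few lines. This is a simplified model, not the kernel's actual code: the FileMeta class and may_access function are hypothetical names, and the sketch ignores root's ability to bypass the check.

```python
from dataclasses import dataclass

# Bit values within one rwx triple, as in octal modes like 0o754.
ACCESS_BITS = {"r": 4, "w": 2, "x": 1}


@dataclass
class FileMeta:
    owner: str
    group: str
    mode: int  # e.g. 0o500 means r-x for owner, nothing for group or other


def may_access(user, user_groups, f, want):
    """Return True if `user` may perform `want` ('r', 'w', or 'x') on `f`."""
    if user == f.owner:
        shift = 6   # the owner triple is the highest three bits
    elif f.group in user_groups:
        shift = 3   # the group triple is the middle three bits
    else:
        shift = 0   # everyone else uses the lowest three bits
    return bool((f.mode >> shift) & ACCESS_BITS[want])


# The example from the text: file a, owned by jbruin, owner has r and x.
a = FileMeta(owner="jbruin", group="staff", mode=0o500)
print(may_access("jbruin", {"staff"}, a, "r"))
print(may_access("jbruin", {"staff"}, a, "w"))
```

Note that exactly one class applies to each principal: the owner is judged by the owner bits alone, even when the group or other bits are more generous. And "chmod 777" corresponds to mode 0o777, which makes may_access return True for every user and every access type.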
ACLs (access control lists) come next: the owner of a resource can specify an access list consisting of principals and their allowed accesses. This gets bulky if the owner has to set each access manually for every file after creation, so in practice the scheme runs on the assumption that the defaults inherited from the parent directory are “good enough” 99% of the time. Thus, inheritance of permissions and intelligent guessing of desired permissions wind up becoming the main pain with ACLs.
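Here is a minimal sketch of an ACL with inheritance from the parent directory. The Node class and its lookup rule (use the nearest ancestor that has an explicit list) are assumptions made for illustration; real ACL implementations copy or merge inherited entries in more elaborate ways.

```python
class Node:
    """A file or directory with an optional explicit ACL."""

    def __init__(self, name, parent=None, acl=None):
        self.name = name
        self.parent = parent
        # acl maps principal -> set of access types; None means "inherit".
        self.acl = acl

    def effective_acl(self):
        """Walk up the tree until some node supplies an explicit ACL."""
        node = self
        while node is not None:
            if node.acl is not None:
                return node.acl
            node = node.parent
        return {}  # nothing anywhere up the tree: deny everything


def allowed(node, principal, access):
    return access in node.effective_acl().get(principal, set())


# Hypothetical tree: the directory's ACL serves as the default for new files.
project = Node("project", acl={"a": {"read", "write"}, "b": {"read"}})
report = Node("report.txt", parent=project)  # inherits
secret = Node("secret.txt", parent=project, acl={"a": {"read"}})  # overrides

print(allowed(report, "b", "read"))
print(allowed(secret, "b", "read"))
```

Unlike the UNIX model, the owner here grants access to users a and b directly, without asking root to create a group.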
One problem with both ACLs and traditional UNIX octal permissions is that they're both based solely on the user ID. If I'm a hacker, then once I'm executing with Dr. Eggert's user ID, I'm set: I can do anything. Role-Based Access Control gets around this issue by creating a set of roles that each user can assume. Each role has a different set of authorizations, and for a program to assume a more powerful role, the OS must query the user. Thus, if my program executing with the “read my CS111 notes” role is compromised, the hacker can't really do much more than read my CS111 notes, since I'd be queried if he attempted to escalate his roles. This system winds up being overly complex for most real-life production systems, and winds up going unimplemented.
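The role idea can be sketched as follows. The role names, the resources, and the ask_user callback (which stands in for the OS querying the user) are all hypothetical.

```python
# Hypothetical roles: each maps to a set of (resource, access type) pairs.
ROLES = {
    "read-cs111-notes": {("cs111/notes.txt", "read")},
    "full-user": {
        ("cs111/notes.txt", "read"),
        ("cs111/notes.txt", "write"),
        ("thesis.tex", "read"),
        ("thesis.tex", "write"),
    },
}


class Session:
    """A running program acting for `user` under exactly one role."""

    def __init__(self, user, role, ask_user):
        self.user = user
        self.role = role
        self.ask_user = ask_user  # callback standing in for the OS prompt

    def allowed(self, resource, access):
        # Authorization depends on the current role, not just the user ID.
        return (resource, access) in ROLES[self.role]

    def escalate(self, new_role):
        # Changing roles requires the user's explicit consent.
        if new_role in ROLES and self.ask_user(self.user, new_role):
            self.role = new_role
            return True
        return False


# A compromised program whose user refuses the escalation prompt.
hijacked = Session("jbruin", "read-cs111-notes", ask_user=lambda u, r: False)
print(hijacked.allowed("cs111/notes.txt", "read"))
print(hijacked.escalate("full-user"))
print(hijacked.allowed("thesis.tex", "write"))
```

The attacker is stuck with whatever the current role permits unless the real user approves the escalation.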
Two primary mechanisms exist for enforcing access control: ACLs and their kin, and “capabilities”. Capabilities invert the traditional arrangement of storing permissions with the resources, and instead store the permissions with the principal. Each principal winds up storing a set of hashed pointers that the OS can correlate to an actual data pointer and a set of rights to it. The hashed pointer is your permission to access a given file: without knowing the file's name and holding a proper hashed pointer to it, you cannot access it.
For a networked file system, ACLs have the following pros and cons:
Pro: one only needs to encrypt the authentication routine (scales with O(1))
Con: need to synchronize user IDs across systems
Con: need to create new users for foreign network clients to access your system
For a networked file system, capabilities have the following pros and cons:
Con: one needs to encrypt every hashed pointer (scales with O(n), where n is the number of accesses)
Pro: operates at a lower level than user IDs, meaning synchronization is no issue
Pro: users from foreign networks can freely use our systems, as long as they somehow are notified of a few file access hashes they can use
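The capability scheme described above can be sketched like this. One substitution to note: instead of hashing a pointer, this sketch hands out random unguessable tokens, which play the same role. The ToyKernel class and its method names are hypothetical.

```python
import secrets


class ToyKernel:
    """Maps unguessable tokens to (resource, rights) pairs."""

    def __init__(self):
        self._table = {}  # token -> (resource name, frozenset of rights)

    def grant(self, resource, rights):
        """Mint a capability; whoever holds the token holds the rights."""
        token = secrets.token_hex(16)
        self._table[token] = (resource, frozenset(rights))
        return token

    def access(self, token, right):
        """Check a presented capability; no user ID is consulted."""
        entry = self._table.get(token)
        if entry is None or right not in entry[1]:
            raise PermissionError("invalid capability or missing right")
        return entry[0]


kernel = ToyKernel()
cap = kernel.grant("shared/report.txt", {"read"})

# A user from a foreign network needs no local account, only the token.
print(kernel.access(cap, "read"))
```

This shows both sides of the trade-off in the lists above: no user IDs need synchronizing, but every token that crosses the network must be protected, since anyone who sees it can use it.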
On software trust:
In general, operating systems don't trust applications, since they operate in userspace and thus represent principals. The operating system doesn't trust principals, since if every principal were 100% trustworthy, we wouldn't need prisons. Since applications run on behalf of principals, and the operating system doesn't trust principals, the operating system shouldn't trust applications.
Some programs are trusted, though, such as login, which accepts a username and a password, and forks into a shell with your user ID if you entered your username and password correctly. It uses a call named setuid() to assume your user ID after forking. If every application could use setuid() to become any user, then every application could use any user's permissions, and this is clearly insecure. Thus, setuid() only lets a process switch to an arbitrary user ID when it is called by certain trusted programs (programs running as root), such as login. Generally, we want to trust as few programs as possible, and to keep them as small and simple as possible, since a compromise of a trusted program would mean that an attacker could use trusted function calls such as setuid(), compromising the security of all principals on the system.
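The rule about setuid() can be modeled with a toy process object. This is a simulation, not a call to the real system call: the Process class, the account table, and toy_login are hypothetical, and a real login compares password hashes rather than plaintext.

```python
class Process:
    """Toy process that tracks only a user ID (0 means root)."""

    def __init__(self, uid):
        self.uid = uid

    def setuid(self, new_uid):
        # Simplified rule: only root may switch to a different user ID.
        if self.uid != 0 and new_uid != self.uid:
            raise PermissionError("setuid: operation not permitted")
        self.uid = new_uid


# Hypothetical account table, with a plaintext password for brevity only.
ACCOUNTS = {"jbruin": {"uid": 1001, "password": "go-bruins"}}


def toy_login(proc, username, password):
    """Drop from root to the user's ID only after authentication succeeds."""
    account = ACCOUNTS.get(username)
    if account is None or account["password"] != password:
        return False
    proc.setuid(account["uid"])
    return True


shell = Process(uid=0)  # login starts out running as root
print(toy_login(shell, "jbruin", "go-bruins"), shell.uid)
```

Once the process has dropped to uid 1001 in this model, a later shell.setuid(0) raises PermissionError, so the shell cannot climb back up to root.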
How do we choose to trust login? Our Linux distributors tell us to trust a file named login with a given cryptographic hash. Why do we trust our distributors? Because we assume that they do a thorough code inspection before compiling the code, grabbing the checksums, shipping the compiled code, and then telling us that we should trust the binaries within with those trusted checksums.
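Checking a binary against a distributor's checksum might look like the sketch below. The choice of SHA-256 is an assumption, as are the function names; the byte strings stand in for real file contents so that the example is self-contained.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def is_trusted(data: bytes, trusted_digest: str) -> bool:
    """True only if the data hashes to the digest the distributor published."""
    return hmac.compare_digest(sha256_hex(data), trusted_digest)


# Stand-ins for the shipped binary and a tampered copy of it.
shipped_login = b"\x7fELF...pretend this is the login binary..."
published_digest = sha256_hex(shipped_login)  # what the distributor tells us

tampered_login = shipped_login + b"backdoor"

print(is_trusted(shipped_login, published_digest))
print(is_trusted(tampered_login, published_digest))
```

The check only tells us the binary is the one the distributor built. It says nothing about whether the distributor's build deserved trust in the first place.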
Is this a secure basis for trust then? Not necessarily. Read “Reflections on Trusting Trust” if you ever get a chance. It was written by Ken Thompson, and is blissfully brief. In short, he describes a method for creating a compiler that recognizes its own purported source code, and inserts a small trojan horse into the binary whenever it compiles that source. This trojan horse replicates the behavior of recognizing the compiler's source code and inserting said trojan horse during compilation. It also recognizes the source code to login, and inserts a trojan horse that allows anyone to log in as a user named ken without a password, with that username yielding full root privileges. As a result, the source code to both the compiler and login can appear to be completely innocuous, yet when compiled (with the malicious compiler), both innocuous sets of source code produce malicious binaries. Moral of the story? We establish a chain of trust which is only as good as its weakest link. Trusting OS distributors means that we trust the people they got their software from, and it's completely plausible that a security hole could be present in their software which could not be detected through the most common method (auditing source code). To fully trust a system, you need to have built it from the ground up yourself. (I hope you enjoy wave-soldering circuit boards and assembling systems that produce silicon wafers from scratch!)
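The self-perpetuating compiler can be modeled as a toy, with Python functions standing in for compiled binaries and short strings standing in for source code. Nothing here is Thompson's actual code; every name is hypothetical, and the "compilers" just look up what to return.

```python
LOGIN_SRC = "login.c (clean)"        # stands in for login's innocuous source
COMPILER_SRC = "compiler.c (clean)"  # stands in for the compiler's source


def clean_login(user, password, passwords):
    """What the login source says: accept only a matching password."""
    return user in passwords and passwords[user] == password


def backdoored_login(user, password, passwords):
    """What the trojaned compiler emits instead: ken always gets in."""
    return user == "ken" or clean_login(user, password, passwords)


def honest_compiler(source):
    """Translates each source into the binary it actually describes."""
    if source == LOGIN_SRC:
        return clean_login
    if source == COMPILER_SRC:
        return honest_compiler
    raise ValueError("unknown source")


def trojaned_compiler(source):
    """Recognizes two particular sources and miscompiles both of them."""
    if source == LOGIN_SRC:
        return backdoored_login    # trojan 1: the login backdoor
    if source == COMPILER_SRC:
        return trojaned_compiler   # trojan 2: reproduce this very behavior
    raise ValueError("unknown source")


# Rebuild everything from clean, audited source using the bad binary.
rebuilt_compiler = trojaned_compiler(COMPILER_SRC)
rebuilt_login = rebuilt_compiler(LOGIN_SRC)

print(rebuilt_login("ken", "", {"jbruin": "go-bruins"}))
```

Both sources pass any audit, yet the rebuilt login still admits ken, because the trojan lives only in the compiler binary and copies itself forward on every rebuild.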
Happy hacking!