CS 111, Operating Systems:
“Computer Security” (Lecture 18)

(notes compiled by Faisal Alquaddoomi)

Background: Computer Security in the World at Large

The talk begins with a humorous discussion of the “Supreme Council of Virtual Space”, an organization in the Iranian government which is apparently charged with the management of “virtual reality” (that is, online censorship and cyber-security.) The organization launched an offensive against the BBC on March 1st, 2012, consisting of a denial-of-service attack against their website as well as the jamming of two satellites relaying radio broadcasts. China and the United States routinely have run-ins with each other in “virtual reality”, where entire divisions of the government are devoted to waging cyber-warfare (often labeled “cyber-terrorism” by the offended party.)

The gist of all this is that computer security-related conflicts between nations are becoming increasingly commonplace, and thus computer security is gaining in importance. Insecurity is a prevalent theme, while the defenders are mostly ignored (which, incidentally, suits their preference, as their work is best done in secret.)

Defensive Techniques: Kerckhoffs’s Principle

Firstly, defensive techniques outnumber offensive ones, since a defender must anticipate every conceivable attack. The first of these defensive techniques dates to the late 19th century (1883, to be precise) and is due to Kerckhoffs (not to be confused with Kirchhoff). Kerckhoffs formulated six principles, one of which is of primary interest today:

Design versus Key
  1. The system must be practically, if not mathematically, indecipherable;
  2. It must not be required to be secret, and it must be able to fall into the hands of the enemy without inconvenience;
  3. Its key must be communicable and retainable without the help of written notes, and changeable or modifiable at the will of the correspondents;
  4. It must be applicable to telegraphic correspondence;
  5. It must be portable, and its usage and function must not require the concourse of several people;
  6. Finally, it is necessary, given the circumstances that command its application, that the system be easy to use, requiring neither mental strain nor the knowledge of a long series of rules to observe.

The Converse: “Security Through Obscurity”

The second rule above is considered the opposite of “security through obscurity”; it implies that the vast majority of the design of a system should be assumed to be public knowledge, with only a tiny secret (the “key”) retained that alone ensures the security of the system.

The rule is controversial, because it’s quite tempting to make everything secret. It’s easy to claim, when a project begins, that the complexity of keeping every piece of it under wraps is manageable – after all, closed-source commercial software works under such a model. The problem with this approach is that the larger a system that depends on its obscurity grows, the more prone it is to a single leak that compromises the entire thing.

For instance, DVDs are encrypted (via the Content Scramble System, or CSS) and rely on the obscurity of the mechanism to ensure that they can be read only by authorized devices. There’s reason to believe that an insider leaked information about the algorithm to a Norwegian hacker; nowadays, all of the media that was secured via this single obscure mechanism is breached, and playing back DVDs on all manner of devices is commonplace.

Applying Kerckhoffs’s Principle to UNIX

A file called /etc/passwd stores the username, home directory, and password for each user. This of course makes the file a very sensitive target, yet insecure info is mixed in with the secure information. As a first pass, let’s store a checksum (also called a “hash”) of the password instead of the password itself. A hashing function (denoted as ‘H’ below) is a “one-way” function that can be used to verify that some given input is the same as the original data, but doesn’t reveal the original data.

H(password) = S

Note that knowing 'S' gives you little information about 'password'. Unfortunately, hashing functions aren’t as expensive to compute as they used to be, thanks to improvements in computational ability; it’s possible to find “rainbow tables” that map hashes back to their original passwords. In order to combat this, let’s customize the hashing function by adding “salt”, a random string that’s concatenated to any input (both the original password and the data that’s provided later for comparison) prior to producing a hash. Since every system (ideally, every user) uses a different salt, a single precomputed rainbow table no longer works; the attacker would have to rebuild it for each salt value.
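As a rough sketch, the salting scheme might look like the following Python; the function name and the choice of SHA-256 are illustrative rather than what UNIX actually uses:

```python
import hashlib
import os

def hash_password(password: str, salt: bytes) -> str:
    # Concatenate the salt with the password before hashing, so that
    # identical passwords under different salts produce different hashes.
    return hashlib.sha256(salt + password.encode()).hexdigest()

# Each user gets their own random salt, generated once at registration.
salt = os.urandom(16)
stored = hash_password("hunter2", salt)

# Later, to verify a login attempt, re-apply the same salt and compare.
attempt = hash_password("hunter2", salt)
assert attempt == stored
```

A rainbow table built for unsalted SHA-256 is useless here, since the attacker would need a separate table for each 16-byte salt.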

UNIX adopted this salting scheme, storing the salt alongside the hashed password in /etc/passwd for convenience. Storing the salt in the clear is acceptable, since its job is to defeat precomputed tables rather than to remain secret; the real weakness was that the hashes themselves stayed world-readable. In modern versions of ‘NIX, /etc/passwd no longer holds any version of the password (an ‘x’ placeholder appears instead.) The real hashes are stored in a ‘shadow’ file (/etc/shadow) which requires elevated permissions to access, finally solving the problem of mixing secure and insecure data in the same location.

Authentication as a Defensive Technique

“Authentication” (exemplified above in the use of a password) is a defensive technique intended to prevent the “Masquerade” attack, in which the attacker attempts to gain access to protected resources by pretending that they’re a valid user. The technique relies on trusted login agents, components of the system that have special access to credentials as well as the ability to confer rights once their requirements for determining the incoming user’s identity have been satisfied.

Consider a few attacks against password-based systems:

With these disadvantages in mind, we look for other ways to ascertain identity aside from passwords:

All of the above are “external authentication” techniques, meaning that they mediate the passage of an unauthenticated user into an authenticated domain via some special knowledge or quality of the user. This is in contrast to “internal authentication”, in which the user’s established identity within the system is used to authenticate them to additional resources – think of it like a visitor’s pass that you’re issued when you first enter a secure place, which you can then use to look around.

Remote Logins

On a single machine, we have some level of trust in the security of the mechanism that’s authenticating us (although this can be subverted by hardware/software tools such as keyloggers.) On a network, we have to assume that the entire path of communication is compromised – attackers can view, alter, reorder, and delete any messages that move through the network.

Say that we have the following interaction between client A and server B, where A is attempting to prove to B that they are who they claim to be:

Plaintext Password Exchange

In the above example, authentication is attempted by transmitting the password in plaintext across the network. The attacker can clearly see the password and replicate the client’s login, store the password for use later, or otherwise mess with the login attempt without either side knowing what’s going on.

In order to foil this, we could encrypt the authentication attempt from A to B. Assuming that we have a key ‘k’ that’s previously been distributed to both parties, we can represent this standard symmetric-key situation like so:

{M}k + k → M

M + k → {M}k

Where the top row represents decrypting a message encrypted with ‘k’ using ‘k’ to produce the plaintext message ‘M’, and the second row represents the opposite – that is, using ‘M’ and ‘k’ to produce the ‘k’-encrypted message.
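To make the {M}k notation concrete, here’s a toy Python stream cipher (emphatically not secure for real use) in which XOR makes encryption and decryption the same operation:

```python
import hashlib

def xor_cipher(data: bytes, key: bytes) -> bytes:
    # Toy stream cipher: derive a keystream by hashing the key with a
    # counter, then XOR it with the data. XOR is its own inverse, so the
    # same function both encrypts (M + k -> {M}k) and decrypts ({M}k + k -> M).
    keystream = bytearray()
    counter = 0
    while len(keystream) < len(data):
        block = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        keystream.extend(block)
        counter += 1
    return bytes(a ^ b for a, b in zip(data, keystream))

k = b"shared secret"
M = b"login: alice"
ciphertext = xor_cipher(M, k)          # M + k -> {M}k
assert xor_cipher(ciphertext, k) == M  # {M}k + k -> M
```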

Replay Attacks and Nonces

This does indeed protect the password from becoming known, but since the attacker has full control of the network, they can intercept the login message in its entirety and resend it whenever they want to gain access in place of the original user. This is known as a replay attack; it doesn’t require any understanding of the message, just the ability to “play it back” later.

Now, in order to foil the replay attack, A and B can agree to add a special bit of data, called a nonce, to the payload. The nonce uniquely identifies the request, and is usually generated on the server and sent encrypted to the client, which then includes the nonce in its own response. While it seems like a simple solution, generating an unguessable nonce is actually quite difficult; there are vulnerabilities in many pseudorandom number generators that allow the sequence of numbers to be guessed or influenced, compromising the unpredictability of the nonce. Many operating systems, for instance, use the low-order bits of the clock measured after some “random event” has occurred, such as a mouse or keyboard event, or other “driver noise”; these events occasionally can be guessed or coerced to occur on target systems. (There was a suggestion to distribute nonces via GPS, but, considering the scope of the undertaking, the project was practically dead in the water before it began.)
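A challenge-response exchange built on a nonce can be sketched in Python like so; the use of HMAC for the response and the party name "alice" are illustrative assumptions, not part of any particular protocol:

```python
import hashlib
import hmac
import secrets

k = b"previously shared key"

# Server side: generate an unpredictable nonce for this login attempt.
# secrets.token_bytes() draws from the OS CSPRNG, avoiding the weak
# clock-based generators mentioned above.
nonce = secrets.token_bytes(16)

# Client side: prove knowledge of k by binding the response to the nonce.
response = hmac.new(k, nonce + b"alice", hashlib.sha256).digest()

# Server side: recompute and compare. Replaying an old response fails,
# because every login attempt uses a fresh nonce.
expected = hmac.new(k, nonce + b"alice", hashlib.sha256).digest()
assert hmac.compare_digest(response, expected)
```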

Public Key Encryption

Let’s employ public key encryption instead. We’ll assume the existence of a private key that each party holds secret and uses to ‘sign’ their messages, and a public key that can be used to decrypt any message encrypted by a particular sender’s private key. Conversely, a message encrypted using a party’s public key can only be decrypted by applying the party’s private key to it. Thus, communication can occur privately in both directions, so long as all parties are aware of the public keys. Applied to our previous example:

Encrypted Password Exchange

The above allows B to establish A’s identity; with the optional third part, A and B can converse securely from that point onward without having to perform the entire security handshake for each request.
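A toy “textbook RSA” computation in Python illustrates the public/private key relationship described above; the tiny primes are purely for demonstration, and real systems use large keys plus padding:

```python
# Toy "textbook RSA" with tiny primes, purely to illustrate the
# public/private key relationship; never use key sizes like this.
p, q = 61, 53
n = p * q                          # modulus, part of the public key
e = 17                             # public exponent
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent (Python 3.8+)

m = 65                    # message, encoded as a number < n
c = pow(m, e, n)          # encrypt with the public key
assert pow(c, d, n) == m  # only the private key recovers m

s = pow(m, d, n)          # "sign" with the private key
assert pow(s, e, n) == m  # anyone can verify with the public key
```

The two assertions mirror the two directions in the text: public-key encryption that only the private key can undo, and private-key signing that anyone holding the public key can check.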

Message Tampering Prevention

The above will work well, but it’s not particularly resilient to attempts to tamper with the messages. A common implementation of an anti-tampering scheme exists, known as HMAC (“hash-based message authentication code”.) We again assume a message ‘M’ and a previously-distributed secret key ‘k’, and assume the existence of two ‘pads’ (constant values that are known to both sides); the scheme works like so (‘~’ is the bitwise XOR operation, and ‘||’ is concatenation):

T = SHA1( (k ~ pad1) || SHA1( (k ~ pad2) || M ) )

‘T’ is the message authentication code; the receiver can pass the message through the same process and compare the result to ‘T’ to verify its integrity. SHA1() is a hashing function; as previously discussed, it maps its input into a ‘hash space’, with good distribution across the space for small changes in the input.

Why are there multiple calls to SHA1? Unfortunately, SHA1 is not that good of a hash on its own: if you know SHA1(x), SHA1(x || y) is fairly easy to compute for a chosen suffix y, even without knowing x (a “length extension” attack.) The idea behind HMAC is essentially to mask SHA1’s input with the secret pads to work around its limitations.
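The nested construction can be checked in a few lines of Python by implementing it by hand (with the conventional pad bytes 0x36 and 0x5C) and comparing against the standard library’s hmac module:

```python
import hashlib
import hmac

BLOCK = 64  # SHA-1 block size in bytes

def hmac_sha1(key: bytes, message: bytes) -> bytes:
    # Pad the key to the block size, then XOR it with the two fixed pads.
    key = key.ljust(BLOCK, b"\x00")
    ipad = bytes(b ^ 0x36 for b in key)  # the inner pad ("pad2")
    opad = bytes(b ^ 0x5C for b in key)  # the outer pad ("pad1")
    # Inner hash binds the key to the message; the outer hash prevents
    # length-extension against the inner result.
    inner = hashlib.sha1(ipad + message).digest()
    return hashlib.sha1(opad + inner).digest()

# Cross-check against the standard library implementation.
k, M = b"secret key", b"attack at dawn"
assert hmac_sha1(k, M) == hmac.new(k, M, hashlib.sha1).digest()
```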

More Remote Security Protocols

Two samples: SSH and IPSEC. SSH (“secure shell”) is the standard means to gain secure remote console access to a UNIX machine, whereas IPSEC is an “enterprise” answer to the problem of connecting separated local networks over an insecure medium (say, the internet.)

SSH is managed with the following control files:

Important SSH Files

IPSEC, on the other hand, is a much bigger deal. It’s intended for corporate environments that consist of a number of geographically disparate local networks on which “gullible” applications are run. IPSEC insulates these applications from the specifics of the secure transport by “tunneling” communication securely across insecure intermediate networks. This is a rough diagram of an IPSEC system:

IPSEC Diagram

Authorization and Access Control

The question of authorization is simply this: “who has the rights to do what with what?” More concretely, authorization dictates the specific actions that specific users can take on particular objects that comprise a system. One can visualize this relationship as a 3-dimensional space of Boolean values, where a given point is a tuple consisting of (user, file, action) => 1 if permission is granted, 0 otherwise.

Authorization Matrix
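One minimal way to model the matrix, with hypothetical users and files, is as a set of (user, file, action) tuples, where membership in the set means the permission bit is 1:

```python
# Sparse representation of the (user, file, action) Boolean space.
perms = {
    ("alice", "/etc/passwd", "read"),
    ("root",  "/etc/passwd", "read"),
    ("root",  "/etc/passwd", "write"),
}

def allowed(user: str, file: str, action: str) -> bool:
    # A tuple in the set corresponds to a 1 in the matrix.
    return (user, file, action) in perms

assert allowed("root", "/etc/passwd", "write")
assert not allowed("alice", "/etc/passwd", "write")
```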

One can see that this structure can become extremely unwieldy as the number of users, files, and actions grows. Our priority will be on making this structure easily understandable, with efficiency being a secondary, but still important, goal.

There are two approaches to managing this structure: access control lists (ACLs) and capabilities.

Access Control Lists

Access control lists are the traditional UNIX approach to authorization, although both approaches are possible in modern versions of UNIX. Under the ACL approach, each file has a list attached to it that describes who can do what with the file. For example:

$ ls -l /etc/passwd
-rw-r--r-- root root

The first ‘root’ is the user (owner) of the file, and the second is the group. According to these permissions, the user ‘root’ has read-write access, the group ‘root’ has only read access, and everyone else has read access as well. Under UNIX, only the owner can change the permission bits, most commonly through the ‘chmod’ command.
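Python’s stat module can decode these permission bits; the following sketch pulls apart the octal mode corresponding to -rw-r--r--:

```python
import stat

# 0o100644 is a regular file (0o100000) with mode 644.
assert stat.filemode(0o100644) == "-rw-r--r--"

# The three octal digits are the owner, group, and other permission sets.
mode = 0o644
owner = (mode >> 6) & 0o7   # 6 -> rw-
group = (mode >> 3) & 0o7   # 4 -> r--
other = mode & 0o7          # 4 -> r--
assert (owner, group, other) == (6, 4, 4)
```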

Files cannot be given away in modern variants of UNIX, due to a problem discovered when the Berkeley Systems folks implemented quotas (limits on disk usage, often per-user) in the operating system: users who were short on quota could simply reassign their large files to someone else, pushing the quota problem onto the unfortunate victim. Traditional UNIX didn’t have quotas, so this wasn’t as much of a problem (although allowing ownership changes still left open the question of tracking accountability for a file.)

Capabilities

Each user (or “principal”, a representation of the user) has a list of files and permissions they have over each file associated with them. In Linux, this capability list manifests as the file descriptors to which a running process currently has access (where the process substitutes for the user.)
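A small (POSIX-specific) Python experiment shows the capability flavor of file descriptors: the permission check happens at open time, so an already-open descriptor keeps working even after the file’s permission bits are revoked:

```python
import os
import tempfile

# An open file descriptor behaves like a capability: the ACL is checked
# at open(), and the descriptor remains usable afterward.
fd, path = tempfile.mkstemp()
os.write(fd, b"secret notes")

os.chmod(path, 0o000)  # revoke everyone's rights via the ACL

# A fresh open() would now fail (for non-root users), but the
# already-held capability still works.
os.lseek(fd, 0, os.SEEK_SET)
assert os.read(fd, 12) == b"secret notes"

os.close(fd)
os.unlink(path)
```

This is exactly the revocation weakness of the direct/capability style discussed below: taking back a right only affects future acquisitions, not capabilities already handed out.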

Common Properties of Both Approaches

Both approaches must be unforgeable; otherwise processes and users can gain access to privileges that they shouldn’t have. (In extreme cases, this can lead to “privilege escalation”, in which a user successively gains access to higher and higher levels of authorization until they are effectively root. This is often accomplished by compromising a privileged process, then using its capabilities to attack other targets.)

Both ACLs and capability lists must be consulted before actions are taken. This somewhat implies OS/HW support, since we need an “intercessionary authority” to screen the more fundamental requests.

Implementation Details: Direct vs. Indirect

Either of the above techniques can be implemented in either a direct or indirect manner.

In the direct case, privileged memory is mapped directly into the process space, allowing the process to have efficient (but unfiltered) control of the data. This is most often used where speed is of primary concern; for instance, graphics adapters often make their RAM available to a process in this manner after the process has acquired the device. Unfortunately, there’s no way to selectively screen individual operations; you can’t block the process from plotting green pixels, say.

Contrasting this is the indirect approach, in which each request to control the data must be approved by the authorization mechanism. This is clearly less efficient, but is much more flexible; revocation of rights takes effect immediately, whereas the resource must be released and reacquired for changes in the rights to take place in the direct case. Fine-grained access policies can be established, too, since each request is serviced independently and can be rejected or modified based on its specifics.

Implementation Details: Static vs. Extensible Permission Lists

Windows/VMS-style ACLs (and those in modern variants of Linux) extend UNIX permissions significantly, allowing arbitrary sets of users to be associated with specific permissions beyond the simple user/group/all assignments in traditional UNIX. This has advantages in terms of flexibility and the ease of managing large sets of users, but poses problems for the structure of the filesystem. Since permissions were not extensible in traditional UNIX, they could be placed in the statically-sized inode structure; if they’re extensible, they have to be able to grow, and thus must be placed in the data section of the disk. Since permissions-related operations must be atomic, this imposes additional atomicity constraints on the OS, making these operations less efficient.
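An extensible ACL can be pictured as a variable-length list of (principal, permissions) entries; the principal names here are hypothetical:

```python
# A variable-length ACL, in contrast to the three fixed permission
# sets (user/group/other) that fit in a classic inode.
acl = [
    ("user:alice",  {"read", "write"}),
    ("user:bob",    {"read"}),
    ("group:staff", {"read"}),
]

def acl_allows(principal: str, action: str) -> bool:
    # First matching entry wins, in the style of ordered ACL evaluation.
    for who, actions in acl:
        if who == principal:
            return action in actions
    return False

assert acl_allows("user:alice", "write")
assert not acl_allows("user:bob", "write")
```

Because the list can grow arbitrarily, it no longer fits in a fixed-size inode field, which is exactly the storage problem described above.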

Problem: Too Many Rights!

It can occur that particular users end up being able to do too many things; their account is overloaded with responsibility, and presents a security risk when working in domains where the additional permissions are unrelated. The answer to this problem is the concept of a role; essentially, a role comprises the permissions that a user has for a given task, and can be assumed independently of the entire account’s capabilities. The approach is termed role-based access control (RBAC), and extends the previous 3D tuple space by adding a fourth axis: the role under which the (user, file, action) tuple applies. This is also an interesting example of the balance between power and security, where power is willingly given up in order to make the system more secure.
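The RBAC check can be sketched in Python with hypothetical roles and files: the user must both hold the role and the role must grant the action, so rights from other roles the user holds don’t leak in.

```python
# Permissions attach to roles, and a user acts under one role at a time
# rather than with their account's full set of rights.
role_perms = {
    "auditor": {("/var/log/syslog", "read")},
    "admin":   {("/var/log/syslog", "read"), ("/etc/passwd", "write")},
}
user_roles = {"alice": {"auditor", "admin"}}

def allowed_in_role(user: str, role: str, file: str, action: str) -> bool:
    # Both conditions must hold: the user has the role, and the role
    # grants this (file, action) pair.
    return (role in user_roles.get(user, set())
            and (file, action) in role_perms.get(role, set()))

# Acting as auditor, alice cannot write /etc/passwd, even though her
# admin role would allow it -- power willingly given up for security.
assert allowed_in_role("alice", "admin", "/etc/passwd", "write")
assert not allowed_in_role("alice", "auditor", "/etc/passwd", "write")
```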