CS111 Winter 2013 Lecture 17: Introduction to Security

Scribe notes prepared by Gary Chao and Bryant Khau

Network File System Security

When talking about NFS security, a typical setup is an NFS client and an NFS server that communicate with each other over a network. There are a few general ways in which an attacker can exploit this setup.

Man-in-the-Middle Attacks

A Man-in-the-Middle (MITM) attack occurs when network traffic between two computers, call them A and B, that would normally travel directly between them is instead rerouted through the attacker. In general, a MITM attack does not have to be between a client and a server; it can happen between any two computers. Once in this position, the attacker has a few options:

Silently snoop on the messages sent between A and B
Trick A into thinking it's communicating with B and vice versa by modifying the messages within believable parameters and sending them along their way

[Figure: Man-in-the-Middle Attack]

One common way to carry out a MITM attack on a local network is ARP (Address Resolution Protocol) spoofing. ARP spoofing works by planting false entries in the ARP table of a network's router, so that the router associates the attacker's MAC address (a hardware identifier assigned to each network card) with an IP address in the router's local area network. For example, if an attacker associates his MAC address with the local IP address 192.168.1.10, then any message destined for that IP address is delivered to the attacker instead.
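The poisoning step can be sketched with a toy ARP table. This is only an illustration under simplifying assumptions: the names arp_cache, arp_accept_reply, and arp_lookup are hypothetical, and no real packets are involved. The point it tries to show is that a table which accepts unauthenticated replies lets the newest claim for an IP address win.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ARP_MAX 8

/* One row of a toy ARP table: an IP address and the MAC it resolves to. */
struct arp_entry {
    char ip[16];
    char mac[18];
};

struct arp_cache {
    struct arp_entry rows[ARP_MAX];
    int count;
};

/* Accept an ARP reply the way a naive host does: no authentication,
   so the newest claim for an IP simply overwrites the old one.
   Returns 0 on success, -1 if the table is full. */
int arp_accept_reply(struct arp_cache *c, const char *ip, const char *mac)
{
    assert(c != NULL && ip != NULL && mac != NULL);
    for (int i = 0; i < c->count; i++) {
        if (strcmp(c->rows[i].ip, ip) == 0) {
            snprintf(c->rows[i].mac, sizeof c->rows[i].mac, "%s", mac);
            return 0;
        }
    }
    if (c->count == ARP_MAX)
        return -1;
    snprintf(c->rows[c->count].ip, sizeof c->rows[c->count].ip, "%s", ip);
    snprintf(c->rows[c->count].mac, sizeof c->rows[c->count].mac, "%s", mac);
    c->count++;
    return 0;
}

/* Return the MAC currently associated with an IP, or NULL if unknown. */
const char *arp_lookup(const struct arp_cache *c, const char *ip)
{
    assert(c != NULL && ip != NULL);
    for (int i = 0; i < c->count; i++)
        if (strcmp(c->rows[i].ip, ip) == 0)
            return c->rows[i].mac;
    return NULL;
}
```

If the legitimate host first announces 192.168.1.10 and the attacker then sends a reply for the same IP with a different MAC, arp_lookup returns the attacker's MAC from then on.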

MITM attacks often occur in public locations with unsecured network communications. Some examples would be libraries, cafes, and other public locales with free Wi-Fi.

Some ways to prevent MITM attacks:

Use SSL to mask web-based network traffic
Encrypt the packets that are sent to and from computer A and B
Use private networks

Using private networks is the most popular choice today, mainly because this was the first solution for protection and because it's cheaper to implement.

Attacker Can Pretend to be a Valid Client

Assuming there's no snooping going on, an attacker can pretend to be a valid client by spoofing their user ID. Many NFS servers identify clients through their user IDs and not their usernames. So even if an attacker's username is different, if the IDs match then the server could mistake the attacker for someone else. For example, the school chancellor's ID could be 1009; an attacker who presents the same ID could gain access that way. Some solutions to this:

The server should not automatically trust the client
The server needs some sort of authentication system to verify the user
Use passwords
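The difference between trusting the client and authenticating it can be sketched as follows. This is a toy model, not how NFS is implemented: the request and account structures and both function names are hypothetical, and the secret is compared in the clear only to keep the sketch short.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical request: the client fills in every field, including uid. */
struct request {
    unsigned uid;          /* user ID claimed by the client */
    const char *secret;    /* credential presented by the client, may be NULL */
};

/* Hypothetical server-side account record. */
struct account {
    unsigned uid;
    const char *secret;    /* kept in the clear only for this sketch */
};

/* Naive check: believe whatever uid the client put in the request. */
bool naive_allows(const struct request *r, unsigned file_owner)
{
    assert(r != NULL);
    return r->uid == file_owner;
}

/* Authenticated check: the claimed uid counts only if the credential
   presented with it matches the server's record for that uid. */
bool authenticated_allows(const struct request *r, const struct account *a,
                          unsigned file_owner)
{
    assert(r != NULL && a != NULL);
    if (r->uid != a->uid || r->secret == NULL)
        return false;
    if (strcmp(r->secret, a->secret) != 0)
        return false;
    return r->uid == file_owner;
}
```

A request that merely claims uid 1009 passes naive_allows but is rejected by authenticated_allows unless it also carries the matching credential.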

Attacker Can Pretend to be the Server

Finally, an attacker can pretend to be a server. An unsuspecting user would communicate with the attacker, and the attacker can sniff the information the user sends in order to gain valuable data like username, password, etc. In addition, the attacker could also send malicious data back to the user.

Security

In the real world, security is used mainly to combat:

Force
Fraud
Snooping/Theft

Likewise, typical security in cyberspace mainly involves protecting:

Privacy (unauthorized release of information)
Integrity (tampering of data)
Service (denial of resources and computer services)

General Goals of Security Services

Deny unauthorized access
Allow authorized access
Operate efficiently enough even with Denial of Service (DoS) attacks

How hard is it to test these goals?

Allowing authorized access - easy to test
Denying unauthorized access - hard to test
Testing protection against DoS - doable

The takeaway is that positive testing (what you want to allow) is much easier than negative testing (what you don't want to allow).

Early Stages of Secure System Design

In order to design a security system, you must model the system that you are trying to protect and identify all the components that need to be accounted for:

Assets - the files involved in the system
Actors - all the people involved in the system, i.e. users
Threats - possible malicious attacks in all of their forms and methods

Of course, threats are the hardest to identify.

Threat Modeling and Classification

What are some of the most common threats that can be identified off the bat?

Insiders - actors within a system with malicious intent, like an employee of the company you're trying to build the security system for
Social engineering - an outsider pretends to be an insider. For example, someone who claims to be a network repairman and compromises the system physically in order to easily gain access from a remote location
Network attacks:
- Worms, viruses
- Drive-by-download - about 10% of websites are affected by these. An attacker who gains access to a website's file system can place deceptive buttons, advertisements, or other elements that, if a user clicks on them, will automatically install a malicious program onto the user's computer. The program then spreads itself to other websites by self-cloning and can be difficult to detect. Some ways to protect against this are to simply disable JavaScript or to use a web browser extension like NoScript, which is essentially a whitelist of websites that JavaScript is allowed to run on
- Denial of Service (DoS), Distributed Denial of Service (DDoS) - an attempt to make a computer or network resource unavailable to its intended users. This can be carried out by a single agent (DoS) or multiple agents (DDoS) against one resource in order to hog that resource's computation time or congest it
Device attacks, such as USB flash drive viruses that are physically spread when inserted into a computer

General Functions Needed to Defend Against Attacks

Authentication - you are who you say you are
Integrity - data should not be tampered with. One way of dealing with this is by keeping checksums of data and periodically checking them against data, restoring if there's an unauthorized change and mismatch detected
Authorization - what are each user's specific rights? This assumes that the user can be authenticated in the first place
Auditing - Keep a log of all actions that occur on a system. This is not so much prevention as it is diagnosis of an attack that has already occurred, in order to improve the security system and protect against future attacks
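The checksum idea mentioned under Integrity can be sketched like this. The notes do not say which checksum to use; 32-bit FNV-1a is chosen here only because it is short, and the function names are hypothetical. It detects accidental or naive changes, but it is not cryptographic: an attacker who can rewrite both the data and the stored checksum defeats it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a hash: a simple non-cryptographic checksum. */
uint32_t checksum(const unsigned char *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint32_t h = 2166136261u;          /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}

/* True if the data still matches the checksum recorded earlier. */
bool is_intact(const unsigned char *data, size_t len, uint32_t recorded)
{
    return checksum(data, len) == recorded;
}
```

A periodic check would call is_intact on each protected file and restore from a known-good copy on a mismatch.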

There are some constraints that need to be taken into consideration when designing a security system:

Efficiency - Depending on the system and what it's used for, what kind of tolerance do you have on how much the security affects performance? For example, is 30% extra processing time or 300% extra processing time okay? It all depends on what level of security you need and what kind of data is being handled, so make the appropriate tradeoff accordingly
Correctness - Not only a security goal, but a general OS goal. The security system should never block non-malicious actions. Use methods like abstraction and modularity to achieve this goal.

Authorization Technologies

There are two major ways to access resources:

Direct: Access the resource by mapping it directly into the address space and using ordinary loads and stores. Each access is checked by hardware rules. When an access faults, the fault handler (which is typically under software control) can do much fancier things to deal with the fault. For example, the call

	char *p = mmap(NULL, length, prot, flags, fd, offset);

returns a pointer into the process's virtual memory through which the file's contents can be read directly.

Indirect: Service requests via handles. Handles are opaque objects. Each request is handled by software, and you cannot access files directly. For example,

	fd = open (file, ...)
	read (fd, ...)

This accesses the file through its file descriptor rather than touching the file's contents directly, which is a form of abstraction.
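The two access styles can be compared side by side. The sketch below assumes a POSIX system; the function names first_byte_indirect and first_byte_direct are made up for illustration. Both return the first byte of a file, one through read() on a handle and one through an ordinary load from a mapping.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Indirect access: the byte comes back through a read() system call
   on the opaque handle fd. Returns the first byte, or -1 on error. */
int first_byte_indirect(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    unsigned char b;
    ssize_t n = read(fd, &b, 1);
    close(fd);
    return n == 1 ? b : -1;
}

/* Direct access: map the file, then use an ordinary load through p.
   Returns the first byte, or -1 on error or if the file is empty. */
int first_byte_direct(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                 /* the mapping remains valid after close */
    if (p == MAP_FAILED)
        return -1;
    int b = p[0];              /* plain memory load, checked by hardware */
    munmap(p, len);
    return b;
}
```

In the direct version the kernel is involved only when the mapping is set up or when an access faults; in the indirect version every request goes through software.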

Access Control Lists (ACLs)

A fairly straightforward way to control permissions is to store them in a large 3D boolean array with the users, files, and permissions on the x, y, and z axes, respectively. The 3D boolean array would work like this:

	for each user
		for each file
			for each type of access permissions
				allowed (boolean true) or disallowed (boolean false)?

[Figure: 3D Boolean Array]

Though this would work, it would take a giant array in memory and it's just not very efficient. An access control list compacts this cube in this way:

	for each file
		list of user permissions

This essentially attaches a list of permissions to each file. For example, a file called "foo" could have a permission like (Bob, read), which would give the user Bob permission to read the file, but nothing else.
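A per-file list like this might be represented as below. This is a minimal sketch, not the layout any real file system uses; the struct names, the PERM_ constants, and acl_allows are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { PERM_READ = 1, PERM_WRITE = 2, PERM_EXEC = 4 };

/* One ACL entry: a user name and a bitmask of rights. */
struct acl_entry {
    const char *user;
    unsigned perms;
};

/* A file carries its own short list instead of a slice of a giant cube. */
struct file {
    const char *name;
    const struct acl_entry *acl;
    size_t acl_len;
};

/* The first entry for this user decides: access is allowed only if it
   grants every requested bit. Users with no entry get no access. */
bool acl_allows(const struct file *f, const char *user, unsigned wanted)
{
    assert(f != NULL && user != NULL);
    for (size_t i = 0; i < f->acl_len; i++)
        if (strcmp(f->acl[i].user, user) == 0)
            return (f->acl[i].perms & wanted) == wanted;
    return false;
}
```

With the single entry (Bob, read) on "foo", Bob may read but not write, and a user who is not listed gets nothing. Storage grows with the number of entries actually granted rather than with users times files times permissions.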

A Different Authorization Model

ACLs are an example of an effective authorization model, but they are not the only one. We want the model to fit the needs of the system. Some needs could be capturing rights accurately or letting ordinary users specify rights. ACLs in general are not flexible enough for large organizations. A better model for them would be role-based access control, which is more complex than ACLs but more secure.

Role-based access control (RBAC) has to keep a distinction between users and their roles. For example, the user "eggert" could have multiple roles assigned to him:

eggert as cs-faculty
eggert as sys-admin

Each role would constrain eggert to specific tasks. As cs-faculty, he could change grades. As sys-admin, he could make changes to other users' permissions. These sets of permissions are usually mutually exclusive, so he acts in one role at a time. There could also be sub-roles. For example, a role could be "faculty" and its sub-roles could be "cs-faculty" and "english-faculty". cs-faculty and english-faculty could share some permissions because they both belong to the faculty role, but could also have mutually exclusive permissions that apply only to their own sub-roles.
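Roles with sub-roles can be modeled as a chain of parents. The sketch below is one possible representation, not a standard RBAC implementation; the struct, the right names (including CAN_USE_LIBRARY, which is invented as an example of a shared faculty right), and role_allows are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { CAN_USE_LIBRARY = 1, CAN_CHANGE_CS_GRADES = 2, CAN_EDIT_PERMISSIONS = 4 };

/* A role grants its own rights plus everything its parent role grants. */
struct role {
    const char *name;
    unsigned rights;
    const struct role *parent;   /* NULL for a top-level role */
};

/* Walk from the active role up through its parents looking for the right.
   The check depends on the role the user is acting in, not on the user. */
bool role_allows(const struct role *active, unsigned right)
{
    assert(right != 0);
    for (const struct role *r = active; r != NULL; r = r->parent)
        if (r->rights & right)
            return true;
    return false;
}
```

Here eggert acting as cs-faculty would inherit the shared faculty right and could change CS grades, but could not edit permissions; for that he would have to act as sys-admin.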

Two Technologies for Authorization

ACL and RBAC: each file lists permissions for each user

Capabilities: Each descriptor lists permissions! Multiple descriptors could point to a file, but each of these descriptors would have their own separate permissions. Example:

	fd = open ("abc", O_RDONLY)            <--- code
	r--	        rwx                    <--- permissions
	read-only	read, write, execute   <--- description of permissions

This means that the file "abc" has the permissions rwx attached to it, but opening the file with O_RDONLY and obtaining the file descriptor fd means that any action done with fd can only read the file it points to and nothing else.

What if instead of O_RDONLY, the argument was O_PATH? O_PATH essentially gives no permissions, it only shows the existence of a file that fd points to. What can you do with this fd? You could duplicate the file descriptor using dup or dup2, but then you'd have 2 useless fd's instead of 1. What would be useful is to run:

	fstat (fd, dst)

fstat would be able to return information about the file that fd points to, and that would definitely be useful in some cases.
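The way a descriptor carries its own rights can be checked with a small experiment. This sketch assumes a POSIX system; try_write is a hypothetical helper. It tries to write one byte through a descriptor opened with the given flags and reports what happened.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Try to write one byte through a descriptor opened with open_flags.
   Returns 0 if the write succeeded, otherwise an errno value
   (from open() or from write()). */
int try_write(const char *path, int open_flags)
{
    assert(path != NULL);
    int fd = open(path, open_flags);
    if (fd < 0)
        return errno;
    int result = 0;
    ssize_t n = write(fd, "x", 1);
    if (n < 0)
        result = errno;        /* e.g. EBADF: fd is not open for writing */
    else if (n != 1)
        result = EIO;          /* short write, not expected for one byte */
    close(fd);
    return result;
}
```

Even when the file's own permissions allow writing, a descriptor obtained with O_RDONLY refuses the write with EBADF, while one obtained with O_WRONLY on the same file succeeds.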

Capability-only System?

This is possible. Capabilities are essentially inverted file permissions. Some things you can do with capabilities:

Map file name --> capability
Clone capabilities
Take permissions away
Pass a capability across the network so that another user can use it with exactly the access it was created with

You could forge capabilities, but this could be solved using encryption.

Some Details About ACLs

When you type "ls -l" in the UNIX command line, you can see your permissions for the files in your current directory. We could see something like this:

	-rwxr-xr-x

So far this seems normal: each character in this permission string is essentially a bit that represents a permission. If that bit is 0, it is represented by a '-'. If it is 1, then the character for that permission appears, such as 'r' or 'x'. However, some permissions can look like this:

	-rwsr-xr-x

Wait, that 's' appeared where 'x' used to be! Weren't there only two modes in that slot - whether or not to give permission to execute? It turns out there are 3 extra hidden bits to the left of the nine permission bits - these 3 bits enable/disable "setuid", "setgid", and "sticky":

setuid permission - when enabled, a process that runs this file is granted access based on the owner of the file, not you. The owner can possibly be root
setgid permission - when enabled, it sets the process's group ID to the group owner of the file
sticky bit - when enabled, this protects the file within a directory by disallowing the deletion of the file by any user except the owner of the file, owner of the directory, or root

setuid in particular can be dangerous if the file's owner is root, because then the program can do anything. setuid is not only dangerous, though; it has its uses. For example, it allows a user to change his/her own password, so root doesn't have to log in every time a user wants to change passwords.

setuid does not blindly let a user run anything as root just by naming root as the owner of a file. The obvious abuses are specifically guarded against. For example, say a user writes their own version of the passwd program and tries to make it setuid root from the command line:

	$ cat > passwd.c
	$ cc passwd.c -o passwd
	$ chown root passwd
	$ chmod 4755 passwd

These commands try to create a private copy of passwd and run it as root. However, this fails: on typical systems an ordinary user is not allowed to give a file away to root, so the step that makes root the owner is refused. The file stays owned by the user, and the setuid bit then only makes the program run as that same user.

Trusted Computing Base

There are certain processes and programs that a computer system has no choice but to trust: they must be able to run with no hindrance, and if they misbehave, security is compromised. This set of software and hardware is called the trusted computing base. It must resist tampering and be as small as possible, so that it is manageable and leaves less room for security vulnerabilities. Some examples of trusted code:

/bin/passwd
/bin/login
Device drivers sitting inside the kernel
The kernel itself (for obvious reasons)

A Problem in Establishing the Trusted Computing Base

In the paper "Reflections on Trusting Trust", Ken Thompson, one of the original developers of UNIX at Bell Labs, describes a major weakness in how we come to trust the trusted code. Let's say he wanted special code that would allow a very specific username to log in to any machine, by modifying "login.c" and distributing it as an update:

	if (strcmp(user, "ken") == 0) {
		uid = 0;
		login();
	}

As we can see, this would allow anyone who knew the secret username "ken" to log in to any UNIX computer and act as root. But the source code is available for anyone to read, so surely someone would catch it! Ken Thompson took it one step further.

He proposed that he could hypothetically modify the C compiler itself to insert the strcmp check whenever it compiles the file "login.c". Now "login.c" looks safe to anyone who reviews it. But someone could still look at the C compiler's source code and discover that trickery as well. Ken Thompson could cover this up too: the compiler can be made to recognize when it is compiling the C compiler, and in that case generate the code that modifies the compiler to tamper with "login.c". Once such a compiler binary exists, the trick can be removed from the compiler's source as well. Now we run into trouble: there is no practical way to verify from the source code alone that this has not already been done in any of the operating systems in use today, including Ubuntu, Red Hat, Windows 8, etc. In a sense, we have to trust not only the trusted computing base, but also that its developers were honest enough not to conceal something like this, if we wish to use their operating system.
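The first stage of the trick can be illustrated with a toy. This is not Thompson's code and not a real compiler: toy_compile is a hypothetical function whose "compiled output" is just the source text it was given, so that the effect of the hidden rule is easy to see. The second stage, where the compiler recognizes its own source, is not modeled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The backdoor from the notes, as text that the toy compiler splices in. */
static const char BACKDOOR[] =
    "if (strcmp(user, \"ken\") == 0) { uid = 0; login(); }\n";

/* Toy "compiler": the output is the source text it was given.
   The trojan: when the file being compiled is login.c, the backdoor is
   prepended, even though login.c itself never mentions it.
   Returns 0 on success, -1 if out is too small. */
int toy_compile(const char *filename, const char *source,
                char *out, size_t out_size)
{
    assert(filename != NULL && source != NULL && out != NULL);
    const char *extra = strcmp(filename, "login.c") == 0 ? BACKDOOR : "";
    int n = snprintf(out, out_size, "%s%s", extra, source);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

Reviewing the source passed in for "login.c" reveals nothing, yet the output contains the check for "ken"; any other file passes through unchanged.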