CS 111
Scribe Notes for 12/3/09
by Ryan Harris, Mike Hess
Cloud Computing
Mainframes - 1960's
- Data-intensive
- Had problems with data optimization - couldn't figure out how to stream data to CPU at right time
- Reliability
- IBM, Fujitsu
Clusters - 1990's
A cluster is basically a large number of Linux boxes linked together by an IP network. Each box is its
own computer (it can be thought of as a mini-mainframe). The machines don't need to be identical,
so a cluster can be heterogeneous.
- Beowulf
- SGE (Sun Grid Engine)
Clouds
Clouds can be thought of as "clusters of clusters." Since there are already preexisting clusters around
the country, they can simply be linked together through networks, so the clusters are physically
separated from each other. This presents some problems, like ownership, because different people already
own the individual clusters.
- Not just one owner, user, or organization
- Primary obstacle to clouds is political issues
- Who controls the cloud? (since clusters are spread around)
- Who pays?
- Political issues get merged into technical issues
- Security
- Resource management
- Amazon EC2, Globus
Advantages over clusters and grids
- Short-term commitment
- Buy computing power from someone else's cloud
- Don't need capital investment
- Pay as needed
- No predicting what resources you will need
- Can grow quickly - fast scaling
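To make "pay as needed" versus a capital investment concrete, here is a minimal back-of-the-envelope
sketch. The cost model and every number in it are made-up assumptions for illustration, not real
prices: the cloud charges per hour actually used, while an owned cluster costs a purchase price plus
upkeep for every hour it is owned, busy or idle.

```python
def cloud_cost(hours_used, price_per_hour):
    """Pay as needed: cost scales with the hours actually used."""
    return hours_used * price_per_hour


def cluster_cost(purchase_price, upkeep_per_hour, hours_owned):
    """Owned hardware: pay up front, plus upkeep whether busy or idle."""
    return purchase_price + upkeep_per_hour * hours_owned


def cheaper_option(hours_used, hours_owned, price_per_hour,
                   purchase_price, upkeep_per_hour):
    """Return "cloud" or "cluster", whichever costs less under this model."""
    cloud = cloud_cost(hours_used, price_per_hour)
    cluster = cluster_cost(purchase_price, upkeep_per_hour, hours_owned)
    return "cloud" if cloud < cluster else "cluster"


# Hypothetical prices: 1.00 per machine-hour in the cloud,
# 2000 to buy a machine, 0.05 per hour to power and maintain it.
HOURS_PER_YEAR = 8760
bursty = cheaper_option(500, HOURS_PER_YEAR, 1.00, 2000, 0.05)
steady = cheaper_option(HOURS_PER_YEAR, HOURS_PER_YEAR, 1.00, 2000, 0.05)
print(bursty, steady)
```

Under these assumed numbers, a bursty workload (500 hours a year) favors the cloud, while a machine that
is busy all year favors buying.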
Disadvantages
- Price to make cloud
- It all depends
- If you know how much computing you need and it's relatively stable, buy a cluster because
it's cheaper
- Run the numbers vs. clusters
- Privacy
- Data confidentiality
- Encrypt data to and from the cloud
- Must trust whoever runs the cloud - they could run you on a bugged virtual CPU so they can see your code
- Network latency - don't want to run real-time applications on clouds
- Data transfer bottlenecks
- BIG unsolved problem
- Archive data?
- "sneakernet" - style technology
- Bugs
- Hard ones that show up as you scale
- No easy solution - unsolved (if solving cheaply)
- Other security issues
- Denial of service attack
- Physical attacks
- Overload risks
- Everyone's needs could exceed the cloud's capacity
- Multiple suppliers
- Societal risk
- Overload of data access is often the biggest problem - scalable storage
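The data transfer bottleneck and the "sneakernet" remark in the list above come down to simple
arithmetic. The sketch below compares sending data over a network link with physically shipping
disks. The data size, link speed, and shipping time are illustrative assumptions, not measurements.

```python
def network_transfer_hours(data_bytes, bits_per_second):
    """Hours needed to push data_bytes through a link of the given speed."""
    seconds = data_bytes * 8 / bits_per_second
    return seconds / 3600


def faster_by_shipping(data_bytes, bits_per_second, shipping_hours):
    """True if physically shipping the disks beats the network link."""
    return shipping_hours < network_transfer_hours(data_bytes, bits_per_second)


# Assumed scenario: 10 TB over a 20 Mbit/s link vs. overnight shipping.
TEN_TB = 10 * 10**12
LINK = 20 * 10**6
hours = network_transfer_hours(TEN_TB, LINK)
print(round(hours / 24, 1), "days over the network")
print("ship the disks?", faster_by_shipping(TEN_TB, LINK, shipping_hours=24))
```

With these assumed figures the network transfer takes several weeks, which is why shipping disks can
still win for large archives.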
Vendor Lock-In
- Could get stuck with the vendor who owns the cluster, like Microsoft
Software Licensing
- Can't license tons of Windows copies for millions of machines - too expensive!
- Big-bucks problem (licensing formulas)
- Proprietary
- Free software - problem: you can take Linux and run it in the cloud without ever distributing it
Security Again
We need some sort of access control that prohibits "bad" accesses and allows "good" ones. To do
that we must be able to tell accurately which is which: we don't want to mislabel an access and
deny a "good" one or accept a "bad" one. Traditional Unix had permissions with a user, group, and
other part (rwxrwxrwx). In original Unix a user belonged to one group, but in BSD a user can belong
to multiple groups.
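The rwxrwxrwx check can be sketched as follows. This is a simplified model, not kernel code: the
function name may_access and its parameters are made up for illustration, and it ignores the
superuser. It uses the BSD-style rule that a user may belong to several groups.

```python
import stat


def may_access(mode, owner_uid, owner_gid, uid, groups, want):
    """Decide an access from the permission bits.

    mode   -- permission bits, e.g. 0o640
    groups -- set of group ids the requesting user belongs to
    want   -- string made of 'r', 'w', 'x'
    """
    # Exactly one of the three rwx triples applies: user, then group, then other.
    if uid == owner_uid:
        r, w, x = stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR
    elif owner_gid in groups:
        r, w, x = stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP
    else:
        r, w, x = stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH
    bits = {"r": r, "w": w, "x": x}
    return all(mode & bits[c] for c in want)


# 0o640 is rw-r-----: owner reads and writes, group reads, others get nothing.
print(stat.filemode(stat.S_IFREG | 0o640))
print(may_access(0o640, 100, 20, uid=100, groups={20}, want="rw"))
print(may_access(0o640, 100, 20, uid=101, groups={5, 20}, want="w"))
```

Because only one triple is consulted, an owner can be denied an access that the file's group is
allowed.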
ACLs - Access Control Lists
- Owner of a resource can specify an access list (list of principals & their accesses)
- Key idea - make sure default ACLs are right when a resource is created
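A minimal sketch of the ACL idea, assuming a toy in-memory model: the Resource class and its method
names are hypothetical and do not correspond to any real OS interface. It shows the two points above:
the owner controls the list, and the default ACL at creation time denies everyone but the owner.

```python
class Resource:
    """A resource with an access control list: principal -> set of accesses."""

    def __init__(self, owner, default_acl=None):
        self.owner = owner
        if default_acl is None:
            # Safe default at creation: only the owner has any access.
            self.acl = {owner: {"read", "write"}}
        else:
            self.acl = {p: set(a) for p, a in default_acl.items()}

    def grant(self, requester, principal, access):
        """Only the owner may change the ACL."""
        if requester != self.owner:
            raise PermissionError("only the owner may edit the ACL")
        self.acl.setdefault(principal, set()).add(access)

    def allows(self, principal, access):
        """Every access is checked against the list; unlisted means denied."""
        return access in self.acl.get(principal, set())


notes = Resource(owner="alice")
print(notes.allows("bob", "read"))   # bob is not on the list yet
notes.grant("alice", "bob", "read")
print(notes.allows("bob", "read"), notes.allows("bob", "write"))
```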
Role-Based Access Control (RBAC)
- ACLs etc.: each resource has an ACL, etc. attached to it - all accesses are mediated by the OS
- Capabilities: each principal has a "RCL" (set of ccap
Trusted Software
From an OS viewpoint, OSes don't trust applications, because they don't trust users, and applications run
on behalf of users. However, there are some trusted applications, and login is one of them. Login uses the
setuid(id) syscall to change which user the process runs as.
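A minimal sketch of the identity switch that a login-like program performs after it has checked the
password. The function become_user and its osapi parameter are made up for illustration. The calls
os.setgroups, os.setgid, and os.setuid are the real Unix wrappers in Python's os module, and they
only succeed for a process that starts out privileged.

```python
import os


def become_user(uid, gid, groups, osapi=os):
    """Switch the current process to the given user.

    osapi defaults to the real os module; a stand-in object can be
    passed instead to try the logic without root privileges.
    """
    # Order matters: once setuid() gives up root, the process is no
    # longer allowed to change its groups, so the uid is changed last.
    osapi.setgroups(groups)
    osapi.setgid(gid)
    osapi.setuid(uid)
```

A real login would call something like become_user(...) only after authentication succeeds and would
then exec the user's shell, so the shell starts with the user's identity rather than root's.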
- Which programs do we trust? - As few as possible; keep the list as small as possible.
- How can we trust login? - Cryptographic checksum of program
- How does vendor trust login?
- Reflections on Trusting Trust - K. Thompson
- Thompson demonstrates that login cannot be trusted just by reading its source code.
- He changes the C compiler to produce bugged code for login (and to reinsert the change when it compiles itself), so he can log in on any Unix system built with it
- Trusted Computing Base
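The "cryptographic checksum of program" idea above can be sketched as follows, assuming SHA-256 as
the hash and a known-good digest obtained from somewhere trusted. The function names are made up for
illustration. As Thompson's argument suggests, this only moves the trust: the checker and the tools
that built it must themselves be trusted.

```python
import hashlib
import hmac


def file_sha256(path):
    """Return the SHA-256 digest of a file as a hex string."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large binaries need not fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_known_good(path, expected_hex):
    """True if the file's digest equals the known-good digest."""
    return hmac.compare_digest(file_sha256(path), expected_hex.lower())
```

For example, matches_known_good("/bin/login", digest) would compare the installed program against a
digest published by the vendor. The path and the digest source here are assumptions.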