Sample output:
$ ls -l big
-rw-rw-r-- 1 eggert faculty 9223372036854775000 Oct  6 11:41 big
$
~10^19 bytes in size
Looking for char 'x' in the file:
$ grep x big
$
No matches. Timing the grep:
$ time grep x big
real    0m0.009s
Just 9 milliseconds! For a file that large, this is unreasonably fast: 10^19 bytes / 10^-2 seconds = 10^21 bytes/second = 8 * 10^21 bits/second = 8 Zb/second (zettabits/second).
xkcd's What If #31 asks: what is the entire bandwidth of the Internet? The current estimate is 167 Tb/second (terabits/second). So how can we move data faster?
Here at UCLA, we have that problem too. Solution: use the SneakerNet and move the hard disks physically. Theoretically, if we used all of the US freight industry (UPS, USPS, FedEx, etc.) to move a bunch of MicroSD cards, we could move 0.5 Zb/s. So how was grep 16 times faster than even that? It cheated.
The file wasn't actually that big; we pretended to make it that big and were able to fool all of the standard Unix utilities. Grep was smarter. It's really saying, "in all of the places you cheated, there are no x's."
Say we did:
$ echo x >> big
Then:
$ grep x big
Binary file big matches
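You can reproduce the cheat yourself. The trick is a sparse file: the file system records a huge size but allocates no blocks for unwritten "holes", which read back as zeros. A minimal sketch, assuming GNU/Linux with coreutils' truncate and a file system that supports sparse files; the 10 GB size is just for illustration:

$ truncate -s 10G big    # apparent size ~10^10 bytes; no data blocks allocated
$ ls -l big              # ls reports the full apparent size
$ du -h big              # du reports the disk space actually used: essentially zero
$ time grep x big        # quick: no disk reads are needed for the holes, and they contain no x's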
Course Website: http://www.cs.ucla.edu/classes/winter15/cs111/
Prerequisites:
Also recommended:
| Activity | Theory | Reality |
|---|---|---|
| Lecture | 4 Hours | 3.7 Hours |
| Lab | 2 Hours | 1.7 Hours |
| Outside Study | 9 Hours | 20 Hours |
| Item | Fraction of Grade | Notes |
|---|---|---|
| Final | 2/9 | 180 minutes. Open notes, no electronic devices. |
| Midterm | 1/9 | 100 minutes in class. Open notes, no electronic devices. |
| Labs (4) | 1/3 | Teams of 1 or 2 |
| Mini-labs (2) | 2/15 | Solo, no partners. "WeensyOS" labs |
| Design Problem | 1/12 | Done in teams of 1 or 2. Work to extend part of a lab. |
| Research Paper | 1/15 | 2-3 pages summarizing recent work in an area of operating systems. |
| Scribe Notes | 1/20 | Done in teams of up to 4. Due on CCLE one week after the lecture. |
Lateness penalty: 1% penalty for (0, 1) days late, 2% penalty for (1, 2) days late, 4% penalty for (2, 3) days late, etc.; the penalty doubles with each additional day late.
Drop-dead date: March 13th (Friday of 10th week). No assignments will be accepted after this day.
Textbook: Principles of Computer System Design by Saltzer & Kaashoek
"Red Star 3.0: North Korea launches its own OS"; Sky News (UK) 2014-12-31
"Crouton for Chromebooks: Run Ubuntu in a browser tab";
lilputing 2014-12-2
(Doesn't work with a diff. GNU/Linux distro. though;
find out why!)
"Samsung says new (2015) TVs will run Tizen [formerly LiMo]"; CNET 2015-01-05
"We don't offer a ready-made programme, but an entire operating system."
Operating systems are connected to politics. This was undreamed of before.
But what is a system?
We can turn to an Oxford English Dictionary from 1928 to find
a traditional, pre-computer definition:
I. An organized or connected group of objects.
II. A set of principles, etc., a scheme, method.
Just as interesting is the etymology:
"System" comes from the Greek
σύστημα, (transliterated as "sýstima"), meaning "organized whole,
government, constitution."
The idea of a system is thus ancient. Even the roots of sýstima mean "set up with." Marina Weisband in the above quote is appealing to this ancient meaning.
Let's see another definition, this one from the course
textbook (§1.A.2):
"A set of interconnected components that has a
specified behavior observed at the interface with its environment."
Let's move to the modern era by examining the definition
for "operating system" specifically. From Bill Gates' personal
encyclopedia, Encarta (2007):
"Master control program in a computer."
By this definition, the entity in control is the one that matters, a definition fit for Microsoft.
Yet another definition comes from the American Heritage dictionary
(the most conservative dictionary, in its 4th edition from 2000):
"Software designed to control the hardware of a specific
data-processing system in order to allow users and application
programs to make use of it."
This is a very traditional definition. It is also obsolete. Operating systems are no longer necessarily hardware-specific. Even so, users and applications still matter.
Finally, from Wikipedia (page id: 640779030, 2015-01-03):
"Software that manages computer hardware and software
resources and provides common services for computer programs."
Note how "control" has become "manage" and the appearance of "resources." Resource-management will be of great importance. As for "common services," it comes back to how we implement that system's interface, e.g. any program can ask the OS to translate UI text to Spanish for it. We want to avoid duplication. The overall goal here is clarity, with as few lines of code as possible.
The guiding question: "How can we make our systems obvious and clear?"
Common Systems Problems:
Incommensurate scaling: This problem is well known in physical engineering design. Example: a 12-foot-tall human would need a modified skeleton; simply stretching a person in all three dimensions would not work. We'd be 2 times as tall but 8 times as heavy, because weight grows with volume (the cube of height) while bone strength grows only with cross-sectional area (the square of height).
We are hitting diseconomies of scale, where things get more expensive per unit. Example: an 8-port gigabit router costs $100, but a 48-port gigabit router costs thousands of dollars. This is because the switch is effectively a star network, and the number of possible port-to-port connections grows as the square of the port count, as the arithmetic below shows.
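To see the square growth concretely, here is a rough back-of-the-envelope count (our arithmetic, not vendor data): with n ports, the fabric must support n(n-1)/2 possible port pairs.

8 ports:  8 * 7 / 2 = 28 pairs
48 ports: 48 * 47 / 2 = 1,128 pairs

Six times the ports requires roughly 40 times the internal connectivity.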
Contrast this with economies of scale, as seen in the pin factory from Adam Smith's Wealth of Nations. There, it was more efficient to have someone specialize, invest capital in equipment, and make many pins than to have everyone produce their own. We need to account for both of these phenomena when we scale our systems. Otherwise, they cause breakage or waste (e.g., more pins produced than needed).
Emergent properties: These are properties that only become apparent at scale; qualitatively new behavior "just happens" as the system gets bigger, and we didn't predict it. We're looking for things that have no effect, or may not even exist, at a small scale but that will break our system at large-enough scales. For example: 20 years ago, UCLA got the latest and fastest Internet connections. The intended effect: homework would get done faster. Instead, students heavily pirated music, this being the age of Napster. UCLA was swamped with cease-and-desist letters. At one point, more than 80% of the network traffic was pirated material.
Another example: the Tacoma Narrows bridge. Engineers forgot about resonant frequencies, in this case from the wind. The bridge soon collapsed when strong winds swayed it at just the right frequency. Every civil engineer gets an earful of this now.
Propagation of effects: Operating systems are digital, not analog, so the smallest thing can make one crash. Often we think we have a well-designed OS with boundaries between components, yet effects may still cross those boundaries and wreck things.
Example: a file system takes requests like a:\b\c and gives you the file:

$ cat a:\b\c
Somewhere else, we have an encoding for Japanese characters, which cannot fit in one byte (more than 256 Japanese characters exist).
So, we decide on a two-byte encoding scheme. When the text is ASCII, the top bit of the first byte will be 0, and we'll leave the second byte unused. For Japanese, the top bit will be 1, and the character may use both bytes. This gives us 2^15 combinations, enough to handle Japanese.
Say we tried to create a new directory with a Japanese name:
$ cp a:\b\c d:[Japanese]
But the user gets an error message along the lines of:
"cannot create directory [Japanese]"
What happened? The second byte of the Japanese character had a top bit of 0, and by itself that byte looked exactly like an ASCII backslash. This happened with an early version of Windows. To fix it, Microsoft had to break down the boundary between the encoding and the file system and make things more complicated.
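A minimal shell sketch of the collision, assuming a Shift-JIS-style encoding and a single-byte locale (the byte values and the file name jname are just illustrative):

$ printf '\x95\x5c' > jname    # one two-byte character whose second byte is 0x5C
$ grep -c '\\' jname           # a byte-at-a-time tool sees that byte as an ASCII backslash
1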
Trade-offs: We face all the previous problems, and when we start trying to solve them, we find that fixing one makes another worse. This can be called the "waterbed effect": push the problem down in one place and it bulges up in another.
Example: sorting algorithms. Bubble sort is faster than heap sort or mergesort for small-enough datasets, but doesn't scale the way they do.
Complexity: This is the problem special to CS.
Consider Moore's Law: Complexity doubles every 18-24 months (for the cheapest design point; it doesn’t necessarily hold for more expensive CPUs). Originally, the law Gordon Moore formulated referred to the number of transistors on a chip. Professor Eggert thinks Moore's Law is petering out, but he said that 10 years ago too.
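As a rough worked example of what that rate implies (our arithmetic, not a figure from the lecture): at one doubling every 18 months,

120 months / 18 months per doubling ≈ 6.7 doublings per decade
2^6.7 ≈ 100x growth per decade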
The main concern: computers are getting more complicated, but we designers are not becoming much more sophisticated.
Consider also Kryder's Law: Disk drive capacity also grows exponentially. Looking back, Professor Eggert's first home computer had a 1 GB hard drive in 1992 for $1,000.
Looking forward, we need to continue to build systems that work, and that we understand despite their immense complexity. It will not be an easy task, but it will be a worthwhile one.