Sample output:
$ ls -l big
-rw-rw-r-- 1 eggert faculty 9223372036854775000 Oct  6 11:41 big
$
~10^19 bytes in size
Looking for char 'x' in the file:
$ grep x big
$
No matches. Timing the grep:
$ time grep x big
real    0m0.009s
Just 9 milliseconds! For a file that large, this is unreasonably fast: 10^19 bytes / 10^-2 seconds = 10^21 bytes/second = 8 * 10^21 bits/second = 8 Zb/second (zettabits/second).
xkcd's What If #31 asks: what is the entire bandwidth of the Internet? The current estimate is 167 Tb/second (terabits/second). So how can we move data faster?
Here at UCLA, we have that problem too. Solution: use the SneakerNet and move the hard disks physically. Theoretically, if we used all of the US freight industry (UPS, USPS, FedEx, etc.) to move a bunch of MicroSD cards, we could move 0.5 Zb/s. So how was grep 16 times faster than even that? It cheated.
The file wasn't actually that big; we pretended to make it that big and were able to fool all of the standard Unix utilities. Grep was smarter. It's really saying, "in all of the places you cheated, there are no x's."
Say we did:
$ echo x >> big
Then:
$ grep x big
Binary file big matches
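You can reproduce the cheat yourself. The trick is a sparse file: the file system records a huge size but allocates no blocks for unwritten "holes", which read back as zeros. A minimal sketch, assuming GNU/Linux with coreutils' truncate and a file system that supports sparse files; the 10 GB size is just for illustration:

$ truncate -s 10G big    # apparent size ~10^10 bytes; no data blocks allocated
$ ls -l big              # ls reports the full apparent size
$ du -h big              # du reports the disk space actually used: essentially zero
$ time grep x big        # quick: no disk reads are needed for the holes, and they contain no x's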
Course Website: http://www.cs.ucla.edu/classes/winter15/cs111/
Prerequisites:
Also recommended:
| Activity | Theory | Reality |
|---|---|---|
| Lecture | 4 Hours | 3.7 Hours |
| Lab | 2 Hours | 1.7 Hours |
| Outside Study | 9 Hours | 20 Hours |
| Item | Fraction of Grade | Notes |
|---|---|---|
| Final | 2/9 | 180 minutes. Open notes, no electronic devices. |
| Midterm | 1/9 | 100 minutes in class. Open notes, no electronic devices. |
| Labs (4) | 1/3 | Teams of 1 or 2 |
| Mini-labs (2) | 2/15 | Solo, no partners. "WeensyOS" labs |
| Design Problem | 1/12 | Done in teams of 1 or 2. Work to extend part of a lab. |
| Research Paper | 1/15 | 2-3 pages summarizing recent work in an area of operating systems. |
| Scribe Notes | 1/20 | Done in teams of up to 4. Due on CCLE one week after the lecture. |
Lateness penalty: 1% penalty for (0, 1) days late, 2% penalty for (1, 2) days late, 4% penalty for (2, 3) days late, etc.; the penalty doubles with each additional day late.
Drop-dead date: March 13th (Friday of 10th week). No assignments will be accepted after this day.
Textbook: Principles of Computer System Design by Saltzer & Kaashoek
"Red Star 3.0: North Korea launches its own OS"; Sky News (UK) 2014-12-31
"Crouton for Chromebooks: Run Ubuntu in a browser tab";
lilputing 2014-12-2
(Doesn't work with a diff. GNU/Linux distro. though;
find out why!)
"Samsung says new (2015) TVs will run Tizen [formerly LiMo]"; CNET 2015-01-05
"We don't offer a ready-made programme, but an entire operating system."
Operating systems are connected to politics. This was undreamed of before.
But what is a system?
We can turn to an Oxford English Dictionary from 1928 to find
a traditional, pre-computer definition:
I. An organized or connected group of objects.
II. A set of principles, etc., a scheme, method.
Just as interesting is the etymology:
"System" comes from the Greek
σύστημα, (transliterated as "sýstima"), meaning "organized whole,
government, constitution."
The idea of a system is thus ancient. Even the roots of sýstima mean "set up with." Marina Weisband in the above quote is appealing to this ancient meaning.
Let's see another definition, this one from the course
textbook (§1.A.2):
"A set of interconnected components that has a
specified behavior observed at the interface with its environment."
Let's move to the modern era by examining the definition
for "operating system" specifically. From Bill Gates' personal
encyclopedia, Encarta (2007):
"Master control program in a computer."
By this definition, the entity in control is the one that matters, a definition fit for Microsoft.
Yet another definition comes from the American Heritage dictionary
(the most conservative dictionary, in its 4th edition from 2000):
"Software designed to control the hardware of a specific
data-processing system in order to allow users and application
programs to make use of it."
This is a very traditional definition. It is also obsolete. Operating systems are no longer necessarily hardware-specific. Even so, users and applications still matter.
Finally, from Wikipedia (page id: 640779030, 2015-01-03):
"Software that manages computer hardware and software
resources and provides common services for computer programs."
Note how "control" has become "manage" and the appearance of "resources." Resource-management will be of great importance. As for "common services," it comes back to how we implement that system's interface, e.g. any program can ask the OS to translate UI text to Spanish for it. We want to avoid duplication. The overall goal here is clarity, with as few lines of code as possible.
The guiding question: "How can we make our systems obvious and clear?"
Common Systems Problems:
Incommensurate scaling: This problem is well known in physical engineering design. Example: a 12-foot-tall human would need a modified skeleton; simply stretching a person in all three dimensions would not work. We'd be 2 times as tall but 8 times as heavy, because weight grows with volume (the cube of height) while bone strength grows only with cross-sectional area (the square of height).
We are hitting diseconomies of scale, where things get more expensive per unit. Example: an 8-port gigabit router costs $100, but a 48-port gigabit router costs thousands of dollars. This is because the switch is effectively a star network, and the number of possible port-to-port connections grows as the square of the port count, as the arithmetic below shows.
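To see the square growth concretely, here is a rough back-of-the-envelope count (our arithmetic, not vendor data): with n ports, the fabric must support n(n-1)/2 possible port pairs.

8 ports:  8 * 7 / 2 = 28 pairs
48 ports: 48 * 47 / 2 = 1,128 pairs

Six times the ports requires roughly 40 times the internal connectivity.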
Contrast this with economies of scale, as seen in the pin factory from Adam Smith's Wealth of Nations. There, it was more efficient to have someone specialize, invest capital in equipment, and make many pins than to have everyone produce their own. We need to account for both of these phenomena when we scale our systems. Otherwise, they cause breakage or waste (e.g., more pins produced than needed).
Emergent properties: These are properties that only become apparent at scale; qualitatively new behavior "just happens" as the system gets bigger, and we didn't predict it. We're looking for things that have no effect, or may not even exist, at a small scale but that will break our system at large-enough scales. For example: 20 years ago, UCLA got the latest and fastest Internet connections. The intended effect: homework would get done faster. Instead, students heavily pirated music, this being the age of Napster. UCLA was swamped with cease-and-desist letters. At one point, more than 80% of the network traffic was pirated material.
Another example: the Tacoma Narrows bridge. Engineers forgot about resonant frequencies, in this case from the wind. The bridge soon collapsed when strong winds swayed it at just the right frequency. Every civil engineer gets an earful of this now.
Propagation of effects: Operating systems are digital, not analog, so the smallest thing can make one crash. Often we think we have a well-designed OS with boundaries between components, yet effects may still cross those boundaries and wreck things.
Example: a file system takes requests like a:\b\c and gives you the file:

$ cat a:\b\c
Somewhere else, we have an encoding for Japanese characters, which cannot fit in one byte (more than 256 Japanese characters exist).
So, we decide on a two-byte encoding scheme. When the text is ASCII, the top bit of the first byte will be 0, and we'll leave the second byte unused. For Japanese, the top bit will be 1, and the character may use both bytes. This gives us 2^15 combinations, enough to handle Japanese.
Say we tried to create a new directory with a Japanese name:
$ cp a:\b\c d:[Japanese]
But the user gets an error message along the lines of:
"cannot create directory [Japanese]"
What happened? The second byte of the Japanese character had a top bit of 0, and by itself that byte looked exactly like an ASCII backslash. This happened with an early version of Windows. To fix it, Microsoft had to break down the boundary between the encoding and the file system and make things more complicated.
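A minimal shell sketch of the collision, assuming a Shift-JIS-style encoding and a single-byte locale (the byte values and the file name jname are just illustrative):

$ printf '\x95\x5c' > jname    # one two-byte character whose second byte is 0x5C
$ grep -c '\\' jname           # a byte-at-a-time tool sees that byte as an ASCII backslash
1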
Trade-offs: We face all the previous problems, and when we start trying to solve them, we find that fixing one makes another worse. This can be called the "waterbed effect": push the problem down in one place and it bulges up in another.
Example: sorting algorithms. Bubble sort is faster than heap sort or mergesort for small-enough datasets, but doesn't scale the way they do.
Complexity: This is the problem special to CS.
Consider Moore's Law: Complexity doubles every 18-24 months (for the cheapest design point; it doesn’t necessarily hold for more expensive CPUs). Originally, the law Gordon Moore formulated referred to the number of transistors on a chip. Professor Eggert thinks Moore's Law is petering out, but he said that 10 years ago too.
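As a rough worked example of what that rate implies (our arithmetic, not a figure from the lecture): at one doubling every 18 months,

120 months / 18 months per doubling ≈ 6.7 doublings per decade
2^6.7 ≈ 100x growth per decade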
The main concern: computers are getting more complicated, but we designers are not becoming much more sophisticated.
Consider also Kryder's Law: Disk drive capacity also grows exponentially. Looking back, Professor Eggert's first home computer had a 1 GB hard drive in 1992 for $1,000.
Looking forward, we need to continue to build systems that work, and that we understand despite their immense complexity. It will not be an easy task, but it will be a worthwhile one.