CS 111 Lecture 2 Scribe Notes (Winter 2012)

Prepared by Nate Emerson, Jason Cox, Brian Aller, and Eric Chiang for a lecture given by UCLA's Paul Eggert on January 9, 2012


Table of Contents

  1. Class Logistics
    1. Requirements/Details
    2. Grading
    3. Late Policy
    4. Exams
  2. Systems
  3. Problems with Computer Systems
    1. Tradeoffs
    2. Propagation of Effects
    3. Emergent Properties as we scale (qualitative problem)
    4. Incommensurate Scaling (quantitative problem)
    5. Complexity

Class Logistics

Requirements/Details

Official Requirements: Official Prerequisites:
REAL Requirements: REAL Prerequisites:

The overarching goal of this course is to build components that are integral in/part of operating systems.

Course Website: http://cs.ucla.edu/classes/winter12/cs111

The book is from MIT, and as such is based on a more conceptual and abstracted idea of Computer Systems in general. It is detached from the more concrete topics e.g. a shell. Comprehensive schedule is on website, but schedule is subject to change (in particular after Lab 1 is done)

Grading

Late Policy

Assignments turned in late lose one letter grade per late day/partial day. There are three free late days allotted across the quarter that can be used as needed. No submissions are taken after Friday of 10th week.

Exams

Midterm is in class on February 13. Final is on Wednesday, March 21 from 8:00 AM - 11:00 AM, Location TBA. Both exams are open notes and open books, but closed computer.

Systems - What is a system?

Original definition from OED original 1928

1. An organized or connected group of objects

A set of principles, etc.; a scheme or method

You methodically come up with a set of principles to design a system, not build in ad-hoc way.

Definition from Ancient Greek

συστημα (root "set up with") – organized whole, government, constitution, a body of men or animals, musical interval, a stanza

What is an operating system?

From American Heritage dictionary, 4th edition (2000)

1. Software designed to control the hardware of a specific data processing system in order to allow users and application programs to make use of it.

Some interesting points here. The users making use of seems implicit, and also why is control so important? Control was not mentioned in the original definition from 1928 or the greek definition. Also of note is the phrase "specific data processing system." Based on this definition, Linux is NOT an operating system, because it is written to run on a variety of systems, not one specific system.

"Bill Gates' Dictionary" (Encarta 2007)

Master control program in a computer...

The attitude that operating systems are focused on control is now outdated. Most of the problems within designing an operating system are not primarily concerned with control, although the issues of control must be solved regardless.

Wikipedia Revision # 469680935 (20012-01-05)

A set of programs that manage computer hardware resources and provide common services for application software.

Language is softened from control to manage. Modern computers are not associated with users at all. There are now more computer than people, so the users of operating systems are more often than not inanimate. Also, there is a new key phrase here of common services. More "polite" feel, "may I be of service to you" sounds nicer than "DO THIS."

Definition from Textbook Section 1.A.2

A system is a set of interconnected components that has a specified behavior observed at the interface with its environment.

We have now substituted for "users" the more general notion of "environment." What we care about here is the behavior observed at the interface.

Interface

What do you want from your OS?

We have too many goals here, and we cannot possibly satisfy them all in an optimal way. Thus arises the issue of tradeoffs.

Problems with Computer Systems

  1. Tradeoffs
  2. Propagation of Effects
  3. Emergent Properties as we scale (qualitative problem)
  4. Incommensurate Scaling (quantitative problem)
  5. Complexity

Tradeoffs

"Waterbed Effect" – when pushing down on (improving) one area of a system results in issues surfacing in other areas.

Example: time-space tradeoff in sorting

Given a 10GB array of 1kB entries to be sorted using heapsort. What is a simple way to slightly speed up the process (will still be O(N log N)) using more memory? We can create a set of pointers to the data, and sort the pointers instead of sorting the data itself. Since it’s 10GB in memory, it must be a 64 bit system, so pointers are 8B, and thus 5120 additional bytes must be utilized for this sort.

Security

What is the most common way that SEASnet is broken into? Bots brute forcing/guessing passwords. This can be combated with hardware tokens (key fobs that generate passwords and are reverse engineerable). Downsides of this include monetary investment to purchase tokens, as well as the fact that if you lose the token you cannot login and someone else may be able to. Professor Eggert's somewhat serious proposed solution, of only allowing access to either UCLA IP addresses, or off-campus users with SSH keys gotten on campus, involves tradeoffs due to SSH being viewed as too complex for many users, and difficulty for users who never come to campus.

Propagation of Effects (Butterfly Effect)

Making any change can cause unanticipated issues elsewhere. This naturally occurs in chaotic systems, but also occurs in designed systems.

Microsoft encountered this problem when they took their operating system to Eastern countries, and found that one-byte characters were not enough to accommodate all symbols in the Japanese alphabet. Now the idea is to use 2 byte characters, where the first bit signals whether or not it is a multi-byte character, and there remain 15 bits with which to encode the character. Another feature is hierarchical file names (ex: c:\foo\bar). These two features are incompatible.

The problem here is that if the second byte of a multi-byte character is the code for a slash, the operating system will not let you name a file with this character because it will think you are trying to move it to a directory that may or may not exist. One possible solution would be to move to a different character encoding scheme, in particular UTF-8. However, Microsoft's decision was to change their system such that it would recognize whether the user was writing in Japanese or English.

Another Problem:

Let’s say you want to make backups of a disk partition you are managing for someone else. You are working within a container based virtual system (see diagram below).

Virtual Filesystem

Here users are prevented from accessing anything outside their virtualized filesystem. A backup utility is run as root that copies files from all filesystems to a backup device using tar. Tar runs stat on the file, and then if the file is ok it opens it. The exploit here was a timing exploit, that if the user swapped out a file f for a symbolic link to some other area on the host filesystem AFTER the stat was run, this file in another user’s area would be backed up instead of the safe file within the attacker’s area that stat was run on. One solution here is to open and then fstat instead of stat then open.

Emergent properties with scaling

Computer Worms

The problem of computer worms did not exist in the early stages of the internet, because everybody knew each other on the network by first name, and anybody to introduce a worm would be fired naturally. There are now simply too many people to know on the internet because it has grown to be such a massive network. This problem is exacerbated by the prolific use of big software, which inevitably contain exploitable bugs. Can you find the new security hole?

Dorm networks were another example of this, as they were well intended, but resulted in rampant piracy after deployment, with piracy supposedly peaking at 90% of all traffic.

Incommensurate Scaling

Not everything scales at the same rate. Bears get to be about 3m long maximum, and not many are found larger than this. This is due to the fact that the structure of a bear’s anatomy is dependent on the cross section of the supporting bones (surface area) but the weight supported by the structure is dependent on volume. Since surface area grows with the second power of the linear size and volume grows with the third power, this model cannot be scaled indefinitely.

Economies of scale: the benefit grows greater than the cost. Ex: Pin factory of Adam Smith. It was found that pins are much cheaper to produce lots at a time than to produce one at a time. Thus, it is more profitable for one person to create a pin factory that specializes in making and selling pins, than for every person in a village to learn how to make pins and make their own pins independently whenever needed.

The opposite of this is diseconomies of scale, where the costs grow faster than the benefits. Ex: star network. In a typical network, there is one hub or switch where all users are connected. For a small amount of users, this works, but is extremely inefficient when scaling past 5 or 10 users on one switch/hub.

In economies of scale, as the size grows, there may be greater waste encountered. As the pin factory scales, you may end up with lots of pins that you don’t need. In diseconomies of scale, as size grows, there are breakdowns.

Complexity

Complexity is one of the largest problems encountered in computer system design. Blame on why it’s such a large problem is often placed on hardware designers by software engineers, and vice versa. Due to Moore’s law, the number of transistors on a chip has scaled exponentially, and the result is incredibly complex chips that the operating systems strive to utilize efficiently. A corollary of Moore’s Law is Kryder’s Law, which states that the memory of a magnetic disk drive of fixed size will double approximately every two years. However, even though increased complexity in hardware is readily noticeable, complexity with software is also on the rise. Software engineers sometimes have trouble designing simple systems because there are no natural boundaries to halt complexity. This leads to overly complex solutions to systems that could have otherwise been designed in a simple manner.