The overarching goal of this course is to build components that are integral in/part of operating systems.
Course Website: http://cs.ucla.edu/classes/winter12/cs111
The book is from MIT, and as such is based on a more conceptual and abstracted idea of Computer Systems in general. It is detached from the more concrete topics e.g. a shell. Comprehensive schedule is on website, but schedule is subject to change (in particular after Lab 1 is done)
Assignments turned in late lose one letter grade per late day/partial day. There are three free late days allotted across the quarter that can be used as needed. No submissions are taken after Friday of 10th week.
Midterm is in class on February 13. Final is on Wednesday, March 21 from 8:00 AM - 11:00 AM, Location TBA. Both exams are open notes and open books, but closed computer.
Original definition from OED original 1928
1. An organized or connected group of objects
A set of principles, etc.; a scheme or method
You methodically come up with a set of principles to design a system, not build in ad-hoc way.
Definition from Ancient Greek
συστημα (root "set up with") – organized whole, government, constitution, a body of men or animals, musical interval, a stanza
From American Heritage dictionary, 4th edition (2000)
1. Software designed to control the hardware of a specific data processing system in order to allow users and application programs to make use of it.
Some interesting points here. The users making use of seems implicit, and also why is control so important? Control was not mentioned in the original definition from 1928 or the greek definition. Also of note is the phrase "specific data processing system." Based on this definition, Linux is NOT an operating system, because it is written to run on a variety of systems, not one specific system.
"Bill Gates' Dictionary" (Encarta 2007)
Master control program in a computer...
The attitude that operating systems are focused on control is now outdated. Most of the problems within designing an operating system are not primarily concerned with control, although the issues of control must be solved regardless.
Wikipedia Revision # 469680935 (20012-01-05)
A set of programs that manage computer hardware resources and provide common services for application software.
Language is softened from control to manage. Modern computers are not associated with users at all. There are now more computer than people, so the users of operating systems are more often than not inanimate. Also, there is a new key phrase here of common services. More "polite" feel, "may I be of service to you" sounds nicer than "DO THIS."
Definition from Textbook Section 1.A.2
A system is a set of interconnected components that has a specified behavior observed at the interface with its environment.
We have now substituted for "users" the more general notion of "environment." What we care about here is the behavior observed at the interface.
What do you want from your OS?
We have too many goals here, and we cannot possibly satisfy them all in an optimal way. Thus arises the issue of tradeoffs.
"Waterbed Effect" – when pushing down on (improving) one area of a system results in issues surfacing in other areas.
Example: time-space tradeoff in sorting
Given a 10GB array of 1kB entries to be sorted using heapsort. What is a simple way to slightly speed up the process (will still be O(N log N)) using more memory? We can create a set of pointers to the data, and sort the pointers instead of sorting the data itself. Since it’s 10GB in memory, it must be a 64 bit system, so pointers are 8B, and thus 5120 additional bytes must be utilized for this sort.
Security
What is the most common way that SEASnet is broken into? Bots brute forcing/guessing passwords. This can be combated with hardware tokens (key fobs that generate passwords and are reverse engineerable). Downsides of this include monetary investment to purchase tokens, as well as the fact that if you lose the token you cannot login and someone else may be able to. Professor Eggert's somewhat serious proposed solution, of only allowing access to either UCLA IP addresses, or off-campus users with SSH keys gotten on campus, involves tradeoffs due to SSH being viewed as too complex for many users, and difficulty for users who never come to campus.
Making any change can cause unanticipated issues elsewhere. This naturally occurs in chaotic systems, but also occurs in designed systems.
Microsoft encountered this problem when they took their operating system to Eastern countries, and found that one-byte characters were not enough to accommodate all symbols in the Japanese alphabet. Now the idea is to use 2 byte characters, where the first bit signals whether or not it is a multi-byte character, and there remain 15 bits with which to encode the character. Another feature is hierarchical file names (ex: c:\foo\bar). These two features are incompatible.
The problem here is that if the second byte of a multi-byte character is the code for a slash, the operating system will not let you name a file with this character because it will think you are trying to move it to a directory that may or may not exist. One possible solution would be to move to a different character encoding scheme, in particular UTF-8. However, Microsoft's decision was to change their system such that it would recognize whether the user was writing in Japanese or English.
Another Problem:
Let’s say you want to make backups of a disk partition you are managing for someone else. You are working within a container based virtual system (see diagram below).
Here users are prevented from accessing anything outside their virtualized filesystem. A backup utility is run as root that copies files from all filesystems to a backup device using tar. Tar runs stat on the file, and then if the file is ok it opens it. The exploit here was a timing exploit, that if the user swapped out a file f for a symbolic link to some other area on the host filesystem AFTER the stat was run, this file in another user’s area would be backed up instead of the safe file within the attacker’s area that stat was run on. One solution here is to open and then fstat instead of stat then open.
Computer Worms
The problem of computer worms did not exist in the early stages of the internet, because everybody knew each other on the network by first name, and anybody to introduce a worm would be fired naturally. There are now simply too many people to know on the internet because it has grown to be such a massive network. This problem is exacerbated by the prolific use of big software, which inevitably contain exploitable bugs. Can you find the new security hole?
Dorm networks were another example of this, as they were well intended, but resulted in rampant piracy after deployment, with piracy supposedly peaking at 90% of all traffic.
Not everything scales at the same rate. Bears get to be about 3m long maximum, and not many are found larger than this. This is due to the fact that the structure of a bear’s anatomy is dependent on the cross section of the supporting bones (surface area) but the weight supported by the structure is dependent on volume. Since surface area grows with the second power of the linear size and volume grows with the third power, this model cannot be scaled indefinitely.
Economies of scale: the benefit grows greater than the cost. Ex: Pin factory of Adam Smith. It was found that pins are much cheaper to produce lots at a time than to produce one at a time. Thus, it is more profitable for one person to create a pin factory that specializes in making and selling pins, than for every person in a village to learn how to make pins and make their own pins independently whenever needed.
The opposite of this is diseconomies of scale, where the costs grow faster than the benefits. Ex: star network. In a typical network, there is one hub or switch where all users are connected. For a small amount of users, this works, but is extremely inefficient when scaling past 5 or 10 users on one switch/hub.
In economies of scale, as the size grows, there may be greater waste encountered. As the pin factory scales, you may end up with lots of pins that you don’t need. In diseconomies of scale, as size grows, there are breakdowns.
Complexity is one of the largest problems encountered in computer system design. Blame on why it’s such a large problem is often placed on hardware designers by software engineers, and vice versa. Due to Moore’s law, the number of transistors on a chip has scaled exponentially, and the result is incredibly complex chips that the operating systems strive to utilize efficiently. A corollary of Moore’s Law is Kryder’s Law, which states that the memory of a magnetic disk drive of fixed size will double approximately every two years. However, even though increased complexity in hardware is readily noticeable, complexity with software is also on the rise. Software engineers sometimes have trouble designing simple systems because there are no natural boundaries to halt complexity. This leads to overly complex solutions to systems that could have otherwise been designed in a simple manner.