CS188: Introduction to Machine Learning (Winter 2017)

Lecture: Tuesday / Thursday 4:00pm - 6:00pm, Kinsey Pavilion 1240B

Discussions: Friday 12-1:50pm, 2-3:50pm

Course description

Machine learning is the study of algorithms that learn from data. It is a key component of many problem domains, including computer vision, natural language processing, computational biology, and robotics. This class introduces the fundamental concepts and algorithms of machine learning (both supervised and unsupervised learning), along with best practices for applying machine learning to practical problems. The class consists of lectures, problem sets containing mathematical and programming exercises, and two in-class exams.

Prerequisites

Undergraduate-level training or coursework in algorithms, linear algebra, calculus and multivariate calculus, and basic probability and statistics. An undergraduate-level course in Artificial Intelligence may be helpful but is not required. A background in programming is also necessary for the problem sets; specifically, students are expected to be familiar with Python and scikit-learn (a machine learning package for Python), or to learn them during the course.

Contact Info

Instructor: Sriram Sankararaman
Office Hours: Boelter 4531D, Wednesday 1:00-2:00pm
Email: sriram at cs dot ucla dot edu

Teaching assistants

Karthik Vemulapalli
Office hours: Tuesday 11:30 am - 1:30pm
Email: karthikv at cs dot ucla dot edu
Sophia Yan
Office hours: Tuesday 9:00 am - 11:00am
Email: xyan18 at ucla dot edu

Textbooks

While there is not one textbook that covers all the material from this course, readings will come from the following texts:

A Course in Machine Learning by Hal Daumé III, referred to as CIML (freely available online). This is the primary reference.
Machine Learning: The Art and Science of Algorithms that Make Sense of Data by Peter Flach, referred to as FL
Pattern Recognition and Machine Learning by Christopher M. Bishop, referred to as PRML
An Introduction to Statistical Learning: with Applications in R by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani, referred to as ISL

For a more advanced treatment, the following are useful:

Machine Learning: A Probabilistic Perspective by Kevin Murphy.
Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani and Jerome Friedman (freely available online)

Machine learning requires a strong mathematical foundation. You may find the following resources useful for brushing up on your math background.

Probability
- Review notes from Stanford's machine learning course
Linear algebra
- Review notes from Stanford's machine learning course
Optimization
- Review notes from Stanford's machine learning course

Course format

Homework (50%): There will be periodic homeworks. Questions on the homework will include math exercises, programming exercises and data analyses.
- We will use Gradescope to manage submission of homeworks.
- Homeworks are due at the beginning of class (4pm) on the due date.
- Late submissions will not be accepted.
- All solutions must be clearly written (or typed); unreadable answers will not be graded. We encourage using LaTeX to type out answers.
- Solutions will be graded on both correctness and clarity. If you cannot solve a problem completely, you will get more partial credit by identifying the gaps in your argument than by attempting to cover them up.
- We will drop the homework with the lowest grade.
You are free to discuss the homework problems. However, you must write up your own solutions. You must also acknowledge all collaborators.
Mini quiz on math background (0%): This is an in-class, closed-book and closed-notes mini quiz that will help you evaluate your background. This quiz does not count towards your final grade.
Exams (Mid-term: 20%, Final: 30%): There are two exams, scheduled for Feb 14 and March 22. Exams are in-class, closed-book and closed-notes, and will cover material from the lectures and the problem sets. No alternate or make-up exams will be administered, except for disability/medical reasons documented and communicated to the instructor prior to the exam date. In particular, exam dates and times cannot be changed to accommodate scheduling conflicts with other classes.
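As a rough illustration of how the weights above combine, the following sketch computes a course percentage from the 50% / 20% / 30% split and the drop-the-lowest-homework rule. The function name course_grade is hypothetical, and the sketch assumes all scores are on a 0-100 scale and that the remaining homeworks are weighted equally, which the syllabus does not state.

```python
def course_grade(homework_scores, midterm, final):
    """Return a weighted course percentage.

    Assumptions (not stated in the syllabus): every score is on a
    0-100 scale and all kept homeworks count equally.
    """
    if len(homework_scores) < 2:
        raise ValueError("need at least two homework scores to drop one")
    # Drop the single lowest homework, as described in the course format.
    kept = sorted(homework_scores)[1:]
    homework_avg = sum(kept) / float(len(kept))
    # Homework 50%, mid-term 20%, final 30%; the mini quiz counts for 0%.
    return 0.50 * homework_avg + 0.20 * midterm + 0.30 * final


if __name__ == "__main__":
    print(course_grade([60, 80, 100], midterm=70, final=90))
```

With homework scores of 60, 80, and 100, the 60 is dropped, so the homework average is 90 and the weighted total is 45 + 14 + 27.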

Software

We will use Python 2.7.x extensively to implement ML algorithms and run experiments. You will need to install and familiarize yourself with the following packages:

numpy: contains tools for numerical linear algebra and random number generation. For a numpy tutorial, see here.
scipy
scikit-learn: contains tools for machine learning and data science. For a tutorial, see here.
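One way to confirm that these packages are available is a short check like the one below. This is a minimal sketch rather than an official setup script: the helper name check_packages is hypothetical, and it relies only on the standard-library importlib module. Note that scikit-learn is imported under the module name sklearn.

```python
from __future__ import print_function

import importlib


def check_packages(names):
    """Map each module name to its version string, or None if not importable."""
    status = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            status[name] = None
        else:
            # Not every module exposes __version__, so fall back to "unknown".
            status[name] = getattr(module, "__version__", "unknown")
    return status


if __name__ == "__main__":
    # scikit-learn is imported as "sklearn".
    for name, version in sorted(check_packages(["numpy", "scipy", "sklearn"]).items()):
        if version is None:
            print(name, "is NOT installed")
        else:
            print(name, "version", version)
```

Any package reported as not installed needs to be set up before starting the programming exercises.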

Forums

Piazza

We will use Piazza for class discussions. Please go to this Piazza website to join the course forum (note: you must use a ucla.edu email account to join the forum). We strongly encourage students to post on this forum rather than emailing the course staff directly (this will be more efficient for both students and staff). Students should use Piazza to:

Ask clarifying questions about the course material.
Share useful resources with classmates (so long as they do not contain homework solutions).
Look for project partners or other students to form study groups.
Answer questions posted by other students to solidify your own understanding of the material.

The course Academic Integrity Policy must be followed on the message boards at all times. Do not post or request homework solutions! Also, please be polite.

Gradescope

We will use Gradescope to manage and grade homeworks and exams.

Please see this guide

Policies

Academic Integrity Policy

Group studying and collaborating on problem sets are encouraged, as working together is a great way to understand new material. Students are free to discuss the homework problems with anyone under the following conditions:

Students must write their own solutions and understand the solutions that they wrote down.
Students must list the names of their collaborators (i.e., anyone with whom the assignment was discussed).
Students may not use old solution sets from this class or any other class under any circumstances, unless the instructor grants special permission.

Students are encouraged to read the Dean of Students' guide to Academic Integrity.

Attendance and class participation

Although not a formal component of the course grade, attendance is essential for success in this course. If you are absent without a documented excuse, the instructor and TAs will not be able to go over missed lecture material with you. We emphatically welcome questions, and your active participation in this course will enhance your learning experience and that of the other students.

Regrade requests

Regrade requests for homework and exams must be made within one week after the graded work has been handed out, regardless of your attendance on that day and regardless of any intervening holidays such as Memorial Day. We reserve the right to regrade all problems for a given regrade request.

Acknowledgments

The course website is based on material developed by Ameet Talwalkar and Fei Sha. Some of the administrative content on the course website is adapted from material from Jenn Wortman Vaughan, Rich Korf, and Alexander Sherstov.

Tentative Schedule (subject to change)

Date	Topics	Readings	Problem Sets
01/10	Introduction		Problem Set 0
01/12	Probability	PRML 1.2-1.2.2
01/17	Math review mini quiz. Statistics
01/19	Decision trees.	CIML 1.3,1.5-1.10
01/24	Nearest neighbors	CIML 2-2.3
01/26	Linear classification (perceptron)	CIML 3
01/31	Logistic regression	CIML 6.3
02/02	Linear regression	CIML 6-6.2, 6.4-6.6
02/07	Overfitting and regularization
02/09	Kernels	CIML 9-9.2, 9.4-9.6
02/14	In-class mid-term
02/16	Support Vector Machines	CIML 6.7
02/21	Ensemble methods	CIML 11-11.3
02/23	Dimensionality reduction	CIML 13.2
02/28	Clustering	CIML 2.4, 13-13.1
03/02	Mixture models	CIML 14-14.1
03/07	The Expectation Maximization algorithm	CIML 14.2
03/09	Hidden Markov Models (HMMs)
03/14	HMMs continued
03/16	Neural networks