Course description

What genes cause cancer? Have we inherited genes from Neanderthals? How does a single genome code for the diverse functions that we see?

We can now begin to answer these fascinating questions in biology because the cost of genome sequencing has fallen faster than Moore's law would predict. The bottleneck in answering these questions has shifted from generating data to developing powerful statistical models and inference algorithms that can make sense of this data. Statistical machine learning provides an important toolkit in this endeavor. In turn, biological datasets pose new challenges to the field of machine learning.

We will learn about probabilistic models, inference and learning in these models, model assessment, and the interpretation of inferences to address the biological questions of interest. The course aims to introduce CS/Statistics students to an important set of problems and Bioinformatics/Human Genetics students to a rich set of tools.


Familiarity with probability, statistics, linear algebra and algorithms is expected. No familiarity with biology is needed.

Contact Info

Instructor: Sriram Sankararaman
Office Hours: Boelter 4531D, Wednesday 1:00 - 2:00 pm (or by appointment)
Email: sriram at cs dot ucla dot edu


There is no formal textbook. Readings will be posted as needed. The following texts will serve as useful references:

Course format

Grading

A tentative syllabus


The course website is based on material developed by Ameet Talwalkar and Fei Sha. Some of the administrative content on the course website is adapted from material from Jenn Wortman Vaughan, Rich Korf, and Alexander Sherstov.

Tentative Schedule

Date | Topics | Reading | HW
09/26 | Introduction to genomics | Big Data: Astronomical or Genomical?; Nova: Personal DNA testing |
 | Introductory statistics | |
 | Multiple testing | |
10/03 | Association studies: linear regression | | Homework 1
10/05 | Association studies: logistic regression | |
10/10 | Heritability: ridge regression and mixed models | |
10/12 | Clustering and mixture models | |
10/17 | The EM algorithm | | Homework 2; Data for Homework 2
10/19 | No class | |
10/26 | PCA | |
10/28 | PCA and probabilistic PCA | |
10/31 | Admixture models | | Homework 3; Data for Homework 3
11/02 | Applications of admixture models and population stratification | |
11/07 | Directed graphical models (DGMs) | |
11/09 | DGMs: Conditional independence | |
11/14 | DGMs. Hidden Markov Models | |
11/16 | Mid-term | |
11/21 | Hidden Markov Models | | Homework 4+5; Data for Homework 4+5
11/23 | Kernels | |
11/28 | Neural networks and deep learning | |
11/30 | Genomic privacy | |