CS 6220: Data Mining Techniques

News

[9/28/2015] Office hours have been changed to Tuesday afternoons 3:30-5:30pm

[9/14/2015] First day of classes


Class Schedule

(Future lectures and events are tentative.)

Week# | Date | Topic | Slides | Assignment | Project | Reading (Textbook or Other Materials)
2 | Sep. 14 | Introduction and Know Your Data | 01Introduction; 02Data | | | Chapter 1, 2, 3; Math overview
3 | Sep. 21 | Course Project Introduction; Matrix Data: Prediction (linear regression); Classification (decision tree, evaluation) | Course Project Overview; 03Matrix_Prediction; 04Matrix_Classification_1 | #1 out | | Notes by Andrew Ng (Sec. 1-3 in Part 1): http://cs229.stanford.edu/notes/cs229-notes1.pdf; Chapter 8.1, 8.2, 8.5
4 | Sep. 28 | Matrix Data: Classification (Naive Bayes, logistic regression) | Prob_review; 04Matrix_Classification_2 | | Team formation due (Sep. 27) | Chapter 8.3, 9.1; Notes by Tom Mitchell: http://www.cs.cmu.edu/~tom/mlbook/NBayesLogReg.pdf; Notes on derivation of P(C_j) in Naive Bayes; Review of probability: http://cs229.stanford.edu/section/cs229-prob.pdf
5 | Oct. 5 | Matrix Data: Classification (SVM, kNN, and other issues) | 04Matrix_Classification_3 | #1 due (Oct. 4) / #2 out | | Chapter 9.3, 9.5, 8.6, 9.7; Notes on SVM by Andrew Ng: http://cs229.stanford.edu/notes/cs229-notes3.pdf
6 | Oct. 12 | Columbus Day (No Class) | | | Proposal due (Oct. 12) |
7 | Oct. 19 | Matrix Data: Clustering (k-means, hierarchical clustering, DBSCAN) | 04Matrix_Clustering_1 | #2 due (Oct. 18) / #3 out | | Chapter 10.1, 10.2, 10.3, 10.4, 10.6
8 | Oct. 26 | Matrix Data: Clustering (GMM); Text Data: Topic Models (PLSA) | 04Matrix_Clustering_2; 05Text | | | Chapter 11.1, 11.3; Notes on mixture models and EM algorithm: http://www.stat.cmu.edu/~cshalizi/350/lectures/29/lecture-29.pdf and http://www.cs.ubc.ca/~murphyk/Teaching/CS340-Fall06/reading/mixtureModels.pdf; pLSA tutorial: http://arxiv.org/pdf/1212.3900.pdf; Topic modeling tutorial: https://www.cs.princeton.edu/~blei/kdd-tutorial.pdf
9 | Nov. 2 | Set Data: Frequent Pattern Mining (Apriori, FP-growth) | 06Set | #3 due (Nov. 1) / #4 out | | Chapter 6
10 | Nov. 9 | Midterm Exam | | | |
11 | Nov. 16 | Graph / Network: PageRank, Personalized PageRank; Image Data: Neural Networks | 08Graph; 09Image_NN | #4 due (Nov. 15) / #5 out | | Chapter 9.2; ANN by Tom Mitchell: http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/mlbook/ch4.pdf
12 | Nov. 23 | Sequence Data: Sequential pattern mining (GSP), HMM | 07Sequence | | Midterm Report due (Nov. 22) | Reference: Chapter 8.3 in Han's Data Mining Book, Edition 2; Papers: GSP, PrefixSpan
13 | Nov. 30 | Time Series: forecasting, similarity search (DTW) | 10TimeSeries | #5 due (Nov. 29) | | References: DTW
14 | Dec. 7 | No Class | | | |
15 | Dec. 14 | Course Project Final Presentation | | | Final Report & Code (Dec. 14) |