CS 6220: Data Mining Techniques
Instructor:
Yizhou Sun
 Office hours: Tuesdays 2:00-4:00pm at 358 WVH
TA:

Kamlendra Kumar
 Email: kumark@ccs.neu.edu
 Office hours: Fridays 4:00-6:00pm at 462 WVH
Lecture times: Wednesdays 6-9 PM
Lecture location: Knowles Center 002B
About the Course
This course introduces concepts, algorithms, and techniques of data mining on
different types of datasets, including (1) matrix data, (2) text data, (3) set data, (4)
sequence data, (5) time series, (6) graphs and networks, and (7) image data. The class project involves
hands-on practice in mining useful knowledge from large data sets. The course is
a graduate-level computer science course, which is also a good option for senior-level
computer science undergraduate students interested in the field. The
course may also suit students from other disciplines who need to understand,
develop, and use data mining systems to analyze large amounts of data.
Prerequisites
 CS 5800 or CS 7800, or consent of instructor
 Students are expected to have knowledge of data structures,
algorithms, basic linear algebra, and basic statistics. You will also need to be familiar with at
least one programming language and have programming experience.
Grading
 Homework: 40%
 Midterm exam: 25%
 Course project: 30%
 Participation: 5%
*Note: all deadlines are at 11:59 PM (midnight) on the
due date. No late submissions will be accepted!
Regrading Policy:

If you have concerns about your grade, please submit a
regrading form (by email to both TAs, with a CC to the instructor) that clearly states why you think the work should be
regraded.

The regrading form must be submitted within
one week after you receive your score.

We will regrade the whole homework/exam.
Textbook
Jiawei Han, Micheline Kamber, and Jian Pei.
Data Mining: Concepts and Techniques,
3rd edition, Morgan Kaufmann, 2011
Recommended books for further reading:
 "Data Mining: The Textbook" by Charu Aggarwal (http://www.charuaggarwal.net/DataMining.htm)
 "Data Mining" by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar (http://www-users.cs.umn.edu/~kumar/dmbook/index.php)
 "Machine Learning" by Tom Mitchell (http://www.cs.cmu.edu/~tom/mlbook.html)
 "Introduction to Machine Learning" by Ethem Alpaydin (http://www.cmpe.boun.edu.tr/~ethem/i2ml/)
 "Pattern Classification" by Richard O. Duda, Peter E. Hart, and David G.
Stork (http://www.wiley.com/WileyCDA/WileyTitle/productCd-0471056693.html)
 "The Elements of Statistical Learning: Data Mining, Inference, and
Prediction" by Trevor Hastie, Robert Tibshirani, and Jerome Friedman (http://www-stat.stanford.edu/~tibs/ElemStatLearn/)
 "Pattern Recognition and Machine Learning" by Christopher M. Bishop (http://research.microsoft.com/en-us/um/people/cmbishop/prml/)
Q & A
You are encouraged to come to the office hours of the TAs and
the instructor.
Peer-based Q&A via Piazza:
piazza.com/northeastern/spring2016/cs6220
Academic Integrity Policy
A commitment to the principles of academic integrity is essential to the
mission of Northeastern University. The promotion of independent and original
scholarship ensures that students derive the most from their educational
experience and their pursuit of knowledge. Academic dishonesty violates the most
fundamental values of an intellectual community and undermines the achievements
of the entire University.
For more information, please refer to the Academic Integrity Web page.