CS 6220: Data Mining Techniques
Instructor:
- Yizhou Sun
- Office hours: Tuesdays 3:30-5:30pm at 358 WVH
TA:
- Monisha Singh
- Email: msingh28@ccs.neu.edu
- Office hours: Thursdays 12:00-2:00pm at 462 WVH
Lecture times: Mondays 6 - 9 PM
Lecture location: Forsyth Building 236
About the Course
This course introduces concepts, algorithms, and techniques of data mining on
different types of datasets, including (1) matrix data, (2) text data, (3) set data, (4)
sequence data, (5) time series, (6) graph and network, and (7) image data. The class project involves
hands-on practice in mining useful knowledge from large data sets. This is a
graduate-level computer science course, but it is also a good option for senior
undergraduate computer science students interested in the field. It may also
appeal to students from other disciplines who need to understand, develop, and
use data mining systems to analyze large amounts of data.
Prerequisites
- CS 5800 or CS 7800, or consent of instructor
- Students are expected to have knowledge of data structures, algorithms, basic
linear algebra, and basic statistics. You will also need to be familiar with at
least one programming language and have programming experience.
Grading
- Homework: 40%
- Midterm exam: 25%
- Course project: 30%
- Participation: 5%
*Note: all deadlines are 11:59 PM (midnight) on the due date. No late
submissions will be accepted!
Regrading Policy:
- If you have concerns about your grade, please submit a regrading form by
email to the TAs, CC'ing the instructor, stating clearly why you think your
work should be regraded.
- Regrading forms must be submitted within one week of receiving your score.
- Note that we will regrade the entire homework/exam.
Textbook
Jiawei Han, Micheline Kamber, and Jian Pei.
Data Mining: Concepts and Techniques,
3rd edition, Morgan Kaufmann, 2011
Recommended books for further reading:
- "Data Mining: The Textbook" by Charu Aggarwal (http://www.charuaggarwal.net/Data-Mining.htm)
- "Data Mining" by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar (http://www-users.cs.umn.edu/~kumar/dmbook/index.php)
- "Machine Learning" by Tom Mitchell (http://www.cs.cmu.edu/~tom/mlbook.html)
- "Introduction to Machine Learning" by Ethem Alpaydin (http://www.cmpe.boun.edu.tr/~ethem/i2ml/)
- "Pattern Classification" by Richard O. Duda, Peter E. Hart, and David G. Stork (http://www.wiley.com/WileyCDA/WileyTitle/productCd-0471056693.html)
- "The Elements of Statistical Learning: Data Mining, Inference, and Prediction" by Trevor Hastie, Robert Tibshirani, and Jerome Friedman (http://www-stat.stanford.edu/~tibs/ElemStatLearn/)
- "Pattern Recognition and Machine Learning" by Christopher M. Bishop (http://research.microsoft.com/en-us/um/people/cmbishop/prml/)
Q & A
You are encouraged to attend the office hours of the TAs and the instructor.
Peer-based Q&A via Piazza:
piazza.com/northeastern/fall2015/cs622001/home
Academic Integrity Policy
A commitment to the principles of academic integrity is essential to the
mission of Northeastern University. The promotion of independent and original
scholarship ensures that students derive the most from their educational
experience and their pursuit of knowledge. Academic dishonesty violates the most
fundamental values of an intellectual community and undermines the achievements
of the entire University.
For more information, please refer to the Academic Integrity Web page.