Stream Mining Bibliography

Bibliography: Mining Data Bases and Data Streams
[based on: wis.cs.ucla.edu/~hxwang/stream/bib.html]

An overview paper

Georges Hebrail, 2008: Data stream Management and Mining. http://biblio.telecom-paristech.fr/cgi-bin/download.cgi?id=9742

Data Stream Mining Applications

The four papers in this directory propose demos for the SIGKDD 2009 conference.
They describe data stream mining applications and often link to websites
with more information and web demos.

Clustering [on stored data]

R. Ng and J. Han. Efficient and effective clustering method for spatial data mining. VLDB'94.[CLARANS]
R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. SIGMOD'98 [CLIQUE]
M. Ankerst, M. Breunig, H.-P. Kriegel, and J. Sander. OPTICS: Ordering points to identify the clustering structure. SIGMOD’99.[OPTICS]
Beil F., Ester M., Xu X.: "Frequent Term-Based Text Clustering", KDD'02[Text]
M. M. Breunig, H.-P. Kriegel, R. Ng, J. Sander. LOF: Identifying Density-Based Local Outliers. SIGMOD 2000.[Outliers]
M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases. KDD'96.[DBSCAN]
D. Gibson, J. Kleinberg, and P. Raghavan. Clustering categorical data: An approach based on dynamic systems. VLDB’98.[Categorical]
S. Guha, R. Rastogi, and K. Shim. Cure: An efficient clustering algorithm for large databases. SIGMOD'98.[CURE]
S. Guha, R. Rastogi, and K. Shim. ROCK: A robust clustering algorithm for categorical attributes. In ICDE'99, pp. 512-521, Sydney, Australia, March 1999.[ROCK]
G. Karypis, E.-H. Han, and V. Kumar. CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling. COMPUTER, 32(8): 68-75, 1999.[Hierarchical]
E. Knorr and R. Ng. Algorithms for mining distance-based outliers in large datasets. VLDB’98.[Outliers]
A. Hinneburg, D. A. Keim: An Efficient Approach to Clustering in Large Multimedia Databases with Noise. KDD’98.[DENCLUE]
G. Sheikholeslami, S. Chatterjee, and A. Zhang. WaveCluster: A multi-resolution clustering approach for very large spatial databases. VLDB’98.[Wavelets]
A. K. H. Tung, J. Han, L. V. S. Lakshmanan, and R. T. Ng. Constraint-Based Clustering in Large Databases, ICDT'01.[Constraints]
H. Wang, W. Wang, J. Yang, and P. S. Yu. Clustering by pattern similarity in large data sets, SIGMOD’02.[p-cluster]
W. Wang, J. Yang, R. Muntz, STING: A Statistical Information Grid Approach to Spatial Data Mining, VLDB’97.[STING]
T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH : an efficient data clustering method for very large databases. SIGMOD'96.[BIRCH]

Data Stream Clustering

TECNO-STREAMS: Tracking Evolving Clusters in Noisy Data Streams with a Scalable Immune System Learning Model, by Olfa Nasraoui, Cesar Cardona Uribe, Carlos Rojas Coronel, in the IEEE International Conf. Data Mining (ICDM) 2003.
Reverse Nearest Neighbor Aggregates Over Data Streams, by Flip Korn, S. Muthukrishnan, Divesh Srivastava, in the International Conference on Very Large Data Bases (VLDB) 2002.
A Framework for Clustering Evolving Data Streams, by Charu C. Aggarwal, Jiawei Han, Jianyong Wang, Philip S. Yu, in the International Conference on Very Large Data Bases (VLDB) 2003.
Streaming-Data Algorithms for High-Quality Clustering, by Liadan O'Callaghan, Nina Mishra, Adam Meyerson, Sudipto Guha, Rajeev Motwani, in the IEEE International Conference on Data Engineering (ICDE) 2002.
Density-based Clustering over an Evolving Data Stream with Noise, F Cao, M Ester, W Qian, and A Zhou, Proceedings of the 2006 SIAM Conference on Data Mining (SDM'2006).
Liadan O'Callaghan, Adam Meyerson, Rajeev Motwani, Nina Mishra, Sudipto Guha: Streaming-Data Algorithms for High-Quality Clustering. ICDE 2002: 685+
Sudipto Guha, Adam Meyerson, Nina Mishra, Rajeev Motwani, Liadan O'Callaghan: Clustering Data Streams: Theory and Practice. IEEE Trans. Knowl. Data Eng. 15(3): 515-528 (2003)
C. Aggarwal, J. Han, J. Wang, and P. S. Yu. A Framework for Projected Clustering of High Dimensional Data Streams, VLDB'04.
S. Ben-David and M. Ackerman. Measures of clustering quality: A working set of axioms for clustering. In NIPS, pages 121–128, 2008.
Hardy Kremer, Philipp Kranen, Timm Jansen, Thomas Seidl, Albert Bifet, Geoff Holmes, Bernhard Pfahringer: An effective evaluation measure for clustering on evolving data streams. KDD 2011: 868-876.
Jimmy Lin, Rion Snow, William Morgan: Smoothing techniques for adaptive online language models: topic tracking in tweet streams. KDD 2011: 422-429.

Books:
[G. J. McLachlan and K. E. Basford. Mixture Models: Inference and Applications to Clustering. John Wiley and Sons, 1988.
L. Kaufman and P. J. Rousseeuw. Finding Groups in Data: an Introduction to Cluster Analysis. John Wiley & Sons, 1990.]

Frequent Pattern Mining

What's Hot and What's Not: Tracking Most Frequent Items Dynamically, by Graham Cormode, S. Muthukrishnan, in the ACM Symposium on Principles of Database Systems (PODS) 2003.
Dynamically Maintaining Frequent Items Over A Data Stream, by Cheqing Jin, Weining Qian, Chaofeng Sha, Jeffrey X. Yu, Aoying Zhou, in the Conference on Information and Knowledge Management (CIKM) 2003.
Processing Frequent Itemset Discovery Queries by Division and Set Containment Join Operators, by Ralf Rantzau, in the ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD) 2003.
Approximate Frequency Counts over Data Streams, by Gurmeet Singh Manku, Rajeev Motwani, in the International Conference on Very Large Data Bases (VLDB) 2002.
An Algorithm for In-Core Frequent Itemset Mining on Streaming Data, by Ruoming Jin, Gagan Agrawal, submitted for publication 2004.
A Simple Algorithm for Finding Frequent Elements in Streams and Bags, by Richard M. Karp, Scott Shenker, in the ACM Transactions on Database Systems (TODS) 2003.
Bursty and Hierarchical Structure in Streams, by Jon Kleinberg, in the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD) 2002.
Online Algorithms for Mining Semi-structured Data Stream, by Tatsuya Asai, Hiroki Arimura, Kenji Abe, Shinji Kawasoe, Setsuo Arikawa, in the IEEE International Conf. Data Mining (ICDM) 2002.
Finding Hierarchical Heavy Hitters in Data Streams, by Graham Cormode, Flip Korn, S. Muthukrishnan, Divesh Srivastava, in the International Conference on Very Large Data Bases (VLDB) 2003.
Finding Recent Frequent Itemsets Adaptively over Online Data Streams, by Joong Hyuk Chang, Won Suk Lee, in the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD) 2003.
A survey on algorithms for mining frequent itemsets over data streams, James Cheng, Yiping Ke and Wilfred Ng, 2007.
Research issues in data stream association rule mining, Nan Jiang, Le Gruenwald, SIGMOD Rec., Vol. 35, No. 1 (March 2006), pp. 14-19.
Verifying and Mining Frequent Patterns from Large Windows over Data Streams. Barzan Mozafari, Hetal Thakkar and Carlo Zaniolo: ICDE 2008: The 24th International Conference on Data Engineering, April 7-12, 2008, Cancún, México.
Hoang Thanh Lam, Toon Calders: Mining top-k frequent items in a data stream with flexible sliding windows. KDD 2010: 283-292.

Classification, Regression and Other Learning Methods

A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification, by W. Nick Street, YongSeog Kim, in the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD) 2001.
A Regression-Based Temporal Pattern Mining Scheme for Data Streams, by Wei-Guang Teng, Ming-Syan Chen, Philip S. Yu, in the International Conference on Very Large Data Bases (VLDB) 2003.
Mining Concept Drifting Data Streams using Ensemble Classifiers, by Haixun Wang, Wei Fan, Philip S. Yu, Jiawei Han, in the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD) 2003.
Mining High Speed Data Streams, by Pedro Domingos, Geoff Hulten, in the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD) 2000.
Accurate Decision Trees for Mining Highspeed Data Streams, by Joao Gama, Ricardo Rocha, Pedro Medas, in the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD) 2003.
Mining Time-Changing Data Streams, by Geoff Hulten, Laurie Spencer, Pedro Domingos, in the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD) 2001.
Efficient Decision Tree Construction on Streaming Data, by Ruoming Jin, Gagan Agrawal, in the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD) 2003.
Dynamic Weighted Majority: A New Ensemble Method for Tracking Concept Drift, by Jeremy Z. Kolter, Marcus A. Maloof, in the IEEE International Conf. Data Mining (ICDM) 2003.
Distributed Web Mining using Bayesian Networks from Multiple Data Streams, by R. Chen, K. Sivakumar, H. Kargupta, in the IEEE International Conf. Data Mining (ICDM) 2001.
An approach to online Bayesian learning from multiple data streams, by R. Chen, K. Sivakumar, H. Kargupta, in the European Conference on Principles of Data Mining and Knowledge Discovery (PKDD) 2001.
Adaptive, Hands-Off Stream Mining, by Spiros Papadimitriou, Anthony Brockwell, Christos Faloutsos, in the International Conference on Very Large Data Bases (VLDB) 2003.
Correlating Synchronous And Asynchronous Data Streams, by Sudipto Guha, D. Gunopulos, Nick Koudas, in the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD) 2003.
Fast and light boosting for adaptive mining of data streams, F. Chu and C. Zaniolo, in Proc. of the 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Sydney, May 2004.
StreamMiner: A Classifier Ensemble-based Engine to Mine Concept Drifting Data Streams, W. Fan, VLDB'2004.
Active Mining of Data Streams, Wei Fan, Yi-an Huang, Haixun Wang, and Philip S. Yu, Proceedings of SIAM International Conference on Data Mining 2004.
An adaptive learning approach for noisy data streams, 4th IEEE International Conference on Data Mining (ICDM), Fang Chu, Yizhou Wang, Carlo Zaniolo, 2004.
Fast and Light Boosting for Adaptive Mining of Data Streams. Fang Chu, Carlo Zaniolo: PAKDD 2004: 282-292.
An Adaptive Nearest Neighbor Classification Algorithm for Data Streams. Yan-Nei Law, Carlo Zaniolo: PKDD 2005: 108-120.
Peng Zhang, Jun Li, Peng Wang, Byron J. Gao, Xingquan Zhu, Li Guo: Enabling fast prediction for ensemble models on data streams. KDD 2011: 177-185.
Josh Attenberg, Foster J. Provost: Online active inference and learning. KDD 2011: 186-194.
Wei Chu, Martin Zinkevich, Lihong Li, Achint Thomas, Belle L. Tseng: Unbiased online active learning in data streams. KDD 2011: 195-203.

Time Series

R. Agrawal, C. Faloutsos, and A. Swami. Efficient similarity search in sequence databases. FODO’93 (Foundations of Data Organization and Algorithms).
R. Agrawal, K.-I. Lin, H.S. Sawhney, and K. Shim. Fast similarity search in the presence of noise, scaling, and translation in time-series databases. VLDB'95.
R. Agrawal, G. Psaila, E. L. Wimmers, and M. Zait. Querying shapes of histories. VLDB'95.
C. Faloutsos, M. Ranganathan, and Y. Manolopoulos. Fast subsequence matching in time-series databases. SIGMOD'94.
Carlo Zaniolo et al. Chapter 12 in Advanced Database Systems, Morgan Kaufmann, 1997.
Nasser Yazdani, Z. Meral Özsoyoglu: Sequence Matching of Images. SSDBM 1996: 53-62.
Y. Moon, K. Whang, W. Loh. Duality Based Subsequence Matching in Time-Series Databases, ICDE’02
B.-K. Yi, H. V. Jagadish, and C. Faloutsos. Efficient retrieval of similar time sequences under time warping. ICDE'98.
B.-K. Yi, N. Sidiropoulos, T. Johnson, H. V. Jagadish, C. Faloutsos, and A. Biliris. Online data mining for co-evolving time sequences. ICDE'00.
Dennis Shasha and Yunyue Zhu. High Performance Discovery in Time Series: Techniques and Case Studies, Springer, 2004.
Eamonn J. Keogh: Indexing and Mining Time Series Data. Encyclopedia of GIS 2008: 493-497.
Jin Shieh, Eamonn J. Keogh: iSAX: indexing and mining terabyte sized time series. KDD 2008: 623-631.
Jessica Lin, Michail Vlachos, Eamonn J. Keogh, Dimitrios Gunopulos: Iterative Incremental Clustering of Time Series. EDBT 2004: 106-122.
Louis Lovas, What Is: Time Series, May 2012. http://www.bigdataforfinance.com/bigdata/2012/05/what-is-time-series.html

[Reference Books on Time Series:
C. Chatfield. The Analysis of Time Series: An Introduction, 3rd ed. Chapman & Hall, 1984.

R.H. Shumway & D.S. Stoffer. Time Series Analysis and Its Applications: With R Examples (2nd ed.), Springer Texts in Statistics, 2006. http://www.stat.pitt.edu/stoffer/tsa2/index.html

StatSoft. Electronic Textbook. www.statsoft.com/textbook/stathome.html]

Change, Novelty Detection

Online Novelty Detection on Temporal Sequences, by Junshui Ma, Simon Perkins, in the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD) 2003.
A Framework for Diagnosing Changes in Evolving Data Streams, by Charu C. Aggarwal, in the ACM International Conference on Management of Data (SIGMOD) 2003.
Efficient Elastic Burst Detection in Data Streams, by Yunyue Zhu, Dennis Shasha, in the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD) 2003.
Active Mining of Data Streams, by Wei Fan, Yi-an Huang, Haixun Wang, Philip S Yu, in the SIAM International Conference on Data Mining (SIAM DM) 2004.
