Clustering [on stored data]
- R. Ng and J. Han. Efficient and effective clustering method
for spatial data mining. VLDB'94.[CLARANS]
- R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan.
Automatic subspace clustering of high dimensional data for data
mining applications. SIGMOD'98 [CLIQUE]
- M. Ankerst, M. Breunig, H.-P. Kriegel, and J. Sander. OPTICS:
Ordering points to identify the clustering structure.
SIGMOD’99.[OPTICS]
- F. Beil, M. Ester, and X. Xu. Frequent term-based text
clustering. KDD'02.[Text]
- M. M. Breunig, H.-P. Kriegel, R. Ng, J. Sander. LOF:
Identifying Density-Based Local Outliers. SIGMOD 2000.[Outliers]
- M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based
algorithm for discovering clusters in large spatial databases.
KDD'96.[DBSCAN]
- D. Gibson, J. Kleinberg, and P. Raghavan. Clustering
categorical data: An approach based on dynamic systems.
VLDB’98.[Categorical]
- S. Guha, R. Rastogi, and K. Shim. Cure: An efficient
clustering algorithm for large databases. SIGMOD'98.[CURE]
- S. Guha, R. Rastogi, and K. Shim. ROCK: A robust clustering
algorithm for categorical attributes. In ICDE'99, pp. 512-521,
Sydney, Australia, March 1999.[ROCK]
- G. Karypis, E.-H. Han, and V. Kumar. CHAMELEON: A
Hierarchical Clustering Algorithm Using Dynamic Modeling.
COMPUTER, 32(8): 68-75, 1999.[Hierarchical]
- E. Knorr and R. Ng. Algorithms for mining distance-based
outliers in large datasets. VLDB’98.[Outliers]
- A. Hinneburg, D. A. Keim: An Efficient Approach to Clustering
in Large Multimedia Databases with Noise. KDD’98 [DENCLUE].
- G. Sheikholeslami, S. Chatterjee, and A. Zhang. WaveCluster: A
multi-resolution clustering approach for very large spatial
databases. VLDB’98.[Wavelets]
- A. K. H. Tung, J. Han, L. V. S. Lakshmanan, and R. T. Ng.
Constraint-Based Clustering in Large Databases,
ICDT'01.[Constraints]
- H. Wang, W. Wang, J. Yang, and P.S. Yu. Clustering by pattern
similarity in large data sets, SIGMOD’02.[p-cluster]
- W. Wang, J. Yang, R. Muntz. STING: A Statistical Information
Grid Approach to Spatial Data Mining, VLDB’97.[STING]
- T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH: an efficient
data clustering method for very large databases.
SIGMOD'96.[BIRCH]
Data Stream Clustering
- TECNO-STREAMS: Tracking
Evolving Clusters in Noisy Data Streams with a Scalable
Immune System Learning Model, by Olfa Nasraoui,
Cesar Cardona Uribe, Carlos Rojas Coronel, in the IEEE
International Conf. Data Mining (ICDM) 2003.
- Reverse Nearest Neighbor
Aggregates Over Data Streams, by Flip Korn, S.
Muthukrishnan, Divesh Srivastava, in the International
Conference on Very Large Data Bases (VLDB) 2002.
- A Framework for Clustering
Evolving Data Streams, by Charu C. Aggarwal, Jiawei
Han, Jianyong Wang, Philip S. Yu, in the International
Conference on Very Large Data Bases (VLDB) 2003.
- Streaming-Data Algorithms
for High-Quality Clustering, by Liadan O'Callaghan,
Nina Mishra, Adam Meyerson, Sudipto Guha, Rajeev Motwani, in
the IEEE International Conference on Data Engineering (ICDE) 2002.
- Density-based Clustering over an Evolving Data Stream with
Noise, by F. Cao, M. Ester, W. Qian, and A. Zhou, in the
Proceedings of the 2006 SIAM Conference on Data Mining (SDM) 2006.
- Liadan O'Callaghan, Adam Meyerson, Rajeev Motwani, Nina
Mishra, Sudipto Guha: Streaming-Data Algorithms for High-Quality
Clustering. ICDE 2002: 685+
- Sudipto Guha, Adam Meyerson, Nina Mishra, Rajeev Motwani,
Liadan O'Callaghan: Clustering Data Streams: Theory and
Practice. IEEE Trans. Knowl. Data Eng. 15(3): 515-528 (2003)
- C. Aggarwal, J. Han, J. Wang, and P. S. Yu. A Framework for
Projected Clustering of High Dimensional Data Streams, VLDB'04.
- S. Ben-David and M. Ackerman. Measures of clustering
quality: A working set of axioms for clustering. In NIPS, pages
121–128, 2008.
- Hardy Kremer, Philipp Kranen, Timm Jansen, Thomas Seidl,
Albert Bifet, Geoff Holmes, Bernhard Pfahringer: An effective
evaluation measure for clustering on evolving data streams. KDD
2011: 868-876.
- Jimmy Lin, Rion Snow, William Morgan: Smoothing techniques for
adaptive online language models: topic tracking in tweet
streams. KDD 2011, 422-429.
Books:
[G. J. McLachlan and K. E. Basford. Mixture Models:
Inference and Applications to Clustering. John Wiley and Sons,
1988.
L. Kaufman and P. J. Rousseeuw. Finding Groups in Data: an
Introduction to Cluster Analysis. John Wiley & Sons, 1990.]
Frequent Pattern Mining
- What's Hot and What's Not:
Tracking Most Frequent Items Dynamically, by Graham
Cormode, S. Muthukrishnan, in the ACM Symposium on Principles of
Database Systems (PODS) 2003.
- Dynamically Maintaining Frequent
Items Over A Data Stream, by Cheqing Jin, Weining
Qian, Chaofeng Sha, Jeffrey X. Yu, Aoying Zhou, in the
Conference on Information and Knowledge Management (CIKM) 2003.
- Processing Frequent Itemset
Discovery Queries by Division and Set Containment Join
Operators, by Ralf Rantzau, in the ACM SIGMOD
Workshop on Research Issues in Data Mining and Knowledge
Discovery (DMKD) 2003.
- Approximate Frequency Counts
over Data Streams, by Gurmeet Singh Manku, Rajeev
Motwani, in the International Conference on Very Large Data
Bases (VLDB) 2002.
- An Algorithm for In-Core
Frequent Itemset Mining on Streaming Data, by
Ruoming Jin, Gagan Agrawal, submitted for publication 2004.
- A Simple Algorithm for Finding
Frequent Elements in Streams and Bags, by Richard M.
Karp, Scott Shenker, in the ACM Transactions on Database Systems
(TODS) 2003.
- Bursty and Hierarchical
Structure in Streams, by Jon Kleinberg, in the ACM
International Conference on Knowledge Discovery and Data Mining
(SIGKDD) 2002.
- Online Algorithms for Mining
Semi-structured Data Stream, by Tatsuya Asai, Hiroki
Arimura, Kenji Abe, Shinji Kawasoe, Setsuo Arikawa, in the IEEE
International Conf. Data Mining (ICDM) 2002.
- Finding Hierarchical Heavy
Hitters in Data Streams, by Graham Cormode, Flip
Korn, S. Muthukrishnan, Divesh Srivastava, in the International
Conference on Very Large Data Bases (VLDB) 2003.
- Finding Recent Frequent Itemsets
Adaptively over Online Data Streams, by Joong Hyuk
Chang, Won Suk Lee, in the ACM International Conference on
Knowledge Discovery and Data Mining (SIGKDD) 2003.
- A survey on algorithms for mining frequent itemsets over data
streams, by James Cheng, Yiping Ke and Wilfred Ng, 2007.
- Research issues in data stream association rule mining, by Nan
Jiang, Le Gruenwald, SIGMOD Rec., Vol. 35, No. 1 (March 2006),
pp. 14-19.
- Verifying and Mining Frequent Patterns from Large Windows over
Data Streams, by Barzan Mozafari, Hetal Thakkar and Carlo
Zaniolo, in the 24th International Conference on Data
Engineering (ICDE) 2008, April 7-12, 2008, Cancún, México.
- Hoang Thanh Lam, Toon Calders:
Mining top-k frequent items in a data stream with flexible
sliding windows. KDD 2010, p. 283-292.
Classification, Regression and Other Learning Methods
- A Streaming Ensemble Algorithm
(SEA) for Large-Scale Classification, by W. Nick
Street, YongSeog Kim, in the ACM International Conference on
Knowledge Discovery and Data Mining (SIGKDD) 2001.
- A Regression-Based Temporal
Pattern Mining Scheme for Data Streams, by Wei-Guang
Teng, Ming-Syan Chen, Philip S. Yu, in the International
Conference on Very Large Data Bases (VLDB) 2003.
- Mining Concept Drifting Data
Streams using Ensemble Classifiers, by Haixun Wang,
Wei Fan, Philip S. Yu, Jiawei Han, in the ACM International
Conference on Knowledge Discovery and Data Mining (SIGKDD) 2003.
- Mining High Speed Data Streams,
by Pedro Domingos, Geoff Hulten, in the ACM International
Conference on Knowledge Discovery and Data Mining (SIGKDD) 2000.
- Accurate Decision Trees for Mining
High-Speed Data Streams, by Joao Gama, Ricardo Rocha,
Pedro Medas, in the ACM International Conference on Knowledge
Discovery and Data Mining (SIGKDD) 2003.
- Mining Time-Changing Data
Streams, by Geoff Hulten, Laurie Spencer, Pedro
Domingos, in the ACM International Conference on Knowledge
Discovery and Data Mining (SIGKDD) 2001.
- Efficient Decision Tree
Construction on Streaming Data, by Ruoming Jin,
Gagan Agrawal, in the ACM International Conference on Knowledge
Discovery and Data Mining (SIGKDD) 2003.
- Dynamic Weighted Majority: A
New Ensemble Method for Tracking Concept Drift, by
Jeremy Z. Kolter, Marcus A. Maloof, in the IEEE International
Conf. Data Mining (ICDM) 2003.
- Distributed Web Mining using
Bayesian Networks from Multiple Data Streams, by R.
Chen, K. Sivakumar, H. Kargupta, in the IEEE International Conf.
Data Mining (ICDM) 2001.
- An approach to online Bayesian
learning from multiple data streams, by R. Chen, K.
Sivakumar, H. Kargupta, in the European Conference on Principles
of Data Mining and Knowledge Discovery (PKDD) 2001.
- Adaptive, Hands-Off
Stream Mining, by Spiros Papadimitriou, Anthony
Brockwell, Christos Faloutsos, in the International Conference
on Very Large Data Bases (VLDB) 2003.
- Correlating Synchronous And
Asynchronous Data Streams, by Sudipto Guha, D.
Gunopulos, Nick Koudas, in the ACM International Conference on
Knowledge Discovery and Data Mining (SIGKDD) 2003.
- Fast and light boosting for adaptive mining of data streams, by
F. Chu and C. Zaniolo, in Proc. of the 8th Pacific-Asia Conference
on Knowledge Discovery and Data Mining (PAKDD), Sydney, May 2004.
- StreamMiner: A Classifier Ensemble-based Engine to Mine Concept
Drifting Data Streams, by Wei Fan, VLDB 2004.
- Active Mining of Data Streams, by Wei Fan, Yi-an Huang, Haixun
Wang, and Philip S. Yu, in the Proceedings of the SIAM
International Conference on Data Mining 2004.
- An adaptive learning approach for noisy data streams, by Fang
Chu, Yizhou Wang, Carlo Zaniolo, in the 4th IEEE International
Conference on Data Mining (ICDM) 2004.
- Fast and Light Boosting for Adaptive Mining of Data Streams.
Fang Chu, Carlo Zaniolo: PAKDD 2004: 282-292.
- An Adaptive Nearest Neighbor Classification Algorithm for Data
Streams. Yan-Nei Law, Carlo Zaniolo: PKDD 2005: 108-120.
- Peng Zhang, Jun Li, Peng Wang, Byron J. Gao, Xingquan
Zhu, Li Guo:
Enabling fast prediction for ensemble models on data streams.
KDD 2011: 177-185.
- Josh Attenberg, Foster J. Provost: Online active inference and
learning. KDD 2011, 186-194.
- Wei Chu, Martin Zinkevich, Lihong Li, Achint Thomas, Belle L.
Tseng:
Unbiased online active learning in data streams. KDD 2011,
195-203.
Time Series
- R. Agrawal, C. Faloutsos, and A. Swami. Efficient similarity
search in sequence databases. FODO’93 (Foundations of Data
Organization and Algorithms).
- R. Agrawal, K.-I. Lin, H.S. Sawhney, and K. Shim. Fast
similarity search in the presence of noise, scaling, and
translation in time-series databases. VLDB'95.
- R. Agrawal, G. Psaila, E. L. Wimmers, and M. Zait. Querying
shapes of histories. VLDB'95.
- C. Faloutsos, M. Ranganathan, and Y. Manolopoulos. Fast
subsequence matching in time-series databases. SIGMOD'94.
- Carlo Zaniolo et al. Chapter 12 in Advanced Database Systems,
Morgan Kaufmann, 1997.
- Nasser Yazdani, Z. Meral Özsoyoglu: Sequence Matching of
Images. SSDBM 1996: 53-62.
- Y. Moon, K. Whang, W. Loh. Duality Based Subsequence Matching
in Time-Series Databases, ICDE’02
- B.-K. Yi, H. V. Jagadish, and C. Faloutsos. Efficient
retrieval of similar time sequences under time warping. ICDE'98.
- B.-K. Yi, N. Sidiropoulos, T. Johnson, H. V. Jagadish, C.
Faloutsos, and A. Biliris. Online data mining for co-evolving
time sequences. ICDE'00.
- Dennis Shasha and Yunyue Zhu. High Performance Discovery in
Time Series: Techniques and Case Studies, Springer, 2004.
- Eamonn J. Keogh: Indexing and Mining Time Series Data.
Encyclopedia of GIS 2008: 493-497.
- Jin Shieh, Eamonn J. Keogh: iSAX: indexing and mining terabyte
sized time series. KDD 2008: 623-631.
- Jessica Lin, Michail Vlachos, Eamonn J. Keogh, Dimitrios
Gunopulos: Iterative Incremental Clustering of Time Series. EDBT
2004: 106-122.
- Louis Lovas, What Is: Time Series, May 2012. http://www.bigdataforfinance.com/bigdata/2012/05/what-is-time-series.html
[Reference Books on Time Series:
C. Chatfield. The Analysis of Time Series: An Introduction, 3rd
ed. Chapman & Hall, 1984.
R.H. Shumway & D.S. Stoffer. Time Series Analysis and Its
Applications: With R Examples (2nd ed.), Springer Texts in
Statistics, 2006. http://www.stat.pitt.edu/stoffer/tsa2/index.html
StatSoft. Electronic Textbook.
www.statsoft.com/textbook/stathome.html]