240B: Advanced Data and Knowledge Bases:
Sample of topics for presentations and projects
Event Processing Languages and Systems
[Query languages for complex events. 1--3: early proposals. 4: language+optimization. 5+6: proposed standards+blog]
- Seshadri, P., Livny,
M., and Ramakrishnan, R. 1994. Sequence query processing. In Proceedings of ACM SIGMOD Conference
on Management of Data. ACM, New York, 430–441.
- Seshadri, P., Livny,
M., and Ramakrishnan, R. 1995. SEQ: A model for sequence databases.
In ICDE. 232–239.
- Ramakrishnan, R., Donjerkovic,
D., Ranganathan, A., Beyer, K., and Krishnaprasad, M.
SRQL: Sorted relational query language. In Proceedings of the 10th Annual
International Conference
on Scientific and Statistical Database Management (Capri, Italy, July 1–3),
1998, 84–95.
- Reza Sadri, Carlo Zaniolo, Amir
Zarkesh, Jafar
Adibi: Expressing and optimizing sequence queries in
database systems. ACM Transactions on Database Systems (TODS), Volume 29, Issue 2 (June 2004).
- Fred Zemke, Andrew Witkowski,
Mitch Cherniack, Latha
Colby: Pattern matching in sequences of rows, ANSI change proposal, March 27,
http://www.cs.ucla.edu/classes/spring07/cs240B/notes/row-pattern-recogniton-11.pdf.
- Discussion Blog for above: http://tkyte.blogspot.com/2007/04/so-in-your-opinion.html
[Event Processing using WebSphere]
IBM Redbooks | WebSphere Business Integration Adapter
Development, http://www.redbooks.ibm.com/abstracts/redp9119.html?Open
Ana Biazetti and Kim Gadja:
Achieving complex event processing with Active Correlation Technology--Rule
your domains with rules to trigger automated processes. http://www.ibm.com/developerworks/autonomic/library/ac-acact/index.html
[Event Processing using Java Message Service]
Sun's official JMS site includes documentation, FAQs
and a JMS vendor list. java.sun.com/products/jms/
[Pub/Sub]
Patrick Th. Eugster et al.: The many faces of publish/subscribe.
ACM Computing Surveys (CSUR),
Volume 35, Issue 2 (June 2003), 114-131.
DSMS and Technology
[Overviews]
B. Babcock, S. Babu, M. Datar,
R. Motwani, J. Widom: Models and Issues in Data Stream
Systems. PODS 2002: 1-16
Lukasz
Golab and M. Tamer Özsu.
Issues in data stream management. ACM SIGMOD Record, 32(2):5–14, 2003.
[Applications]
1.
Cranor, Johnson, Spatscheck & Shkapenyuk. Gigascope: A Stream Database
for Network Applications. SIGMOD 2003
2.
Joseph M. Hellerstein. From Database to Dataflow:
New Directions in IT.
Medical Records Institute Health IT Advisory Report 3(6) (2002).
3.
Lerner & Shasha. The Virtues and
Challenges of Ad Hoc + Streams Querying
in Finance. IEEE Data Engineering Bulletin, March 2003.
4.
Sistla, Wolfson, Chamberlain, Dao.
Modeling and Querying Moving Objects.
ICDE 1997.
5.
Yao & Gehrke. Query Processing for Sensor Networks. CIDR 2003.
[Language and Systems]
- Sankar Subramanian, Srikanth
Bellamkonda, Hua-Gang
Li, Vince Liang, Lei Sheng,
Wayne Smith, James Terry, Tsae-Feng Yu, Andrew Witkowski:
Continuous Queries in Oracle. VLDB 2007: 1173-1184
- Arvind Arasu, Shivnath Babu, Jennifer Widom:
The CQL continuous query language: semantic foundations and query execution.
VLDB J. 15(2): 121-142 (2006)
- Hari Balakrishnan,
Magdalena Balazinska, Donald Carney, Ugur
Çetintemel, Mitch Cherniack,
Christian Convey, Eduardo F. Galvez, Jon Salz, Michael Stonebraker, Nesime Tatbul, Richard Tibbetts, Stanley
B. Zdonik: Retrospective on Aurora. VLDB J. 13(4):
370-383 (2004).
- Jeong-Hyon Hwang, Ugur
Çetintemel, Stanley
B. Zdonik: Fast and Reliable Stream Processing over
Wide Area Networks. ICDE Workshops 2007: 604-613.
- Jeong-Hyon Hwang, Magdalena
Balazinska, Alex Rasin,
Ugur Çetintemel, Michael Stonebraker,
Stanley B. Zdonik:
High-Availability Algorithms for Distributed Stream Processing. ICDE 2005:
779-790.
- Charles D. Cranor, Theodore Johnson, Oliver Spatscheck,
Vladislav Shkapenyuk:
Gigascope: A Stream Database for Network Applications.
SIGMOD Conference 2003: 647-651
- Arvind Arasu, Mitch
Cherniack, Eduardo F. Galvez,
David Maier, Anurag Maskey, Esther Ryvkina, Michael Stonebraker,
Richard Tibbetts: Linear Road: A Stream Data Management
Benchmark. VLDB 2004.
- Yan-Nei Law, Haixun
Wang, Carlo Zaniolo: Query Languages and Data Models for Database Sequences
and Data Streams. VLDB 2004. 492-503.
- J. Chen, D. J. DeWitt, F. Tian, and Y. Wang. NiagaraCQ:
A scalable continuous query system for internet databases. In Proc. of the
2000 ACM SIGMOD Intl. Conf. on Management of Data, pages 379-390, May 2000.
- Sailesh Krishnamurthy, Sirish
Chandrasekaran, Owen Cooper, Amol
Deshpande, Michael J. Franklin, Joseph M. Hellerstein, Wei
Hong, Samuel R.
Madden, Vijayshankar Raman, Fred Reiss, and Mehul
A. Shah: TelegraphCQ:
An Architectural Status Report. IEEE Data Engineering Bulletin, Vol 26(1), March 2003.
- Sam Madden, Mehul A. Shah, Joseph M. Hellerstein,
Vijayshankar Raman: Continuously Adaptive Continuous
Queries over Streams. SIGMOD 2002, 49-61.
- D. Barbara. The characterization of continuous queries. Intl.
Journal of Cooperative Information Systems, 8(4):295-323, 1999.
- S. Chandrasekaran and M. Franklin. Streaming
queries over streaming data. In VLDB, 2002.
- H. Jagadish, I. Mumick, and A. Silberschatz.
View maintenance issues for the chronicle data model. In PODS,
pages 113-124, 1995.
- L. Liu, C. Pu, and W. Tang. Continual queries for internet
scale event-driven information delivery. IEEE TKDE, 11(4):583-590,
1999.
- M. Sullivan. Tribeca: A stream database manager for network traffic
analysis. In VLDB, 1996.
- D. Terry, D. Goldberg, D. Nichols, and B. Oki.
Continuous queries over append-only databases. In SIGMOD, pages 321-330,
1992.
[Windows, Operators and Timestamps]
- Arvind Arasu, Jennifer
Widom: Resource Sharing in Continuous Sliding-Window
Aggregates. VLDB 2004.
- Utkarsh Srivastava,
Jennifer Widom: Memory-Limited Execution of Windowed
Stream Joins. VLDB 2004: 324-335
- Yijian Bai, Hetal
Thakkar, Chang Luo, Haixun Wang, Carlo Zaniolo:
A Data Stream Language and System Designed for Power and Extensibility. Proc.
of the ACM 15th Conference on Information and Knowledge Management (CIKM'06),
2006.
- Yijian Bai et al.,
Optimizing Timestamp Management in Data Stream Management Systems, ICDE 2007.
- Yijian Bai, Hetal
Thakkar, Haixun Wang, Carlo
Zaniolo: A Flexible Query Graph
Based Model for the Efficient Execution of Continuous Queries. The First International
Workshop on Scalable Stream Processing Systems (SSPS'07), April
16-20, 2007, Istanbul, Turkey
- Theodore Johnson, S. Muthukrishnan, Vladislav Shkapenyuk, Oliver Spatscheck:
A Heartbeat Mechanism and Its Application in Gigascope.
VLDB 2005: 1079-1088.
- Utkarsh Srivastava,
Jennifer Widom: Flexible Time Management in Data
Stream Systems. PODS 2004: 263-274
- Jin Li, David Maier, Kristin Tufte,
Vassilis Papadimos, Peter
A. Tucker: Semantics and Evaluation Techniques for Window Aggregates in Data
Streams. SIGMOD Conference 2005: 311-322.
[Approximate Query Answering on Data Streams]
- Swarup Acharya, Phillip
B. Gibbons, Viswanath Poosala, Sridhar
Ramaswamy: Join Synopses for Approximate Query Answering.
SIGMOD1999, pp.275--286.
- Abhinandan Das,
Johannes Gehrke, Mirek Riedewald: Approximate
Join Processing Over Data Streams. SIGMOD2003, pp.40--51.
- Yan-Nei Law, and C. Zaniolo, Load Shedding for
Window Joins on Multiple Data Streams. First International Workshop on Scalable
Stream Processing Systems (SSPS'07) April
16-20, 2007, Istanbul, Turkey.
- A Robust, Optimization-Based Approach for Approximate Answering of Aggregate
Queries. By Surajit Chaudhuri, Gautam Das, Vivek Narasayya ACM
- On Computing Correlated Aggregates Over Continual
Data Streams. By Johannes Gehrke (Cornell
Univ.), Flip Korn, and Divesh Srivastava.
- Space-Efficient Online Computation of Quantile
Summaries. By Michael Greenwald and Sanjeev Khanna
(Univ. of Pennsylvania).
SIGMOD/PODS 2001
- Alin Dobra, Minos N.
Garofalakis, Johannes Gehrke,
Rajeev Rastogi: Processing
complex aggregate queries over data streams. SIGMOD2002, pp.61--72.
- Arvind Arasu, Gurmeet
Singh Manku. Approximate Counts and Quantiles
over Sliding Windows. In the ACM Symposium on Principles of Database Systems
(PODS), 2004.
- Brian Babcock, Chris Olston. Distributed Top-k Monitoring. In the ACM International
Conference on Management of Data (SIGMOD) 2003.
- Brian Babcock, Mayur Datar, Rajeev Motwani, Liadan O'Callaghan.
Maintaining Variance and k-Medians over Data Stream Windows. In the ACM Symposium
on Principles of Database Systems (PODS) 2003.
- Jeffrey Considine, Feifei
Li, George Kollios, John W. Byers: Approximate
Aggregation Techniques for Sensor Databases. ICDE 2004.
- Tao Li, Qi Li, Shenghuo
Zhu, Mitsunori Ogihara:
A Survey on Wavelet Applications in Data Mining. SIGKDD Explorations 2002
4(2), pp.49--68.
- Minos N. Garofalakis,
Phillip B. Gibbons: Wavelet synopses with error guarantees. SIGMOD 2002, pp.476--487.
- Anna C. Gilbert, Yannis Kotidis, S. Muthukrishnan, Martin Strauss: Surfing Wavelets on Streams:
One-Pass Summaries for Approximate Aggregate Queries. VLDB2001, pp.79--88.
- Kaushik Chakrabarti,
Minos N. Garofalakis, Rajeev Rastogi,
Kyuseok Shim: Approximate Query Processing Using
Wavelets. VLDB2000, pp.111--122.
[Scheduling, Load Shedding]
1.
Stratis D. Viglas, Jeffrey F. Naughton: Rate-Based
Query Optimization for Streaming Information. SIGMOD 2002, 37-48.
2.
Donald Carney, Ugur Çetintemel, Alex Rasin, Stanley
B. Zdonik, Mitch Cherniack,
Michael Stonebraker: Operator Scheduling in a Data
Stream Manager. VLDB 2003: 838-849.
3.
B. Babcock, S. Babu, M. Datar,
and R. Motwani. Chain: Operator Scheduling for Memory
Minimization in Data Stream Systems. In Proc. of the ACM Intl Conf. on Management of Data
(SIGMOD 2003), June 2003.
4.
Yijian Bai and Carlo Zaniolo: Minimizing Latency and Memory in DSMS:
a Unified Approach to Quasi-Optimal Scheduling. The Second International Workshop
on Scalable Stream Processing Systems, March 29, 2008, Nantes,
France.
5.
Brian Babcock, Mayur Datar, Rajeev Motwani: Load Shedding
for Aggregation Queries over Data Streams. ICDE2004, pp.350--361.
6.
Yan-Nei Law and Carlo Zaniolo: Improving
the Accuracy of Continuous Aggregates and Mining Queries on Data Streams under
Load Shedding. International Journal of Business Intelligence and Data Mining,
2008.
7.
Nesime Tatbul, Ugur Çetintemel,
Stanley B. Zdonik, Mitch Cherniack,
Michael Stonebraker: Load Shedding in a Data Stream
Manager. VLDB2003, pp.309--320.
8.
Nesime Tatbul, Ugur Çetintemel,
Stanley B. Zdonik:
Staying FIT: Efficient Load Shedding Techniques for Distributed Stream Processing.
VLDB 2007: 159-170.
[Processing of Streaming XML documents]
- M. Altinel and M. J. Franklin. “Efficient Filtering
of XML Documents for Selective Dissemination of Information”. In Proc. Of
VLDB, 2000. [Xfilter]
- C.-Y. Chan, P. Felber, M. Garofalakis, and
R. Rastogi. “Efficient Filtering of XML Documents
with XPath Expressions”. In Proc. of ICDE, 2002.
- Z. G. Ives, A. Y. Halevy, D. S. Weld. “An XML Query Engine for Network-Bound
Data”. In VLDB Journal, 2002.
- J. Chen, D. J. DeWitt, F. Tian, Y. Wang. “NiagaraCQ: a scalable continuous query system for internet
databases”. In Proc. Of SIGMOD, 2000.
- C. Barton, P. Charles, D. Goyal, M. Raghavachari, M.
Fontoura, and V. Josifovski.
“Streaming XPath Processing with Forward and Backward
Axes”. In Proc. of ICDE, 2003.
- Y. Diao, M. Altinel,
M. Franklin, et al. Path Sharing and Predicate Evaluation for High-Performance
XML Filtering. In TODS, pages 467–516, 2003.
- Xin Zhou, Hetal
Thakkar and Carlo Zaniolo: Unifying the Processing
of XML Streams and Relational Data Streams, ICDE 2006.
Data Mining Query Languages and Systems
- Tomasz Imielinski and
Heikki Mannila. A database perspective on knowledge discovery.
Communications of the ACM, 39(11):58, 1996.
- S. Sarawagi, S. Thomas, and R. Agrawal.
Integrating association rule mining with relational database systems: Alternatives
and implications. In SIGMOD, 1998.
- T. Imielinski and A. Virmani.
MSQL: a query language for database mining. Data Mining and Knowledge Discovery,
3:373--408, 1999.
- J. Han, Y. Fu, W. Wang, K. Koperski, and O. R. Zaiane. DMQL:
A data mining query language for relational databases. In Workshop on Research
Issues on Data Mining and Knowledge Discovery (DMKD), pages 27--33, Montreal,
Canada, June 1996.
- R. Meo, G. Psaila,
and S. Ceri. A new SQL-like operator for mining
association rules. In VLDB, pages 122--133, Bombay,
India, 1996.
- Marco Botta, Jean-Francois Boulicaut,
Cyrille Masson, and Rosa Meo.
Query languages supporting descriptive rule mining: A comparative study. In
Database Support for Data Mining Applications, pages 24--51, 2004.
- Carlo Zaniolo: Mining Databases and Data Streams with
Query Languages and Rules: Invited Talk, Fourth International Workshop on
Knowledge Discovery in Inductive Databases, KDID 2005.
- ORACLE. Oracle Data Miner Release 10gr2: http://www.oracle.com/technology/products/bi/odm.
- Data Mining Group (DMG). Predictive model markup language (pmml).
http://sourceforge.net/projects/pmml.
- Z. Tang, J. Maclennan, and P. Kim. Building data mining solutions
with OLE DB for DM and XML analysis. SIGMOD Record, 34(2):80–85, 2005.
- Hetal Thakkar, Barzan Mozafari and Carlo Zaniolo: Designing an Inductive Data Stream
Management System: the Stream Mill Experience. The Second International Workshop
on Scalable Stream Processing Systems, March 29, 2008, Nantes, France.
Mining Data Bases and Data Streams
[Clustering]
[Book] G. J. McLachlan and K. E. Basford. Mixture Models: Inference and Applications
to Clustering. John Wiley and Sons, 1988.
[Book] L. Kaufman and P. J. Rousseeuw. Finding Groups in Data: an Introduction
to Cluster Analysis. John Wiley & Sons, 1990.
[CLARANS] R. Ng and J. Han. Efficient and effective clustering method for spatial
data mining. VLDB'94.
[CLIQUE] R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. Automatic subspace
clustering of high dimensional data for data mining applications. SIGMOD'98
[OPTICS] M. Ankerst, M. Breunig, H.-P. Kriegel, and J. Sander. Optics: Ordering
points to identify the clustering structure, SIGMOD99.
[Text] Beil F., Ester M., Xu X.: "Frequent Term-Based Text Clustering",
KDD'02
[Outliers] M. M. Breunig, H.-P. Kriegel, R. Ng, J. Sander. LOF: Identifying
Density-Based Local Outliers. SIGMOD 2000.
[DBSCAN] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm
for discovering clusters in large spatial databases. KDD'96.
[Categorical] D. Gibson, J. Kleinberg, and P. Raghavan. Clustering categorical
data: An approach based on dynamic systems. VLDB98.
[Categorical] V. Ganti, J. Gehrke, R. Ramakrishnan. CACTUS: Clustering Categorical
Data Using Summaries. KDD'99.
[CURE] S. Guha, R. Rastogi, and K. Shim. Cure: An efficient clustering algorithm
for large databases. SIGMOD'98.
[ROCK] S. Guha, R. Rastogi, and K. Shim. ROCK: A robust clustering algorithm
for categorical attributes. In ICDE'99, pp. 512-521, Sydney, Australia, March
1999.
[Hierarchical] G. Karypis, E.-H. Han, and V. Kumar. CHAMELEON: A Hierarchical
Clustering Algorithm Using Dynamic Modeling. COMPUTER, 32(8): 68-75, 1999.
[Outliers] E. Knorr and R. Ng. Algorithms for mining distance-based outliers
in large datasets. VLDB98.
[DENCLUE] A. Hinneburg, D. A. Keim: An Efficient Approach to Clustering in
Large Multimedia Databases with Noise. KDD98
[Wavelets] G. Sheikholeslami, S. Chatterjee, and A. Zhang. WaveCluster: A multi-resolution
clustering approach for very large spatial databases. VLDB98.
[Constraints] A. K. H. Tung, J. Han, L. V. S. Lakshmanan, and R. T. Ng. Constraint-Based
Clustering in Large Databases, ICDT'01.
[p-cluster] H. Wang, W. Wang, J. Yang, and P.S. Yu. Clustering by pattern
similarity in large data sets, SIGMOD 02.
[STING] W. Wang, J. Yang, R. Muntz. STING: A Statistical Information Grid Approach
to Spatial Data Mining, VLDB97.
[BIRCH] T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH : an efficient data clustering
method for very large databases. SIGMOD'96.
[Data Stream Clustering]
- Liadan O'Callaghan, Adam Meyerson, Rajeev Motwani, Nina Mishra, Sudipto
Guha: Streaming-Data Algorithms for High-Quality Clustering. ICDE 2002: 685+
- Sudipto Guha, Adam Meyerson, Nina Mishra, Rajeev Motwani, Liadan O'Callaghan:
Clustering Data Streams: Theory and Practice. IEEE Trans. Knowl. Data Eng.
15(3): 515-528 (2003)
- C. Aggarwal, J. Han, J. Wang, P. S. Yu. A Framework for Clustering Data
Streams, VLDB'03
- C. Aggarwal, J. Han, J. Wang, and P. S. Yu. A Framework for Projected Clustering
of High Dimensional Data Streams, VLDB'04.
[Association Rule Mining]
R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between
sets of items in large databases. SIGMOD'93.
R. Agrawal and R. Srikant. Fast algorithms for mining association rules. VLDB'94
J. S. Park, M. S. Chen, and P. S. Yu. An effective hash-based algorithm for
mining association rules. SIGMOD'95.
A. Savasere, E. Omiecinski, and S. Navathe. Mining for strong negative associations
in a large database of customer transactions. ICDE'98.
D. Tsur, J. D. Ullman, S. Abiteboul, C. Clifton, R. Motwani, and S. Nestorov.
Query flocks: A generalization of association-rule mining. SIGMOD'98.
H. Mannila, H Toivonen, and A. I. Verkamo. Discovery of frequent episodes in
event sequences. DAMI:97.
M. Zaki. SPADE: An Efficient Algorithm for Mining Frequent Sequences. Machine
Learning:01.
(Max-pattern) R. J. Bayardo. Efficiently mining long patterns from databases.
SIGMOD'98.
(Closed-pattern) N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering
frequent closed itemsets for association rules. ICDT'99.
(FP-Growth) J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate
generation. SIGMOD 00.
J. Liu, Y. Pan, K. Wang, and J. Han. Mining Frequent Item Sets by Opportunistic
Projection. KDD'02
Gösta Grahne, Jianfei Zhu: Efficiently Using Prefix-trees in Mining Frequent
Itemsets. FIMI 2003
Zaki and Hsiao. CHARM: An Efficient Algorithm for Closed Itemset Mining, SDM'02.
R. Srikant and R. Agrawal. Mining generalized association rules. VLDB'95.
J. Han and Y. Fu. Discovery of multiple-level association rules from large databases.
VLDB'95.
B. Lent, A. Swami, and J. Widom. Clustering association rules. ICDE'97.
M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, and A. I. Verkamo. Finding
interesting rules from large sets of discovered association rules. CIKM'94.
S. Brin, R. Motwani, and C. Silverstein. Beyond market basket: Generalizing
association rules to correlations. SIGMOD'97.
C. Silverstein, S. Brin, R. Motwani, and J. Ullman. Scalable techniques for
mining causal structures. VLDB'98.
P.-N. Tan, V. Kumar, and J. Srivastava. Selecting the Right Interestingness
Measure for Association Patterns. KDD'02.
E. Omiecinski. Alternative Interest Measures for Mining Associations. TKDE03.
Y. K. Lee, W.Y. Kim, Y. D. Cai, and J. Han. CoMine: Efficient Mining of Correlated
Patterns. ICDM03.
[Association on Data Streams]
- G. Manku, R. Motwani. Approximate Frequency Counts over Data Streams,
VLDB02
- Richard M. Karp, Scott Shenker, Christos H. Papadimitriou: A simple algorithm
for finding frequent elements in streams and bags. ACM Trans. Database Syst.
28: 51-55 (2003)
- C. Giannella, J. Han, J. Pei, X. Yan and P.S. Yu. Mining frequent patterns
in data streams at multiple time granularities, Kargupta, et al. (eds.), Next
Generation Data Mining04
- Ahmed Metwally, Divyakant Agrawal, Amr El Abbadi: Efficient Computation
of Frequent and Top-k Elements in Data Streams. ICDT 2005: 398-412
[Classification]
- T.-S. Lim, W.-Y. Loh, and Y.-S. Shih. A comparison of prediction accuracy,
complexity, and training time of thirty-three old and new classification algorithms.
Machine Learning, 2000.
- J. Magidson. The CHAID approach to segmentation modeling: Chi-squared automatic
interaction detection. In R. P. Bagozzi, editor, Advanced Methods of Marketing
Research, Blackwell Business, 1994.
- M. Mehta, R. Agrawal, and J. Rissanen. SLIQ : A fast scalable classifier
for data mining. EDBT'96.
- J. R. Quinlan. Bagging, boosting, and C4.5. AAAI'96.
- R. Rastogi and K. Shim. Public: A decision tree classifier that integrates
building and pruning. VLDB98.
- J. Shafer, R. Agrawal, and M. Mehta. SPRINT : A scalable parallel classifier
for data mining. VLDB96.
- H. Yu, J. Yang, and J. Han. Classifying large data sets using SVM with hierarchical
clusters. KDD'03.
- J. Gehrke, R. Ramakrishnan, and V. Ganti. Rainforest: A framework for fast
decision tree construction of large datasets. VLDB98.
- X. Yin and J. Han. CPAR: Classification based on predictive association
rules. SDM'03.
- L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression
Trees. Wadsworth International Group, 1984.
- Haixun Wang, Carlo Zaniolo: CMP: A Fast Decision Tree Classifier Using Multivariate
Predictions. ICDE 2000: 449-460.
[Classification on Data Streams]
- P. Domingos and G. Hulten, Mining high-speed data streams, KDD'00
- C. C. Aggarwal, J. Han, J. Wang and P. S. Yu. On-Demand Classification of
Evolving Data Streams, KDD'04
- Fang Chu, Carlo Zaniolo: Fast and Light Boosting for Adaptive Mining of
Data Streams. PAKDD 2004: 282-292.
- Fang Chu, Yizhou Wang, Carlo Zaniolo: An Adaptive Learning Approach for
Noisy Data Streams. ICDM 2004: 351-354.
- Yan-Nei Law, Carlo Zaniolo: An Adaptive Nearest Neighbor Classification
Algorithm for Data Streams. PKDD 2005: 108-120.
[Time Series]
- C. Chatfield. The Analysis of Time Series: An Introduction, 3rd ed. Chapman
& Hall, 1984.
- R.H. Shumway & D.S. Stoffer. Time Series Analysis and Its Applications:
With R Examples (2nd ed.), Springer Texts in Statistics, 2006. http://www.stat.pitt.edu/stoffer/tsa2/index.html
- StatSoft. Electronic Textbook. www.statsoft.com/textbook/stathome.html
- R. Agrawal, C. Faloutsos, and A. Swami. Efficient similarity search in sequence
databases. FODO93 (Foundations of Data Organization and Algorithms).
- R. Agrawal, K.-I. Lin, H.S. Sawhney, and K. Shim. Fast similarity search
in the presence of noise, scaling, and translation in time-series databases.
VLDB'95.
- R. Agrawal, G. Psaila, E. L. Wimmers, and M. Zait. Querying shapes of histories.
VLDB'95.
- C. Faloutsos, M. Ranganathan, and Y. Manolopoulos. Fast subsequence matching
in time-series databases. SIGMOD'94.
- Carlo Zaniolo, Stefano Ceri, Christos Faloutsos, Richard T. Snodgrass, V. S. Subrahmanian,
Roberto Zicari. Advanced Database Systems (Chapter 12), Morgan-Kaufmann, 1997.
- Nasser Yazdani, Z. Meral Özsoyoglu: Sequence Matching of Images. SSDBM
1996: 53-62.
- Y. Moon, K. Whang, W. Loh. Duality Based Subsequence Matching in Time-Series
Databases, ICDE02
- B.-K. Yi, H. V. Jagadish, and C. Faloutsos. Efficient retrieval of similar
time sequences under time warping. ICDE'98.
- B.-K. Yi, N. Sidiropoulos, T. Johnson, H. V. Jagadish, C. Faloutsos, and
A. Biliris. Online data mining for co-evolving time sequences. ICDE'00.
- Dennis Shasha and Yunyue Zhu. High Performance Discovery in Time Series:
Techniques and Case Studies, Springer, 2004.
- L. R. Rabiner. A tutorial on hidden markov models and selected applications
in speech recognition. Proc. IEEE, 77:257--286, 1989.
- R. Durbin, S. Eddy, A. Krogh and G. Mitchison. Biological Sequence Analysis:
Probability Models of Proteins and Nucleic Acids. Cambridge University Press,
1998.