240B: Advanced Data and Knowledge Bases:

Sample of topics for presentations and projects


Event Processing Languages and Systems

[Query languages for complex events. 1--3: early proposals. 4: language and optimization. 5--6: proposed standard and blog discussion]

  1. Seshadri, P., Livny, M., and Ramakrishnan, R. 1994. Sequence query processing. In Proceedings of ACM SIGMOD Conference on Management of Data. ACM, New York, 430–441.
  2. Seshadri, P., Livny, M., and Ramakrishnan, R. 1995. SEQ: A model for sequence databases. In
    ICDE. 232–239.
  3. Ramakrishnan, R., Donjerkovic, D., Ranganathan, A., Beyer, K., and Krishnaprasad, M.
    SRQL: Sorted relational query language. In Proceedings of the 10th Annual International Conference
    on Scientific and Statistical Database Management (Capri, Italy, July 1–3), 1998, 84–95.
  4. Reza Sadri, Carlo Zaniolo, Amir Zarkesh, Jafar Adibi: Expressing and optimizing sequence queries in database systems. ACM Transactions on Database Systems (TODS), Volume 29, Issue 2 (June 2004).
  5. Fred Zemke, Andrew Witkowski, Mitch Cherniack, Latha Colby: Pattern matching in sequences of rows. ANSI change proposal, March 27, http://www.cs.ucla.edu/classes/spring07/cs240B/notes/row-pattern-recogniton-11.pdf.
  6. Discussion Blog for above: http://tkyte.blogspot.com/2007/04/so-in-your-opinion.html

[Event Processing using WebSphere]

IBM Redbooks | WebSphere Business Integration Adapter Development, http://www.redbooks.ibm.com/abstracts/redp9119.html?Open

Ana Biazetti and Kim Gajda: Achieving complex event processing with Active Correlation Technology--Rule your domains with rules to trigger automated processes. http://www.ibm.com/developerworks/autonomic/library/ac-acact/index.html

[Event Processing using Java Message Service]

Sun's official JMS site includes documentation, FAQs and a JMS vendor list. java.sun.com/products/jms/

[Pub/Sub]
Patrick Th. Eugster et al.: The many faces of publish/subscribe. ACM Computing Surveys (CSUR), Volume 35, Issue 2 (June 2003), 114-131.

Data Streams

[Overviews]
B. Babcock, S. Babu, M. Datar, R. Motwani, J. Widom: Models and Issues in Data Stream
Systems. PODS 2002: 1-16

Lukasz Golab and M. Tamer Özsu. Issues in data stream management. ACM SIGMOD Record, 32(2):5–14, 2003.

[Language and Systems]

[Windows, Operators and Timestamps]

 

[Approximate Query Answering on Data Streams]

[Scheduling, Load Shedding, and Distributed Processing]

[Processing of Streaming XML documents]

Data Mining Systems

 Mining Data Bases and Data Streams

Clustering

[Book] G. J. McLachlan and K. E. Basford. Mixture Models: Inference and Applications to Clustering. John Wiley and Sons, 1988.

[Book] L. Kaufman and P. J. Rousseeuw. Finding Groups in Data: an Introduction to Cluster Analysis. John Wiley & Sons, 1990.

[CLARANS] R. Ng and J. Han. Efficient and effective clustering method for spatial data mining. VLDB'94.

[CLIQUE] R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. SIGMOD'98.

[OPTICS] M. Ankerst, M. Breunig, H.-P. Kriegel, and J. Sander. OPTICS: Ordering points to identify the clustering structure. SIGMOD’99.

[Text] Beil F., Ester M., Xu X.: "Frequent Term-Based Text Clustering", KDD'02

[Outliers] M. M. Breunig, H.-P. Kriegel, R. Ng, J. Sander. LOF: Identifying Density-Based Local Outliers. SIGMOD 2000.

[DBSCAN] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases. KDD'96.

[Categorical] D. Gibson, J. Kleinberg, and P. Raghavan. Clustering categorical data: An approach based on dynamic systems. VLDB’98.

[Categorical] V. Ganti, J. Gehrke, R. Ramakrishnan. CACTUS: Clustering Categorical Data Using Summaries. KDD'99.

[CURE] S. Guha, R. Rastogi, and K. Shim. CURE: An efficient clustering algorithm for large databases. SIGMOD'98.

[ROCK] S. Guha, R. Rastogi, and K. Shim. ROCK: A robust clustering algorithm for categorical attributes. In ICDE'99, pp. 512-521, Sydney, Australia, March 1999.

[Hierarchical] G. Karypis, E.-H. Han, and V. Kumar. CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling. COMPUTER, 32(8): 68-75, 1999.

[Outliers] E. Knorr and R. Ng. Algorithms for mining distance-based outliers in large datasets. VLDB’98.

[DENCLUE] A. Hinneburg, D. A. Keim: An Efficient Approach to Clustering in Large Multimedia Databases with Noise. KDD’98.

[Wavelets] G. Sheikholeslami, S. Chatterjee, and A. Zhang. WaveCluster: A multi-resolution clustering approach for very large spatial databases. VLDB’98.

[Constraints] A. K. H. Tung, J. Han, L. V. S. Lakshmanan, and R. T. Ng. Constraint-Based Clustering in Large Databases, ICDT'01.

[p-cluster] H. Wang, W. Wang, J. Yang, and P. S. Yu. Clustering by pattern similarity in large data sets. SIGMOD’02.

[STING] W. Wang, J. Yang, R. Muntz. STING: A Statistical Information Grid Approach to Spatial Data Mining. VLDB’97.

[BIRCH] T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH: an efficient data clustering method for very large databases. SIGMOD'96.

[Data Stream Clustering]

[Association Rule Mining]

R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. SIGMOD'93.

R. Agrawal and R. Srikant. Fast algorithms for mining association rules. VLDB'94

J. S. Park, M. S. Chen, and P. S. Yu. An effective hash-based algorithm for mining association rules. SIGMOD'95.

A. Savasere, E. Omiecinski, and S. Navathe. Mining for strong negative associations in a large database of customer transactions. ICDE'98.

D. Tsur, J. D. Ullman, S. Abiteboul, C. Clifton, R. Motwani, and S. Nestorov. Query flocks: A generalization of association-rule mining. SIGMOD'98.

H. Mannila, H. Toivonen, and A. I. Verkamo. Discovery of frequent episodes in event sequences. DAMI:97.

M. Zaki. SPADE: An Efficient Algorithm for Mining Frequent Sequences. Machine Learning:01.

(Max-pattern) R. J. Bayardo. Efficiently mining long patterns from databases. SIGMOD'98.

(Closed-pattern) N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. ICDT'99.

(FP-Growth) J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. SIGMOD’00.

J. Liu, Y. Pan, K. Wang, and J. Han. Mining Frequent Item Sets by Opportunistic Projection. KDD'02

Gösta Grahne, Jianfei Zhu: Efficiently Using Prefix-trees in Mining Frequent Itemsets. FIMI 2003

Zaki and Hsiao. CHARM: An Efficient Algorithm for Closed Itemset Mining, SDM'02.

R. Srikant and R. Agrawal. Mining generalized association rules. VLDB'95.

J. Han and Y. Fu. Discovery of multiple-level association rules from large databases. VLDB'95.

B. Lent, A. Swami, and J. Widom. Clustering association rules. ICDE'97.

M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, and A. I. Verkamo. Finding interesting rules from large sets of discovered association rules. CIKM'94.

S. Brin, R. Motwani, and C. Silverstein. Beyond market basket: Generalizing association rules to correlations. SIGMOD'97.

C. Silverstein, S. Brin, R. Motwani, and J. Ullman. Scalable techniques for mining causal structures. VLDB'98.

P.-N. Tan, V. Kumar, and J. Srivastava. Selecting the Right Interestingness Measure for Association Patterns. KDD'02.

E. Omiecinski. Alternative Interest Measures for Mining Associations. TKDE’03.

Y. K. Lee, W.Y. Kim, Y. D. Cai, and J. Han. CoMine: Efficient Mining of Correlated Patterns. ICDM’03.

[Association on Data Streams]

[Classification]

[Time Series]

C. Chatfield. The Analysis of Time Series: An Introduction, 3rd ed. Chapman & Hall, 1984.

R.H. Shumway & D.S. Stoffer. Time Series Analysis and Its Applications: With R Examples (2nd ed.), Springer Texts in Statistics, 2006. http://www.stat.pitt.edu/stoffer/tsa2/index.html

StatSoft. Electronic Textbook. www.statsoft.com/textbook/stathome.html

R. Agrawal, C. Faloutsos, and A. Swami. Efficient similarity search in sequence databases. FODO’93 (Foundations of Data Organization and Algorithms).

R. Agrawal, K.-I. Lin, H.S. Sawhney, and K. Shim. Fast similarity search in the presence of noise, scaling, and translation in time-series databases. VLDB'95.

R. Agrawal, G. Psaila, E. L. Wimmers, and M. Zait. Querying shapes of histories. VLDB'95.

C. Faloutsos, M. Ranganathan, and Y. Manolopoulos. Fast subsequence matching in time-series databases. SIGMOD'94.

Carlo Zaniolo, Stefano Ceri, Christos Faloutsos, Richard T. Snodgrass, V. S. Subrahmanian, Roberto Zicari. Advanced Database Systems (Chapter 12), Morgan-Kaufmann, 1997.

Nasser Yazdani, Z. Meral Özsoyoglu: Sequence Matching of Images. SSDBM 1996: 53-62

Y. Moon, K. Whang, W. Loh. Duality Based Subsequence Matching in Time-Series Databases, ICDE’02

B.-K. Yi, H. V. Jagadish, and C. Faloutsos. Efficient retrieval of similar time sequences under time warping. ICDE'98.

B.-K. Yi, N. Sidiropoulos, T. Johnson, H. V. Jagadish, C. Faloutsos, and A. Biliris. Online data mining for co-evolving time sequences. ICDE'00.

Dennis Shasha and Yunyue Zhu. High Performance Discovery in Time Series: Techniques and Case Studies, Springer, 2004.

L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE, 77:257--286, 1989.

R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis: Probability Models of Proteins and Nucleic Acids. Cambridge University Press, 1998.