240B:
Sample of topics for presentations and projects
DSMS:
Overviews (your presentations and projects cannot be on these
overview papers, but they can help you in your selection and
preparation)
- B. Babcock, S. Babu, M. Datar, R. Motwani, J. Widom: Models and Issues in
Data Stream Systems. PODS 2002: 1-16
- Lukasz Golab and M. Tamer Özsu. Issues in data stream management. ACM SIGMOD
Record, 32(2):5–14, 2003.
- Gianpaolo Cugola, Alessandro Margara: Processing flows of information: From
data stream to complex event processing. ACM Comput. Surv. 44(3): 15 (2012).
Languages
- Arvind
Arasu,
Shivnath Babu, Jennifer Widom: The CQL continuous query language:
semantic foundations and query execution. VLDB J. 15(2): 121-142 (2006).
- Yijian
Bai,
Hetal Thakkar, Chang Luo, Haixun Wang, Carlo Zaniolo: A Data Stream
Language and System Designed for Power and Extensibility. Proc. of the
ACM 15th Conference on Information and Knowledge Management (CIKM'06),
2006.
-----
- Yan-Nei
Law,
Haixun Wang, Carlo Zaniolo: Query Languages and Data Models for Database
Sequences and Data Streams. VLDB 2004. 492-503.
- Yan-Nei
Law, Haixun Wang, Carlo Zaniolo: Relational
Languages and Data Models for Continuous Queries on Sequences and
Data Streams.
ACM Transactions on Database Systems, Volume 36, Issue 2 (2011).
- Carlo
Zaniolo,
Logical Foundations of Continuous Query Languages for Data Streams. Datalog
2012:177-189
Pattern Languages
- Reza
Sadri,
Carlo Zaniolo, Amir Zarkesh, Jafar Adibi: Expressing and optimizing
sequence queries in database systems. ACM Transactions on Database
Systems (TODS), Volume 29, Issue 2 (June 2004).
- Fred Zemke, Andrew Witkowski, Mitch Cherniack, Latha Colby: Pattern matching in
sequences of rows. ANSI change proposal, March 27,
http://www.cs.ucla.edu/classes/spring07/cs240B/notes/row-pattern-recogniton-11.pdf.
- Discussion blog for the above:
http://tkyte.blogspot.com/2007/04/so-in-your-opinion.html
- Jagrati Agrawal, Yanlei Diao, Daniel Gyllstrom, Neil Immerman: Efficient pattern
matching over event streams. Proceedings of the ACM SIGMOD International
Conference on Management of Data, SIGMOD 2008.
- D.
Gyllstrom, J. Agrawal, Y. Diao, and N. Immerman. On supporting Kleene
closure over event streams. In ICDE, 2008.
- Barzan
Mozafari, Kai Zeng, Carlo Zaniolo: From Regular Expressions to Nested
Words:
Unifying Languages and Query Execution for Relational and XML
Sequences. PVLDB 3(1): 150-161 (2010).
- Barzan
Mozafari, Kai Zeng, Carlo Zaniolo: High-Performance
Complex Event Processing
over XML Streams. ACM SIGMOD 2012.
Windows, Operators and Timestamps
- Arvind
Arasu, Jennifer Widom: Resource Sharing in Continuous Sliding-Window
Aggregates. VLDB 2004.
- Utkarsh
Srivastava, Jennifer Widom: Memory-Limited Execution of Windowed Stream
Joins. VLDB 2004: 324-335
- Yijian
Bai,
Hetal Thakkar, Chang Luo, Haixun Wang, Carlo Zaniolo: A Data Stream
Language and System Designed for Power and Extensibility. Proc. of the
ACM 15th Conference on Information and Knowledge Management (CIKM'06),
2006.
- Yijian
Bai et al., Optimizing Timestamp Management in Data Stream Management
Systems, ICDE 2007.
- Y. Bai, H. Thakkar, H. Wang, and C. Zaniolo. Time-stamp Management and Query
Execution in Data Stream Management Systems. IEEE Internet Computing,
12(6):13–21, 2008.
- Theodore
Johnson,
S. Muthukrishnan, Vladislav Shkapenyuk, Oliver Spatscheck: A Heartbeat
Mechanism and Its Application in Gigascope. VLDB 2005: 1079-1088.
- Utkarsh
Srivastava, Jennifer Widom: Flexible Time Management in Data Stream
Systems. PODS 2004: 263-274
- Jin
Li,
David Maier, Kristin Tufte, Vassilis Papadimos, Peter A. Tucker:
Semantics and Evaluation Techniques for Window Aggregates in Data
Streams. SIGMOD Conference 2005: 311-322.
- Lukasz
Golab,
Theodore Johnson, J. Spencer Seidel, Vladislav Shkapenyuk: Stream
warehousing with DataDepot. SIGMOD Conference 2009: 847-854.
- Jens Teubner, Rene Mueller: How Soccer Players Would Do Stream Joins.
ACM SIGMOD 2011.
- Jin Li, Kristin Tufte, Vladislav Shkapenyuk, Vassilis Papadimos, Theodore
Johnson: Out-of-order processing: a new architecture for
high-performance stream systems. PVLDB 1(1): 274-288 (2008).
- Lukasz
Golab, Theodore Johnson, Vladislav Shkapenyuk: Scalable Scheduling of
Updates in Streaming Data Warehouses. IEEE Trans. Knowl. Data Eng.
24(6): 1092-1105 (2012)
Approximate Query Answering on Data Streams
- Swarup Acharya, Phillip B. Gibbons, Viswanath Poosala, Sridhar Ramaswamy: Join
Synopses for Approximate Query Answering. SIGMOD 1999, pp.275--286.
- Kaushik Chakrabarti, Minos N. Garofalakis, Rajeev Rastogi, Kyuseok Shim:
Approximate Query Processing Using Wavelets. VLDB 2000, pp.111--122.
- Anna C. Gilbert, Yannis Kotidis, S. Muthukrishnan, Martin Strauss: Surfing
Wavelets on Streams: One-Pass Summaries for Approximate Aggregate
Queries. VLDB 2001, pp.79--88.
- Abhinandan Das, Johannes Gehrke, Mirek Riedewald: Approximate Join Processing
Over Data Streams. SIGMOD 2003, pp.40--51.
- Yan-Nei Law and C. Zaniolo: Load Shedding for Window Joins on Multiple Data
Streams. First International Workshop on Scalable Stream Processing Systems
(SSPS'07), April 16-20, 2007, Istanbul, Turkey.
- Space-Efficient
Online Computation of Quantile Summaries. By Michael Greenwald and
Sanjeev Khanna. SIGMOD/PODS 2001
- Alin
Dobra,
Minos N. Garofalakis, Johannes Gehrke, Rajeev Rastogi: Processing
complex aggregate queries over data streams. SIGMOD 2002, pp.61--72.
- Arvind
Arasu,
Gurmeet Singh Manku. Approximate Counts and Quantiles over Sliding
Windows. In the ACM Symposium on Principles of Database Systems (PODS),
2004.
- Brian Babcock, Chris Olston. Distributed Top-k Monitoring. In the ACM
International Conference on Management of Data (SIGMOD) 2003.
- Brian Babcock, Mayur Datar, Rajeev Motwani, Liadan O'Callaghan. Maintaining
Variance and k-Medians over Data Stream Windows. In the ACM Symposium on
Principles of Database Systems (PODS) 2003.
- Jeffrey Considine, Feifei Li, George Kollios, John W. Byers: Approximate
Aggregation Techniques for Sensor Databases. ICDE 2004.
- Tao
Li,
Qi Li, Shenghuo Zhu, Mitsunori Ogihara: A Survey on Wavelet Applications
in Data Mining. SIGKDD Explorations 2002 4(2), pp.49--68.
- Minos
N. Garofalakis, Phillip B. Gibbons: Wavelet synopses with error
guarantees. SIGMOD 2002, pp.476--487.
Execution, Scheduling, Optimization
- Stratis
D. Viglas, Jeffrey F. Naughton: Rate-Based Query Optimization for
Streaming Information. SIGMOD 2002, 37-48.
- Donald
Carney,
Ugur Çetintemel, Alex Rasin, Stanley B. Zdonik, Mitch Cherniack, Michael
Stonebraker: Operator Scheduling in a Data Stream Manager. VLDB 2003:
838-849.
- B. Babcock, S. Babu, M. Datar, and R. Motwani. Chain: Operator
Scheduling for Memory Minimization in Data Stream Systems. Proc. of the
ACM Intl Conf. on Management of Data (SIGMOD 2003), June 2003.
- Yijian
Bai
and Carlo Zaniolo: Minimizing Latency and Memory in DSMS: a Unified
Approach to Quasi-Optimal Scheduling. The Second International Workshop
on Scalable Stream Processing Systems, March 29, 2008, Nantes, France.
- Yijian Bai, Hetal Thakkar, Haixun Wang, Carlo Zaniolo: A Flexible Query Graph
Based Model for the Efficient Execution of Continuous Queries. The First
International Workshop on Scalable Stream Processing Systems (SSPS'07),
April 16-20, 2007, Istanbul, Turkey.
- Henrique
Andrade, Bugra Gedik, Kun-Lung Wu, Philip S. Yu: Scale-Up
Strategies for Processing High-Rate Data Streams in System S. ICDE
2009:1375-1378
Load Shedding, Sampling
- Brian Babcock, Mayur Datar, Rajeev Motwani: Load Shedding for Aggregation
Queries over Data Streams. ICDE 2004, pp.350--361.
- Nesime Tatbul, Ugur Çetintemel, Stanley B. Zdonik, Mitch Cherniack, Michael
Stonebraker: Load Shedding in a Data Stream Manager. VLDB 2003,
pp.309--320.
- Nesime
Tatbul,
Ugur Çetintemel, Stanley B. Zdonik: Staying FIT: Efficient Load Shedding
Techniques for Distributed Stream Processing. VLDB 2007: 159-170.
- Yan-Nei
Law
and Carlo Zaniolo: Improving the Accuracy of Continuous Aggregates and
Mining Queries on Data Streams under Load Shedding. International
Journal of Business Intelligence and Data Mining, 2008.
- V. Braverman, R. Ostrovsky and C. Zaniolo. Optimal Sampling from Sliding
Windows, PODS 2009.
- Peixiang Zhao, Charu C. Aggarwal, Min Wang: gSketch:
On Query Estimation in Graph Streams. VLDB 2011: 193-204
Synopsis Maintenance
- Distributed Top-k Monitoring, by Brian Babcock, Chris
Olston, in the ACM International Conference on Management of Data
(SIGMOD) 2003.
- Maintaining
Stream Statistics over Sliding Windows, by
Mayur Datar, Aristides Gionis, Piotr Indyk, Rajeev Motwani, in the
ACM-SIAM Symposium on Discrete Algorithms (SODA) 2002.
- Maintaining Variance and k-Medians over Data Stream Windows,
by Brian Babcock, Mayur Datar, Rajeev Motwani, Liadan O'Callaghan, in
the ACM Symposium on Principles of Database Systems (PODS) 2003.
- StatStream:
Statistical Monitoring of Thousands of Data Streams in Real Time,
by Yunyue Zhu, Dennis Shasha, in the International Conference on Very
Large Data Bases (VLDB) 2002.
- Mining
A Stream of Transactions for Customer Patterns,
by Diane Lambert, Jose C. Pinheiro, in the ACM International Conference
on Knowledge Discovery and Data Mining (SIGKDD) 2001.
- Approximate
Medians and other Quantiles in One Pass and with Limited Memory,
by Gurmeet Singh Manku, Sridhar Rajagopalan, Bruce G. Lindsay, in the
ACM International Conference on Management of Data (SIGMOD) 1998.
- Random
Sampling Techniques for Space Efficient Online Computation of
Order Statistics of Large Datasets, by
Gurmeet Singh Manku, Sridhar Rajagopalan, Bruce G. Lindsay, in the ACM
International Conference on Management of Data (SIGMOD) 1999.
- Synopsis
Data Structures for Massive Data Sets, by
Phillip B. Gibbons, Yossi Matias, in the ACM-SIAM Symposium on Discrete
Algorithms (SODA) 1999.
Processing of Streaming XML Documents
- M. Altinel and M. J. Franklin. “Efficient Filtering of XML Documents for
Selective Dissemination of Information”. In Proc. of VLDB, 2000. [XFilter]
- C.-Y.
Chan,
P. Felber, M. Garofalakis, and R. Rastogi. “Efficient Filtering of XML
Documents with XPath Expressions”. In Proc. of ICDE, 2002.
- Z.
G. Ives, A. Y. Halevy, D. S. Weld. “An XML Query Engine for
Network-Bound Data”. In VLDB Journal, 2002.
- J. Chen, D. J. DeWitt, F. Tian, Y. Wang. “NiagaraCQ: a scalable continuous
query system for internet databases”. In Proc. of SIGMOD, 2000.
- C.
Barton,
P. Charles, D. Goyal, M. Raghavachari, M. Fontoura, and V. Josifovski.
“Streaming XPath Processing with Forward and Backward Axes”. In Proc. of
ICDE, 2003.
- Y.
Diao,
M. Altinel, M. Franklin, et al. Path Sharing and Predicate Evaluation
for High-Performance XML Filtering. In TODS, pages 467–516, 2003.
- Xin
Zhou, Hetal Thakkar and Carlo Zaniolo: Unifying the Processing of XML
Streams and Relational Data Streams, ICDE 2006.
- Barzan
Mozafari, Kai Zeng, Carlo Zaniolo: From Regular Expressions to Nested
Words:
Unifying Languages and Query Execution for Relational and XML
Sequences. PVLDB 3(1): 150-161 (2010)
Complex Event Processing (CEP)
- [Event Processing using WebSphere]
IBM Redbooks | WebSphere Business Integration Adapter Development,
http://www.redbooks.ibm.com/abstracts/redp9119.html?oppen
- Ana Biazetti and Kim Gadja: Achieving complex event processing with Active
Correlation Technology--Rule your domains with rules to trigger
automated processes.
http://www.ibm.com/developerworks/autonomic/library/ac-acact/index.html
- [Event
Processing using Java Message Service]
- Sun's
official JMS site includes documentation, FAQs and a JMS vendor list.
java.sun.com/products/jms/
- [Pub/Sub]
Patrick Th. Eugster et al.: The many faces of publish/subscribe. ACM
Computing Surveys (CSUR), Volume 35, Issue 2 (June 2003), 114-131.
DSMS/CEP Applications
1. Cranor, Johnson, Spatscheck & Shkapenyuk. Gigascope: A
Stream Database for Network Applications. SIGMOD 2003
2. Joseph M. Hellerstein. From Database to Dataflow: New Directions in IT.
Medical Records Institute Health IT Advisory Report 3(6) (2002).
3. Lerner & Shasha. The Virtues and Challenges of Ad Hoc + Streams
Querying in Finance. IEEE Data Engineering Bulletin, March 2003.
4. Sistla, Wolfson, Chamberlain, Dao. Modeling and Querying
Moving Objects. ICDE 1997.
5. Yao & Gehrke. Query Processing for Sensor Networks.
CIDR 2003.
6. Di Wang, et al. Active Complex Event Processing over Event Streams.
VLDB 2011.
Data Mining Query Languages for DBMS and DSMS
- Tomasz Imielinski and Heikki Mannila. A database perspective on knowledge
discovery. Communications of the ACM, 39(11):58, 1996.
- S.
Sarawagi,
S. Thomas, and R. Agrawal. Integrating association rule mining with
relational database systems: Alternatives and implications. In SIGMOD,
1998.
- T.
Imielinski and A. Virmani. MSQL: a query language for database mining.
Data Mining and Knowledge Discovery, 3:373--408, 1999.
- J.
Han,
Y. Fu, W. Wang, K. Koperski, and O. R. Zaiane. DMQL: A data mining query
language for relational databases. In Workshop on Research Issues on
Data Mining and Knowledge Discovery (DMKD), pages 27--33, Montreal,
Canada, June 1996.
- R.
Meo,
G. Psaila, and S. Ceri. A new SQL-like operator for mining association
rules. In VLDB, pages 122--133, Bombay, India, 1996.
- Marco
Botta,
Jean-Francois Boulicaut, Cyrille Masson, and Rosa Meo. Query languages
supporting descriptive rule mining: A comparative study. In Database
Support for Data Mining Applications, pages 24--51, 2004.
- Carlo Zaniolo: Mining Databases and Data Streams with Query Languages and
Rules. Invited Talk, Fourth International Workshop on Knowledge Discovery in
Inductive Databases, KDID 2005.
---
- ORACLE.
Oracle Data Miner Release 10gr2:
http://www.oracle.com/technology/products/bi/odm.
- Data
Mining Group (DMG). Predictive model markup language (pmml).
http://sourceforge.net/projects/pmml.
- Z.
Tang,
J. Maclennan, and P. Kim. Building data mining solutions with OLE DB for
DM and XML analysis. SIGMOD Record, 34(2):80–85, 2005.
- Hetal Thakkar, Barzan Mozafari and Carlo Zaniolo: Designing an Inductive Data
Stream Management System: the Stream Mill Experience. The Second International
Workshop on Scalable Stream Processing Systems, March 29, 2008, Nantes, France.
- Hetal Thakkar, Nikolay Laptev, Hamid Mousavi, Barzan Mozafari and Carlo
Zaniolo: SMM: a Data Stream Management System for Knowledge Discovery, ICDE 2011