CS240B Syllabus

First Assignement (Due on Tuesday Oct. 9) Some answers for 2.1, 2.2

Study the following papers:
- B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom: Models and Issues in Data Stream Systems. PODS 2002.
- SQL-MR.
- New SQL OLAP Functions for everyone, by K.M. Guion.
Complete the following tasks and turn them in via ccle:

Task 1.1: Write a four page (about) report on SQL-MR focusing on the things that you can do with SQL-MR that cannot be done in SQL-TS,
Task 1.2: Some of the new OLAP functions support windows others do not. List those in the two groups, and for those in the second group, suggest a general implementation for (i) unlimited preceding windows, and (ii) physical windows.

-------------------------------------------------------------------------------------------------

Second Assignement (Due on Tuesday Oct. 16) Answers for 2.1, and 2.2.

Study the following papers:
- Y. Bai, H. Thakkar, C. Luo, H. Wang and CZaniolo: A Data Stream Language and System Designed for Power and Extensibility. (CIKM'06), Arlington, Virginia, USA,
- H. Thakkar, N. Laptev, H. Mousavi, B. Mozafari, V. Russo and C. Zaniolo: SMM: A data stream management system for Knowledge Discovery. ICDE 2011.
Complete the following tasks and turn them in via ccle:

Task 2.1: Using a syntax based on that of notes and the two reference above, express a user-defined aggregate d_count to perform the exact count of distinct values in a window on a data stream. Your window aggregate could, e.g., be called as follows:
SELECT col_name1, d_count(col_name2) OVER (ROWS 99999 PRECEDING) FROM my_stream
Task 2.2: Using the same syntax, write a UDA that implements the RANK function (without window or with a unlimited-preceding window). Then extend this definition to support windows.

------------------------------------------------------------------------------------------------

Third Assignement (Due on Tuesday, Oct. 23) Answers for 3.1, 3.2, and 3.3.

Study the following papers:

Yijian Bai and Carlo Zaniolo: Minimizing Latency and Memory in DSMS: A Unified Approach to Quasi-Optimal Scheduling . The Second International Workshop on Scalable Stream Processing Systems, March 29, 2008, Nantes, France.
Y. Bai, H. Thakkar, H. Wang, and C. Zaniolo. Time-stamp Management and QueryExecution in Data Stream Management Systems. IEEE Internet Computing, 12(6): 2008.
Jure Leskovec, Anand Rajaraman, Jeff Ullman, Mining Data Streams - The Stanford University InfoLab. You only need to study sections 4.3 and 4.4. But the rest is also interesting. http://infolab.stanford.edu/~ullman/mmds/ch4.pdf

Complete the following Tasks:

Task 3.1: We showed that binary operators, such as union and joins, are subject to idle-wating problem. But this problem can also occur in some unary operators. Say for instance that an aggregate is called on a time-stamped tumble window (or on a timestamped window with slides). Discuss this situation, its idle-waiting problem, and how it can be solved (using concepts and notation from the paper by Bai and Zaniolo above ).
Task 3.2: Propose a simple generalization of the chain algorithm to minimize memory for the following situations:
(A) Two or more simple paths without forks,
Task 3.3: Write the window UDA to compute the standard deviation on a data stream. Explain the math formulas you use in the expire of the window UDA---you might want to consult textbooks or the web to find the best formula.

-------------------------------------------------------------------------------------------------

Fourth Assignement (Due on Tuesday Oct. 30) Answers for 4.1, and 4.2.

Study the following papers:

Study: Chapter 3: A Survey of Classification Methods in Data Streams) from DATA STREAMS: MODELS AND ALGORITHMS DATA STREAMS. CHARU C. AGGARWAL editor, Kluwer Academic Publishers.
Chapter 3 is at pages 39-60 of the following pdf document.
N. Laptev et al. : Extendin Relational query Languages for Data Streams. pp. 361--386 of
Data StreamManagement--Processing High-Speed Data Streams. Springer, 2017.

Complete the following Tasks:
Task 4.1: Express the Flajolet-Martin's distinct_count sketch as a user-defined aggregate mamed dcount_sketch, to be called in the same way as d_count. You can assume that you have available a function LmostbitH(X) that return K, where the K position contains a 1, whereas all the position to its right are zeros, for the value returned by a randomizing hash function H(X).
Task 4.2: Assume that you have a stream of temperature readings temperature(Celsius Integer) that start everyday at time 00:01 and end at time 23:59. At the end of each day, we want to have 10,000 temperature samples stored into a table tenKsamples(Rowno integer, Celsius Integer). We do not know how many temperature readings are going to arrive every day, except that their number is significantly larger than 10,000. Please write a UDA that uses the reservoiralgorithm to populate tenKsamples(Rowno , Celsius) with 10,000 random samples taken fromtemperature(Celsius Integer), which is then processed and reset to empty at midnight. You can assume that the system support a function random(K), which given a positive integer K returns a random integer between 1 and K.

-----------------------------------------------------------------------------------------------------

Project Assigned on October 30--Due on November 20 (strict deadline).

-----------------------------------------------------------------------------------------------------

Midterm : Tuesday November 6 (Practice Midterm)
Correction of actual Midterm.

-----------------------------------------------------------------------------------------------------

Presentations (individual or by pairs). November 15--December 6.
Possible topics.

For additional references see here and here.

Current schedule.

-----------------------------------------------------------------------------------------------------

Take home final project. Research Report due on or before December 17.
It should be done individually (rather than in pairs.)

-----------------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------------------------------

Tuesday and Thursday 14:00--15:50

Fall 2018

CS240B:ADVANCED DATA BASES and KNOWLEDGE BASES

First Assignement (Due on Tuesday Oct. 9) Some answers for 2.1, 2.2

Second Assignement (Due on Tuesday Oct. 16) Answers for 2.1, and 2.2.

Third Assignement (Due on Tuesday, Oct. 23) Answers for 3.1, 3.2, and 3.3.

Fourth Assignement (Due on Tuesday Oct. 30) Answers for 4.1, and 4.2.

Project Assigned on October 30--Due on November 20 (strict deadline).

Midterm : Tuesday November 6 (Practice Midterm)
Correction of actual Midterm.

Presentations (individual or by pairs). November 15--December 6.
Possible topics.

For additional references see here and here.

Take home final project. Research Report due on or before December 17.
It should be done individually (rather than in pairs.)

Holidays

----------------------------------------------------------------------------------------------------------------

Tuesday and Thursday 14:00--15:50

Fall 2018

CS240B:ADVANCED DATA BASES and KNOWLEDGE BASES

First Assignement (Due on Tuesday Oct. 9) Some answers for 2.1, 2.2

Second Assignement (Due on Tuesday Oct. 16) Answers for 2.1, and 2.2.

Third Assignement (Due on Tuesday, Oct. 23) Answers for 3.1, 3.2, and 3.3.

Fourth Assignement (Due on Tuesday Oct. 30) Answers for 4.1, and 4.2.

Project Assigned on October 30--Due on November 20 (strict deadline).

Midterm : Tuesday November 6 (Practice Midterm) Correction of actual Midterm.

Presentations (individual or by pairs). November 15--December 6. Possible topics.

For additional references see here and here.

Take home final project. Research Report due on or before December 17. It should be done individually (rather than in pairs.)

Holidays

Midterm : Tuesday November 6 (Practice Midterm)
Correction of actual Midterm.

Presentations (individual or by pairs). November 15--December 6.
Possible topics.

Take home final project. Research Report due on or before December 17.
It should be done individually (rather than in pairs.)