CS240B
Winter 2017





Syllabus

Notes


Home



Talks

----------------------------------------------------------------------------------------------------------------

TuTh: 14:00--15:50

Fall 2018

CS240B:ADVANCED DATA BASES and KNOWLEDGE BASES

In Data Stream Management Systems:
Supporting Data Stream Mining Applications

Instructor: Carlo Zaniolo

Office Hours: Tuesday,  16:00--18:00

Midterm on TBA


First Assignement (Due on  ?)

  • Task 1.1: Write a four page (about) report on SQL-MR focusing on the things that you can do with SQL-MR that cannot be done in  SQL-TS,
  • Task 1.2: Some of the new OLAP functions support windows others do not. List those in the two groups, and for those in the second group outline the various  problems   that will have to be solved  to support (i) unlimited preceding windows, and (ii) physical windows. Whenever simple implementation exist express them by the SQL-defined UDAs described in the notes.

-------------------------------------------------------------------------------------------------

  Second Assignement (Due on ?)

  • Task 2.1: Using a syntax based on that of notes and the two reference above, express a user-defined aggregate d_count to perform the exact count of distinct values in a window on a data stream. Your window aggregate could, e.g., be called as follows:
    SELECT col_name1, d_count(col_name2) OVER (ROWS 99999 PRECEDING) FROM my_stream
  • Task 2.2:  Using the same syntax, write a UDA that implements the RANK function (without window or with a unlimited-preceding window).  Then  extend this definition to support windows.

-----------------------------------------------------------------------------------------------------

      Third Assignement (Due on ?)

  •  Study the following papers: 
  • Task 3.1: We showed that binary operators, such as union and joins, are subject to idle-wating problem. But this problem can also occur in some unary operators. Say for instance that an aggregate is called on a time-stamped tumble window (or on a timestamped window with slides). Discuss this situation, its idle-waiting problem, and how it can be solved (using concepts and notation from the paper by Bai and Zaniolo above ).

  • Task 3.2: Propose a  simple generalization of the chain algorithm to minimize memory for the following situations:
    (A) Two simple paths without forks,
    (B) a simple tree where a path forks out into a pair of paths. (You should use concepts and notation from the papers above.)

  • Task 3.3: Write the window UDA to compute the standard deviation on a data stream. Explain the math formulas you use in the expire of the window UDA---you might want to consult textbooks or the  web  to find  the best formula.
  • Fourth Assignement (due on?)

  •  Study the following papers:
  • Task 4.1: Express the Flajolet-Martin's distinct_count sketch as a user-defined aggregate mamed dcount_sketch, to be called in the same way as d_count. You can assume that you have available a function LmostbitH(X) that return K, where the K position contains a 1, whereas all the position to its right are zeros, for the value returned by a randomizing hash function H(X).
  • Task 4.2:  Assume that  you have a stream of temperature readings temperature(Celsius Integer) that start everyday at time 00:01  and end at  time 23:59.  At the end of each day, we want to have 10,000 temperature samples stored into a table   tenKsamples(Rowno integer, Celsius Integer). We do not know how many temperature readings are going to arrive every day, except that their number is significantly larger than 10,000. Please write a UDA that uses the reservoir algorithm to populate tenKsamples(Rowno , Celsius) with 10,000 random samples taken from temperature(Celsius Integer), which is then processed and reset to empty at midnight You can assume that the system support a function random(K), which given a  positive integer returns a random integer between 1 and K.

        Answers for 4.1 and 4.2.

-----------------------------------------------------------------------------------------------------

       Project Assigned

-----------------------------------------------------------------------------------------------------

       Practice Midterm

-----------------------------------------------------------------------------------------------------

       Presentation and Final Ideas

-----------------------------------------------------------------------------------------------------