CS240B
  Fall 2018

















Syllabus








Notes








Home










Talks

----------------------------------------------------------------------------------------------------------------

Tuesday and Thursday 14:00--15:50

Fall 2018

CS240B:ADVANCED DATA BASES and KNOWLEDGE BASES

Data Stream Management Systems:
Supporting Data Stream Mining Applications

Instructor: Carlo Zaniolo

Office Hours: Tuesdays and Thursdays,  16:00--17:00.


First Assignement (Due on Tuesday  Oct. 9)  Some answers for  2.1, 2.2

  • Task 1.1: Write a four page (about) report on SQL-MR focusing on the things that you can do with SQL-MR that cannot be done in  SQL-TS,
  • Task 1.2: Some of the new OLAP functions support windows others  do not. List those in the two groups, and for those in the second group, suggest a general implementation for  (i) unlimited preceding windows, and (ii) physical windows.

-------------------------------------------------------------------------------------------------

  Second Assignement (Due on  Tuesday Oct. 16)   Answers for  2.1,  and 2.2.

  • Task 2.1: Using a syntax based on that of notes and the two reference above, express a user-defined aggregate d_count to perform the exact count of distinct values in a window on a data stream. Your window aggregate could, e.g., be called as follows:
    SELECT col_name1, d_count(col_name2) OVER (ROWS 99999 PRECEDING) FROM my_stream
  • Task 2.2:  Using the same syntax, write a UDA that implements the RANK function (without window or with a unlimited-preceding window).  Then  extend this definition to support windows.

        ------------------------------------------------------------------------------------------------

      Third Assignement (Due on Tuesday, Oct. 23)  Answers for  3.1 3.2,  and 3.3.

  •  Study the following papers: 
  • Task 3.1: We showed that binary operators, such as union and joins, are subject to idle-wating problem. But this problem can also occur in some unary operators. Say for instance that an aggregate is called on a time-stamped tumble window (or on a timestamped window with slides). Discuss this situation, its idle-waiting problem, and how it can be solved (using concepts and notation from the paper by Bai and Zaniolo above ).

  • Task 3.2: Propose a  simple generalization of the chain algorithm to minimize memory for the following situations:
    (A) Two or more simple paths without forks,

  • Task 3.3: Write the window UDA to compute the standard deviation on a data stream. Explain the math formulas you use in the expire of the window UDA---you might want to consult textbooks or the  web  to find  the best formula.
           -------------------------------------------------------------------------------------------------

      Fourth Assignement (Due on Tuesday Oct. 30)  Answers for  4.1,  and 4.2.

    •  Study the following papers:
  • Complete the  following Tasks:
  • Task 4.1: Express the Flajolet-Martin's distinct_count sketch as a user-defined aggregate mamed dcount_sketch, to be called in the same way as d_count. You can assume that you have available a function LmostbitH(X) that return K, where the K position contains a 1, whereas all the position to its right are zeros, for the value returned by a randomizing hash function H(X).
  • Task 4.2:  Assume that  you have a stream of temperature readings temperature(Celsius Integer) that start everyday at time 00:01  and end at  time 23:59.  At the end of each day, we want to have 10,000 temperature samples stored into a table   tenKsamples(Rowno integer, Celsius Integer). We do not know how many temperature readings are going to arrive every day, except that their number is significantly larger than 10,000. Please write a UDA that uses the reservoiralgorithm to populate tenKsamples(Rowno , Celsius) with 10,000 random samples taken fromtemperature(Celsius Integer), which is then processed and reset to empty at midnight You can assume that the system support a function random(K), which given a  positive integer  returns a random integer between 1 and K.

-----------------------------------------------------------------------------------------------------

             Project Assigned on October 30--Due on November 20 (strict deadline).

-----------------------------------------------------------------------------------------------------

           Midterm : Tuesday November 6  (Practice Midterm  
           Correction of actual Midterm.

-----------------------------------------------------------------------------------------------------

           Presentations (individual or by pairs).  November 15--December 6.
           Possible topics.

           For additional references see  here and here.

                  Current schedule.

-----------------------------------------------------------------------------------------------------

           Take home final project. Research Report  due on or before December 17.
           It should be done individually (rather than in pairs.)

-----------------------------------------------------------------------------------------------------

Holidays