| 
           
             
             
             
            CS240B  
                Fall 2018 
               
               
               
               
               
               
               
               
               
               
               
               
               
               
               
               
               
              Syllabus 
               
               
               
               
               
               
               
               
              Notes 
               
               
               
               
               
               
               
               
              Home 
             
             
             
             
               
               
               
               
             
            Talks
           | 
          
          
            
              ----------------------------------------------------------------------------------------------------------------
              Tuesday and Thursday 14:00--15:50
              Fall 2018  
              CS240B:ADVANCED DATA BASES and KNOWLEDGE BASES 
              Data
                        Stream Management Systems: 
                        Supporting Data Stream Mining Applications 
             
            Instructor:
                      Carlo
                        Zaniolo  
            Office Hours: Tuesdays and Thursdays, 
                    16:00--17:00. 
             
            
               First
                  Assignement (Due on Tuesday  Oct. 9)  Some answers for 
                2.1, 2.2 
              
              
                - Study the following papers:
 
                   
                  
                 
                - Complete the following tasks and turn them in via
                    ccle:
 
               
              
              
                - Task 1.1: Write a four page
                  (about) report on SQL-MR focusing on the things that you can
                  do with SQL-MR that cannot be done in  SQL-TS,
 
                - Task 1.2: Some of the new
                  OLAP functions support windows others  do not. List those in
                  the two groups, and for those in the second group, suggest a
                  general implementation for  (i) unlimited preceding windows,
                  and (ii) physical windows. 
 
               
              ------------------------------------------------------------------------------------------------- 
                Second
                  Assignement (Due on  Tuesday Oct. 16)   Answers
                  for  2.1, 
                  and 2.2.
              
                - Study the following papers:
 
                   
                  
                    - Y.
                        Bai, H. Thakkar, C. Luo, H. Wang and CZaniolo: A
                        Data Stream Language and System Designed for Power and
                        Extensibility.  (CIKM'06),
                        Arlington, Virginia, USA, 
 
                       
                    - H.
                        Thakkar, N.  Laptev, H. Mousavi, B. Mozafari, V. Russo
                        and C. Zaniolo: SMM:
                        A data stream management system for Knowledge Discovery. ICDE
                        2011.
 
                         
                         
                       
                   
                 
                - Complete the following tasks and turn them in via
                    ccle:  
 
               
              
                - Task 2.1: Using
                  a syntax based on that of notes and the two reference above,
                  express a user-defined aggregate d_count to perform the exact
                  count of distinct values in a window on a data stream. Your
                  window aggregate could, e.g., be called as follows:
 
                  SELECT col_name1, d_count(col_name2) OVER (ROWS 99999
                  PRECEDING) FROM my_stream 
                - Task 2.2:  Using the same
                  syntax, write a UDA that implements the RANK function (without
                  window or with a unlimited-preceding window).  Then  extend
                  this definition to support windows.
                    
 
                   
               
                     
------------------------------------------------------------------------------------------------ 
                    Third Assignement (Due
                  on Tuesday, Oct. 23)  Answers
                  for  3.1,  3.2, 
                  and 3.3.
              
              
                -  Study the following
                    papers:  
                
 
               
              
                
                  - Yijian Bai and Carlo Zaniolo: Minimizing
                      Latency and Memory in DSMS: A Unified Approach to
                      Quasi-Optimal Scheduling . The Second International
                    Workshop on Scalable Stream Processing Systems, March 29,
                    2008, Nantes, France.
 
                   
                  - Y. Bai, H. Thakkar, H. Wang, and C. Zaniolo. Time-stamp
                      Management and QueryExecution in Data Stream Management
                      Systems. IEEE Internet Computing, 12(6): 2008.
 
                  -  Jure Leskovec, Anand Rajaraman, Jeff Ullman, Mining Data
                    Streams - The Stanford University InfoLab.  You
                      only need to study sections 4.3 and 4.4. But the
                    rest is also interesting. 
                        http://infolab.stanford.edu/~ullman/mmds/ch4.pdf
 
                           
                         
                 
                -  Complete the  following
                    Tasks: 
 
                   
               
              
                - Task 3.1: We
                    showed that binary operators, such as union and joins, are
                    subject to idle-wating problem. But this problem can also
                    occur in some unary operators. Say for instance that an
                    aggregate is called on a time-stamped tumble window (or on a
                    timestamped window with slides). Discuss this situation, its
                    idle-waiting problem, and how it can be solved (using
                    concepts and notation from the paper by Bai and Zaniolo
                    above ).
 
                   
                 
                - Task 3.2:
                  Propose a  simple generalization of the chain algorithm to
                  minimize memory for the following situations:
 
                  (A) Two or more simple paths without forks,  
                   
                 
                - Task 3.3:
                  Write the window UDA to compute the standard deviation on a
                  data stream. Explain the math formulas you use in the expire
                  of the window UDA---you might want to consult textbooks or
                  the  web  to find  the best formula.
 
               
                        
              -------------------------------------------------------------- -----------------------------------
              
               
                 
                    Fourth Assignement (Due on Tuesday Oct. 30)  Answers
                    for 
                      4.1, 
                      and
                        4.2.
               
              
                
                  -  Study the following
                      papers:
 
                 
                
                - Complete the  following
                    Tasks:
 
                - Task 4.1: Express
                  the Flajolet-Martin's distinct_count sketch as a user-defined
                  aggregate mamed dcount_sketch, to be called in the same way as
                  d_count. You can assume that you have available a function
                  LmostbitH(X) that return K, where the K position contains a 1,
                  whereas all the position to its right are zeros, for the value
                  returned by a randomizing hash function H(X).
 
                - Task 4.2:  Assume that  you
                  have a stream of temperature readings temperature(Celsius
                    Integer) that
                  start everyday at time 00:01  and end at  time 23:59.  At the
                  end of each day, we want to have 10,000 temperature samples
                  stored into a table   tenKsamples(Rowno
                      integer, Celsius Integer). We
                    do not know how many temperature readings are going to
                    arrive every day, except that their number is significantly
                    larger than 10,000. Please write a UDA that uses the reservoiralgorithm
                    to populate tenKsamples(Rowno
                      , Celsius) with
                    10,000 random samples taken fromtemperature(Celsius
                      Integer), which is then processed and reset to
                    empty at midnight.  You
                    can assume that the system support a function random(K), which
                    given a  positive integer K  returns
                    a random integer between 1 and K.
 
               
              ----------------------------------------------------------------------------------------------------- 
                           Project
                    Assigned on October 30--Due on November 20 (strict
                  deadline). 
                
              ----------------------------------------------------------------------------------------------------- 
              
              ----------------------------------------------------------------------------------------------------- 
                         Presentations
                  (individual or by pairs).  November 15--December 6.  
                             Possible topics.
                     
                  
                         For additional references see  here
                and here.
                                Current
                    schedule. 
              ----------------------------------------------------------------------------------------------------- 
                         Take home final
                  project. Research Report  due on or before December 17. 
                             It should be done individually (rather than in
                  pairs.) 
                
              ----------------------------------------------------------------------------------------------------- 
              Holidays 
                      
              
              
               
              
                
             
           |