Fall 2020, CS 239 Schedule 

Date

Papers

Presentation Slides

Due

Week 1. Introduction + Distributed Storage Systems

Oct. 5 (Mon)

1.    Introduction, overview of the class, state-of-the-art in big data systems (Harry)

2.    Challenges in distributed storage systems (Harry)

3.    HDFS – Yahoo (Presenter: , Scribe: )

Class overview

Challenges in distributed storage systems

 

Oct. 7 (Wed)

1.    GFS - Google (Presenters: , Scribes:)

2.    Bigtable – Google (Presenters:, Scribes:)

3.    Spanner - Google (Presenters:, Scribes:)

 

Paper presentation selection due on Friday

Week 2. Distributed Storage Systems + Engines 

Oct. 12 (Mon)

1.    Azure storage – Microsoft (Presenters: , Scribes: )

2.    Introduction to data-parallel engines (Harry)

3.    MapReduce – Google (Presenters: , Scribes:)

Challenges in data parallel engines

Group formation due

Oct. 14 (Wed)

1.     Dryad – Microsoft  (Presenters:, Scribes:)

2.     Spark + RDD – Berkeley + Databricks (Presenters: , Scribes: )

3.     Distributed aggregation – Microsoft (Presenters:, Scribes:)

Week 3. Engines + Batch Processing

Oct. 19 (Mon)

1.    MapReduce Online (Presenters: , Scribes:)

2.    Map-Reduce-Merge (Presenters:, Scribes:)

 Harry is out of town.

Oct. 21 (Wed)

1.    Introduction to batch-processing systems (Harry)

2.    Hive – Facebook (Presenters:, Scribes:)

3.    Spark SQL -- Databricks (Presenters:, Scribes: )

Challenges in batch-processing systems

 

Week 4. Batch Processing 

Oct. 26 (Mon)

1.    SCOPE  -- Microsoft (Presenters: , Scribes:)

2.    FlumeJava -- Google (Presenters: , Scribes:)

3.    DryadLINQ -- Microsoft (Presenters: , Scribes:)

 

Oct. 28 (Wed)

      1. Introduction to scheduling and resource management (Harry)

      2. Mesos (Presenters: , Scribes:)

      3. YARN (Presenters:, Scribes:)

Challenges in scheduling and resource management

Week 5.  Scheduling + Resource Management

Nov. 2 (Mon)

      1.    Sparrow (Presenters:, Scribes:)

2.    Borg -- Google (Presenters:, Scribes:)

3.    Nexus -- UW (Presenters:, Scribes:)

 

 Harry is out of town.

Nov. 4 (Wed)

      1. Introduction to stream processing  (Harry)

      2.  Storm (Presenters:, Scribes:)

      3. Flink (Presenters: , Scribes:)

 

 

Week 6. Stream Processing

Nov. 9 (Mon)

     1.    Kafka (Presenters: , Scribes:)

2.    Naiad -- Microsoft (Presenters: , Scribes:)

3.    Trill   -- Microsoft (Presenters:, Scribes:)

 

 

Nov. 11 (Wed)

 

 Veterans Day - No class

 

 

Week 7. Project Presentations

Nov. 16 (Mon)

     Project proposal presentations I

 

 

Nov. 18 (Wed)

Project proposal presentations II

Week 8. Stream Processing 

Nov. 23 (Mon)

1.    SVE -- Facebook (Presenters:, Scribes:)

2.    Drizzle (Presenters:, Scribes:)

3.    Structured Streaming -- Databricks (Presenters:, Scribes:)

 

 

Nov. 25 (Wed)

Class canceled due to Thanksgiving

 

 

Week 9. Graph Systems

Nov. 30 (Mon)

1.    Introduction to graph processing and ML systems (Harry)

2.    Pregel -- Google (Presenters:, Scribes:)

3.    Ligra (Presenters: , Scribes: )

Challenges in graph processing and ML systems

 

Dec. 2 (Wed)

1.    GraphChi  (Presenters: , Scribes: )

2.    X-Stream  (Presenters:, Scribes:)

3.    RStream (Presenters:, Scribes:)

 

Week 10. Memory Management + Project Presentation Week

Dec. 7 (Mon)

1.    Parameter server (Presenters:, Scribes:)

2.    TensorFlow -- Google (Presenters:, Scribes:)

3.    Ray (Presenters:, Scribes:)

4.   TVM (Presenters:, Scribes:)

 

 

Dec. 9 (Wed)

     1.    Introduction to big data memory management (Harry) 

2.    Broom  (Presenters:, Scribes:)

3.  Yak (Presenters:, Scribes:)

4.  Niijima (Presenters: , Scribes:)

Challenges in memory management 

Final report due on Friday

Dec. 10-11

Final Presentation