Fall 2019, CS 239 Schedule 

Date

Papers

Presentation Slides

Due

Week 1. Introduction + Distributed Storage Systems

Sept. 30 (Mon)

1.    Introduction, overview of the class, state-of-the-art in big data systems (Harry)

2.    Challenges in distributed storage systems (Harry)

3.    HDFS – Yahoo (Presenter: Yifan and Haoran, Scribe: Jiyuan and Usama)

Class overview

Challenges in distributed storage systems


HDFS


Oct. 2 (Wed)

1.    GFS - Google (Presenter: Shiqi, Scribe: Yuanqi, Xuanqing)

2.    Bigtable – Google (Presenter: Jiyuan and Usama, Scribe: Haoran and Yifan)

3.    Spanner - Google (Presenter: Willie Wu and Kevin Hsieh, Scribe: Junyu and Kedar)



Paper presentation selection due on Friday

Week 2. Distributed Storage Systems + Engines 

Oct. 7 (Mon)

1.    Azure storage – Microsoft (Yuanqi Li, Scribe: Amir, Shivam)

2.    Introduction to data-parallel engines (Harry)

3.    MapReduce – Google (Gaohao Liu, Scribe: Zhaoning, Chun)

Challenges in data parallel engines


Group formation due

Oct. 9 (Wed)

1.     Dryad – Microsoft (Amir, Scribe: Yu-Chen, Tianyi)

2.     Spark + RDD – Berkeley + Databricks (Shivam, Pratik, Scribe: David, Rach)

3.     Distributed aggregation – Microsoft (Zhaoning, Scribe: Howard, Shen)



Week 3. Engines + Batch Processing

Oct. 14 (Mon)

1.    Map-reduce online (Tianyi Ma, scribes: Liran Xiao, Chengyao Zhang)

2.    Map-reduce-merge (Kedar Deshpande, Victor Fu, Scribes: Enbang Zhang, Swati Sharma)



 Harry is out of town.

Oct. 16 (Wed)

1.    Introduction to batch-processing systems (Harry)

2.    Hive – Facebook (Yujun Zhao, Scribes: Wenlong Xiong, Arjun Srinivasan)

3.    Spark SQL -- Databricks (Junyu Guo, Scribes: Neil Agarwal)

Challenges in batch-processing systems

 

Week 4. Batch Processing 

Oct. 21 (Mon)

1.    SCOPE  -- Microsoft (Chun Chen, Yu-Chen Lin, Scribes: Jintao Jiang, Sahil Gandhi)

2.    FlumeJava -- Google (David Shan, Arjun Srinivasan, Scribes: Matt Hickey, Qiyue Yao)

3.    DryadLINQ -- Microsoft (Howard Xie, Allen Huang, Scribes: Jay Arora, Wei-ting Chen)

 


Oct. 23 (Wed)

1.    Introduction to scheduling and resource management (Harry)

2.    Mesos (Liran Xiao, Zhuyan Chen, Scribes: Srishti Majumdar, Nandan Parikh)

3.    YARN (Chengyao Zhang, Enbang Zhang, Scribes: Siqi Liu and Jules Ahmar)



Week 5.  Scheduling + Resource Management

Oct. 28 (Mon)

1.    Sparrow (Swati Sharma, Scribes: Mathanky)

2.    Borg -- Google (Wenlong Xiong, Calvin Pham, Scribes: Tanmay Chinchore, Yujun Zhao)

3.    Tachyon -- Databricks (Arghya Mukherjee, Scribes: Zhufeng Pan, Pratik Nichat)

 

 Harry is out of town.

Oct. 30 (Wed)

1.    Introduction to stream processing (Harry)

2.    Storm (Neil Agarwal, Scribes: Jonathan Chee, Rustem Can Aygun)

3.    Flink (Rach Liu, Thomas Pan, Scribes: Keerthana Sankar and Enbang Zhang)

 

 

Week 6. Stream Processing

Nov. 4 (Mon)

1.    Kafka (Jintao Jiang, Millan Batra, Scribes: Kaushik Mahorker, Haoran Ma)

2.    Naiad -- Microsoft (Matt Hickey, Scribes: Chia-Hung Ni, Sijie Xiong)

3.    Trill -- Microsoft (Qiyue Yao, Scribes: Yifan Qiao, Usama Hameed)

 

 

Nov. 6 (Wed)

 

Project proposal presentations I

 

 

Week 7. Stream Processing 

Nov. 11 (Mon)

Veterans Day - No class

 

 

Nov. 13 (Wed)

Project proposal presentations II



Week 8. Graph Processing

Nov. 18 (Mon)

1.    SVE -- Facebook (Jay Arora, Chia-Hung Ni, Scribes: Jinyuan Wang, Yuanqi Li)

2.    Drizzle (Keerthana Sankar, Scribes: Yuanqi Liu, Rupa Mahadevan)

3.    Structured Streaming -- Databricks (Wei-ting Chen, Scribes: Kevin Hsieh, Jay Arora)

 

 

Nov. 20 (Wed)

1.    Introduction to graph processing (Harry)

2.    Pregel -- Google (Christian Warloe, Sahil Gandhi, Scribes: Willie Wu, Jules Ahmar)

3.    Ligra (Shen Teng, Scribes: Mathanky Sankaranarayanan, Jonathan Chee)

 

 

Week 9. ML Systems

Nov. 25 (Mon)

1.    GraphChi (Srishti Majumdar, Nandan Parikh, Scribes: Kevin Hsieh, Shivam)

2.    XStream (Pooja Nagaraja, Rupa Mahadevan, Scribes: Willie Wu, Millan Batra)

3.    GridGraph (Jules Ahmar, Scribes: Ryan Tsang, Amir Yazdi-Nejad)

4.    Parameter server (Kaushik Mahorker, Scribes: Pooja Janagal Nagaraja, Austin Guo)


 

Nov. 27 (Wed)


Class cancelled



 

Week 10. Memory Management + Project Presentation Week

Dec. 2 (Mon)

1.    Project Adam -- Microsoft (Mathanky Sankaranarayanan, Tanmay Chinchore, Scribes: Millan Batra, Rustem Can Aygun)

2.    TensorFlow -- Google (Austin Guo, Ryan Tsang, Scribes: Sijie Xiong, Rupa Mahadevan)

3.    Framework for emerging AI (Zhufeng Pan, Scribes: Pooja Janagal Nagaraja, Keerthana Sankar)

4.    TVM: Compiler for Deep Learning (Xuanqing Liu, Scribes: Qiyue Yao)

 

 

Dec. 4 (Wed)

1.    Introduction to big data memory management (Harry)

2.    Bloat-aware design (Sijie Xiong, Scribes: Ryan Tsang, Matt Hickey)

3.    Broom (Jonathan Chee, Scribes: Austin Guo, Nandan Parikh)

4.    Yak (Rustem Can Aygun, Scribes: Neil Agarwal, Arjun Srinivasan)

 

Final report due on Friday