Lectures |
Tools |
Readings and Presentations | |
Week 1 1/4 1/6 |
Software Analytics: What is it? Data Scientists: Who are they? |
Software Analytics in Practice, Zhang et al., IEEE Software 2013
Interactions with Big Data Analytics, Fisher et al., ACM Interactions 2012
The Emerging Role of Data Scientists on Software Development Teams, Kim et al., ICSE 2016 |
|
Week 2 1/11 1/13 |
Change Recommendation; Automated Software Repair |
Mining Version Histories to Guide Software Changes, Zimmermann et al., ICSE 2004
Automatically Finding Patches using Genetic Programming, Weimer et al., ICSE 2009 |
|
Week 3 1/18 (No Class) 1/20 |
Defect Prediction; Quiz 1 (Wed) |
Use of Relative Code Churn Measures to Predict System Defect Density, Nagappan and Ball, ICSE 2005
Cross Project Defect Prediction, Zimmermann et al., ESEC/FSE 2009
*Please read Chapters 12 and 13 of "Probability and Statistics for Engineering and the Sciences."
Project Part A Description |
|
Week 4 1/25 1/27 |
Software Anomaly Detection and Debugging |
R Studio Profiling and Instrumentation ASM
1. Student Presentation: R Studio Demo: Tutorial on R Studio; Statistics and Linear Regression: Tutorial on Basic Statistics in R
2. Student Presentation: ASM Demo |
Bug Isolation via Remote Program Sampling, Liblit et al., PLDI 2003
Detecting Object Usage Anomalies, Wasylkowski et al., FSE 2007 |
Week 5 2/1 2/3 |
Software Anomaly Detection and Debugging; Quiz 2 (Wed) |
Visualization Metric Extraction
3. Student Presentation: Tableau Demo: Tutorial on Visualization using Tableau
4. Student Presentation: Tutorial on Metrics Extraction Python Parser |
FaultTracer: a spectrum-based approach to localizing failure-inducing program edits, Zhang et al., JSEP 2013
Debugging in the (Very) Large: Ten Years of Implementation and Experience, Glerum et al., SOSP 2009 |
Week 6 2/8 2/10 |
Project Midterm Presentations |
Student Team Presentations |
|
Week 7 2/15 (No Class) 2/17 |
Hadoop and MapReduce |
Hadoop MapReduce
5. Student Presentation: Spark Demo |
MapReduce: Simplified Data Processing on Large Clusters, Dean and Ghemawat, OSDI 2004
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing, Zaharia et al., NSDI 2012 |
Week 8 2/22 (No Class) 2/24 |
Big Data Analytics Assistance: Debugging; Quiz 3 (Wed) |
Guest Lecture: Demo on BigDebug |
Assisting Developers of Big Data Analytics Applications When Deploying on Hadoop Clouds, Shang et al., ICSE 2013
BigDebug: Debugging Primitives for Interactive Big Data Processing in Spark, Gulzar et al., ICSE 2016 |
Week 9 2/29 3/2 |
Big Data Analytics Assistance: Debugging |
6. Student Presentation: Spark MLlib (Basic Statistics, Linear Models, K Nearest Neighbor, Decision Trees, Naive Bayes)
7. Student Presentation: Programming using Interactive Notes |
Inspector Gadget: A Framework for Custom Monitoring and Debugging of Distributed Dataflows, Olston and Reed, VLDB 2011
Scalable Lineage Capture for Debugging DISC Analytics, Logothetis et al., SoCC 2013
Titian: Data Provenance Support in Spark, Interlandi et al. |
Week 10 3/7 3/9 |
Project Final Presentation |
Doing Data Science: Straight Talk from the Frontline, Cathy O'Neil & Rachel Schutt (find an e-book of "Doing Data Science" at http://proquest.safaribooksonline.com/9781449363871).
Linear Regression: Pages 55-71 (Chapter 3: Algorithms)
Logistic Regression and Logit Function: Chapter 3: Logistic Regression, Pages 113-134
Decision Tree and Entropy: Chapter 7, Pages 185-187
"Probability and Statistics for Engineering and the Sciences": Chapter 12, "Simple Linear Regression and Correlation," and Chapter 13, "Nonlinear and Multiple Regression," provide background on linear regression, stepwise regression, R^2, Adjusted R^2, SSE and SST.
There are multiple copies of the book at the UCLA library.
http://catalog.library.ucla.edu/vwebv/holdingsInfo?searchId=6117&recCount=50&recPointer=11&bibId=59896
R Cookbook: The e-version of R Cookbook is available at the UCLA library. Read Chapter 9.
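As a rough companion to the regression readings above, the sketch below shows how SSE, SST, R^2 and Adjusted R^2 relate for a simple one-predictor linear fit. It is a minimal illustration in plain Python, not code from any of the listed books; the function name `regression_fit_summary` and the sample numbers are made up for this example, and the formulas used are the standard ones (R^2 = 1 - SSE/SST, Adjusted R^2 = 1 - (SSE/(n-k-1))/(SST/(n-1)) with k predictors).

```python
def regression_fit_summary(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns the coefficients together with SSE, SST, R^2 and adjusted R^2.
    Hypothetical helper written for illustration only.
    """
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    if n < 3:
        raise ValueError("need at least 3 points for adjusted R^2 with one predictor")

    x_mean = sum(x) / n
    y_mean = sum(y) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sst = sum((yi - y_mean) ** 2 for yi in y)  # total sum of squares
    if sxx == 0 or sst == 0:
        raise ValueError("x and y must each vary for the fit statistics to be defined")

    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    # SSE: squared residuals left unexplained by the fitted line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

    k = 1  # number of predictors in simple linear regression
    r2 = 1 - sse / sst
    adj_r2 = 1 - (sse / (n - k - 1)) / (sst / (n - 1))

    return {
        "slope": slope,
        "intercept": intercept,
        "sse": sse,
        "sst": sst,
        "r2": r2,
        "adj_r2": adj_r2,
    }


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    summary = regression_fit_summary([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
    for name, value in summary.items():
        print(f"{name}: {value:.4f}")
```

Adjusted R^2 is never larger than R^2 because it penalizes each extra predictor, which is why it is the more useful number when comparing models in stepwise regression.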
How Does It Work?
1) Think. The teacher provokes students' thinking with a
question or prompt or observation. The students should take a few
moments (probably not minutes) just to THINK about the question.
2) Pair. Using designated partners (such as with Clock Buddies),
nearby neighbors, or a deskmate, students PAIR up to talk about the
answer each came up with. They compare their mental or written notes
and identify the answers they think are best, most convincing, or most
unique.
3) Share. After students talk in pairs for a few moments (again,
usually not minutes), the teacher calls for pairs to SHARE their
thinking with the rest of the class. She can do this by going around
in round-robin fashion, calling on each pair; or she can take answers
as they are called out (or as hands are raised). Often, the teacher or
a designated helper will record these responses on the board or on the
overhead