CS 239: Automated Testing of Data and Compute Intensive Systems
(Winter 2022)
Instructor: Professor Miryung Kim (ENG 6, Room 474)
Lectures: Mondays and Wednesdays, 10:00 AM to 11:50 AM
Office Hours: By appointment only
Zoom: http://ucla.zoom.us/my/miryung
General Description
Automated test input generation has emerged as
an effective technique for testing software systems. For example,
fuzz testing has been remarkably successful in uncovering critical
security bugs in applications such as the Chrome web browser and
the SQLite database. Dynamic symbolic execution has leveraged
advances in constraint solving technologies to find many
security-related bugs. However, the effectiveness of current test
input generation tools rests on inherent yet overlooked
assumptions: (1) it takes a minuscule amount of time, on the order
of milliseconds, to execute the target application, (2) a set of
arbitrary input mutations is likely to yield meaningful inputs, or
(3) either the target application code is short, or premade
abstraction of underlying libraries and frameworks is available.
In this course, students will discuss research papers on the
foundations of automated fuzz testing and design a new fuzzing
approach that extends its benefits to data-intensive and
compute-intensive domains. For example, we will investigate new
input mutation techniques and testing latency reduction methods
for data-intensive applications. We will investigate effective
guidance metrics and feedback monitoring methods for
compute-intensive applications. We will investigate new automated
testing strategies to diagnose data skews, compute skews, and
memory skews, as performance is a key concern in these emerging
data- and compute-intensive domains. In a course project,
students will choose a specific data-intensive or
compute-intensive domain that they are familiar with and will
design a new input generation technique for this domain. Example
domains include dataflow-based distributed computation frameworks
such as Apache Spark or Hadoop, heterogeneous computing
applications that can execute on CPUs, GPUs, and FPGA
accelerators, computer architecture simulation,
memory-disaggregated cloud architectures, and ML-enabled mobile
applications.
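For readers new to the topic, the mutation-based fuzzing loop that the description above refers to can be sketched roughly as follows. This is a minimal illustration in Python (the language of The Fuzzing Book's notebooks), not the method of any particular tool: `toy_target`, `mutate`, and `fuzz` are hypothetical names, and the set of branch ids returned by the target is a stand-in for real coverage instrumentation.

```python
import random


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Apply one random byte-level mutation: flip a bit, insert a byte, or delete a byte."""
    buf = bytearray(data)
    choice = rng.randrange(3)
    if choice == 0 and buf:
        pos = rng.randrange(len(buf))
        buf[pos] ^= 1 << rng.randrange(8)      # flip one bit
    elif choice == 1:
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    elif buf:
        del buf[rng.randrange(len(buf))]
    return bytes(buf)


def toy_target(data: bytes) -> set:
    """Hypothetical program under test; returns the ids of the branches it executed."""
    covered = {"entry"}
    if data[:1] == b"F":
        covered.add("F")
        if data[1:2] == b"U":
            covered.add("FU")
            if data[2:3] == b"Z":
                raise ValueError("simulated bug")
    return covered


def fuzz(target, seeds, iterations, seed=0):
    """Feedback-guided loop: keep any mutant that reaches a branch not seen before."""
    rng = random.Random(seed)
    corpus = list(seeds)
    seen = set()
    crashes = []
    for s in corpus:                            # seeds are assumed not to crash
        seen |= target(s)
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        try:
            coverage = target(candidate)        # one full execution per candidate
        except Exception:
            crashes.append(candidate)
            continue
        if not coverage <= seen:                # new branch reached: keep the input
            seen |= coverage
            corpus.append(candidate)
    return corpus, seen, crashes


if __name__ == "__main__":
    corpus, seen, crashes = fuzz(toy_target, [b"FAB"], iterations=5000)
    print(len(corpus), sorted(seen), len(crashes))
```

Each iteration runs the target once, so the loop is only practical when a single run is cheap (assumption 1), and byte-level mutations only make progress when inputs close to an existing one are still meaningful to the target (assumption 2).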
PTE-Based Enrollment
If you are interested in taking this class, please just come to
class. There are enough slots available, and you do not need to
be enrolled in advance. In the first two weeks,
the instructor will hand out PTEs to students interested in
taking this class. Classes will meet via Zoom in the first two weeks
at http://ucla.zoom.us/my/miryung
Course Schedule
The class schedule and reading material are listed below. Our
textbook is "The Fuzzing Book." This
book includes online chapters and interactive Jupyter
notebooks.
This is a seminar class geared towards research-oriented software
engineering students. If you are not comfortable reading academic
research papers, you will find it challenging to keep up with this class.
A significant portion of your grade will be based on your ability to
quickly install, run, evaluate, and extend an existing research
prototype. If you are unsure of your qualifications, please
contact the instructor, who will determine whether this course is
right for you.
Technology Tutorial: Dynamic Taint Tracking Phosphor,
Sign Up T15.
Week 10
Final Project Presentation
Students work in teams of two.
Students should design a new automated testing approach or
extend an existing tool. Students do not need to start from
scratch and may instead build their approach on any existing
tool.
There will be several milestones.
20%: Paper Presentation
For paper presentations, each presenter will present the assigned
paper, which covers recent advances related to the lecture's topic.
Each presentation should be about 20 minutes long.
The presentation can be done individually or in a pair. After
the presentation, you should lead an in-class discussion with
your classmates.
20%: Tool Tutorial
For technology tool tutorials, each presenter will create a
set of toy examples, online tutorials, and a live in-class
demonstration to teach the tools and environments discussed in
class.
A tool demonstration should consist of a 5-minute overview
followed by a 15-minute live demonstration.
The presentation is done individually.
30%: Take-Home Exam on Software Testing (Finals Week)
Sign Up
Each student can sign up for a paper presentation and a tool
tutorial using this URL.
Piazza
You can sign up for Piazza for this class using this URL.
Reading Questions
Please consider the following points as you read the papers. During
in-class Q&A, you may be asked to post your questions about the
reading assignment on Piazza.
Cool or significant ideas. What is new here? What are the main
contributions of the paper? What did you find most interesting?
Is this whole paper just a one-off clever trick or are there
fundamental ideas here which could be reused in other contexts?
Fallacies and blind spots. Did the authors make any
assumptions or disregard any issues that make their approach
less appealing? Are there any theoretical problems, practical
difficulties, implementation complexities, overlooked influences
of evolving technology, and so on? Do you expect the technique
to be more or less useful in the future? What kind of code or
situation would defeat this approach, and are those programs or
scenarios important in practice? Note: we are not interested in
flaws in presentation, such as trivial examples, confusing
notation, or spelling errors. However, if you have a great idea
on how some concept could be presented or formalized better,
mention it.
New ideas and connections to other work. How could the paper
be extended? How could some of the flaws of the paper be
corrected or avoided? Also, how does this paper relate to others
we have read, or even any other research you are familiar with?
Are there similarities between this approach and other work, or
differences that highlight important facets of both?
Class Discussion: Think-Pair-Share
How Does It Work?
1) Think. The teacher provokes students' thinking with a question or
prompt or observation. The students should take a few moments
(probably not minutes) just to THINK about the question.
2) Pair. Using designated partners (such as with Clock Buddies),
nearby neighbors, or a deskmate, students PAIR up to talk about the
answer each came up with. They compare their mental or written notes
and identify the answers they think are best, most convincing, or
most unique.
3) Share. After students talk in pairs for a few moments (again,
usually not minutes), the teacher calls for pairs to SHARE their
thinking with the rest of the class. She can do this by going around
in round-robin fashion, calling on each pair; or she can take
answers as they are called out (or as hands are raised). Often, the
teacher or a designated helper will record these responses on the
board or on the overheads.
Grading Scheme
Projects, paper presentations, and tool tutorials will be graded
on the following scale.
5 pt: Excellent design, complete implementation,
selection of a task that is highly intellectually challenging,
creative, concise yet comprehensive writing, nearly perfect answers,
beautifully written and verbally communicated, eloquent presentation
within a time limit
4 pt: Very good, mostly correct answers, i.e.,
>85%, selection of an intellectually challenging project topic,
well written, good verbal presentation, well practiced presentation
within a time limit
3 pt: Good understanding of the key concepts,
mostly correct answers, i.e., >70%, selection of an intellectually
challenging project topic, well written, good verbal presentation,
well practiced presentation within a time limit
2 pt or 1 pt: Poor, shallow, minimally sufficient,
or needlessly wordy, key concepts misunderstood or missing,
selection of easy project tasks, poor written and verbal
communication, presentation over time, not following specified
formats
Class Policy
If you are interested in taking the class,
please attend regularly and participate in all
required activities. Enrollment is by PTE only, and I
will provide PTEs after class begins.
Since many students sign up for classes
with the intent to drop later, in the first month I will be
collecting information about your intent to stay in the class
through attendance, presentation sign-ups, and in-class Q&A
participation.
The class schedule may occasionally need to
shift. Your sign-up is tied to the paper, not to the date,
so please be prepared to discuss the paper within a few days of
your original presentation date.
COVID-19 Response and Recovery Task Force FAQ
Students must adhere to the current campus directives
related to COVID-19 mitigation, and refusal to do so may
result in the student being asked to leave the classroom
or referred to Conduct.