|
Read
the
following
reports
A Density-Based
Algorithm
for Discovering
Clusters
in Large
Spatial
Databases
with Noise.
Martin Ester,
Hans-Peter
Kriegel,
Jörg
Sander,
Xiaowei
Xu. By KDD
1996: 226-231
OPTICS:
Ordering
Points To
Identify
the Clustering
Structure.
by Mihael
Ankerst,
Markus M.
Breunig,
Hans-Peter
Kriegel,
Jörg
Sander:
SIGMOD Conference
1999: 49-60
Answer
the following
Questions
(Due on
May 30)
Task 5.1:
Given the
following
measurement
for the
variable
age:
18, 22,
25, 42,
28, 43,
33, 35,
56, 28,
standardize
the variable
by the following:
(a) compute
the mean
absolute
deviation
of age,
(b) compute
the z-score
for the
first four
measurement.
Task 5.2.
Present
situations
for which
density-based
clustering
is more
suitable
than partitioning-based
clustering.
Task 5.3.
Show how
the ESL
code for
DBSCAN can
be modified
to implement
OPTICS.
Task 5.4.
Association
rule mining
using frequent
itemsets
often generate
a large
number of
rules. Explain
how the
notion of
correlation
can be used
to eliminate
many rules
while preservintg
those that
are semantically
interesting.
Task 5.5.
Explain
the benefits
offered
by boosting
ensembles
when mining
data bases
and data
streams.
Study the following papers:
Do the following project
(cs240A final revised for WEKA--Due May 16):
Assignment No 3 (Due on Mo,
April 30 WD, May 2)
Read The following papers:
A
heartbeat mechanism and its application in gigascope.
by Theodore Johnson et al.: In VLDB 2005, pages
1079-1088.
Optimizing
Timestamp Management in Data Stream Management Systems.
by Yijian Bai et al.: In ICDE 2007.
Task 3.1. EXAMPLE 13--MAX with Windows--of
the CIKM 2006 listed below has a bug insofar as
it does not consider the case where
inwindow is empty:
-- discuss the cases where inwindow
is empty,
--fix the bug a show a correct definition for this
UDA.
Task 3.2. We showed how to fix the
idle-wating problem for union and joins. However,
aggregates on logical windows with slides can also
experience a similar problem [answer]
-
explain how the problem occurs
(you might want to start from the case of tumbles)
-
give the execution of aggregates
on a logical window with slides (in the style
of the foil with header tau be the content of
the Time Stamp Memory (TSM) Register).
Task 3.3. Aleri and Progress (apama) represent two recently annnounced
DSMS products. Find out what you can about these two. Compare them with
Coral8 and Stream Base in a short report (one page).
Week 2 Assignment (Due on Monday, April 23).
Task 2.1 Write a short (one page or two powerpoint
slides ) report on Coral8 (vs. Streambase)
Task 2.2 Write a short (one page or two powerpoint
slides ) report on Streambase (vs. Coral8)
Taks 2.3 Show some blocking and non-blocking queries
in SQL-MR
and, for each, explain why it is or it is not blocking.
Task 2.4. Assume that you have a table describing
Ebay bids for the day.
Bids ( itemID /* the item being
bid for*/,
price, /* bid price*/
bidderID, /* id of the bidder*/
timestamp /* time when bid was registered*/)
Write two ESL UDAs to compute the SUM and the SUM-DISTICNT
over a window, and use them to return, for each
itemID, the sum and sum-distinct of the prices over
the last 100 bids.
Week 1 Assignment (Due on Monday, April 9).
* Study K. Guion's overview
on OLAP Functions.
* Study the proposed Match-Recognize ANSI Standards
for SQL (SQL-MR
for short)
Turn in your answer to the following assignment:
Task 1.1. Assume that you have the following temporal
table:
emp(Eno, Project, Tstart, Tend)
Denoting periods of time during which an employee
has worked on a project. The closed intervals
denoting these periods could overlap, and thus
you need to coalesced them into maximal periods.
Suggestion, sort all events in a sequence, and
then use SQL-MR to do the actual coalescing, and
reconstruct the original table with the intervals
coalesced.
Task 1.2. Sensors have detected locations of objects
at certain time. So
items( itemNo, SensorNo, Time)
Write an SQL-MR query to detect objects that
are going around in a cycle, i.e., they have returned
to the same location withing one day. Many objects
do not move fast, so the sensor might produce
consecutive readings of the same object even if
this is not in a cycle.
Task 1.3. Express in SQL-TS the example query of
Section 2.1 of the SQL-MR document.
Task 1.4. Write a short (one page or less) report
on SQL-MR explaining the parts that, in your view,
have problems or require clarification (e.g., my
view is that outputs are not clear in the presence
of alternative patterns).
|