Data mining evolved as a set of applications for discovering relevant
hidden information in databases, especially association
rules [AS94,MPC96]. One such application is basket data analysis, where data
on purchased products (referred to as items) is collected and associations
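between items are computed. An association rule is of the form X ⇒ Y, where
X and Y are two sets of items. The frequency of such an observation in the data
gives the strength of the rule. As an example, take the probable rule that a
citizen of a country speaks an official language of that country:

    citizen(X,Ct), official_language(Ct,L) ⇒ speaks(X,L)

Its relevance could be measured by the confidence:

    C = (number of pairs (X,Ct), (Ct,L) for which speaks(X,L) is true)
        / (number of all pairs (X,Ct), (Ct,L))
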
The closer C is to 1, the stronger the confidence in the rule. C could
be computed in LDL++ using avg rather than count.
The above LDL++ code computes a function overlap, which returns the
average value of W. W is assigned 0 or 1 for every pair of tuples
(X,Ct) from table citizen and (Ct,L) from table official_language:
1 if speaks(X,L) is true for that pair of tuples, and 0 otherwise.
C is the ratio of the number of such pairs for which speaks is true
to the number of all pairs, which in terms of W is simply avg<W>.
Hence the above code computes C using an average.
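
The equivalence between the count ratio and the average of W can be checked with a small sketch. This is Python, not LDL++, and it does not reproduce the paper's overlap code. The toy relations and the function names joined_pairs, confidence_by_count and confidence_by_avg are hypothetical, introduced only for illustration.

```python
from statistics import mean

# Hypothetical toy relations (illustration only, not data from the source).
citizen = [("ann", "france"), ("bob", "canada"), ("eve", "india")]
official_language = [
    ("france", "french"),
    ("canada", "english"),
    ("canada", "french"),
    ("india", "hindi"),
    ("india", "english"),
]
speaks = {("ann", "french"), ("bob", "english"),
          ("eve", "hindi"), ("eve", "english")}


def joined_pairs(citizen, official_language):
    """Yield (X, L) for every pair of tuples (X, Ct), (Ct, L) agreeing on Ct."""
    for x, ct in citizen:
        for ct2, lang in official_language:
            if ct == ct2:
                yield x, lang


def confidence_by_count(citizen, official_language, speaks):
    """C as a ratio of two counts (assumes at least one joined pair)."""
    pairs = list(joined_pairs(citizen, official_language))
    hits = sum(1 for pair in pairs if pair in speaks)
    return hits / len(pairs)


def confidence_by_avg(citizen, official_language, speaks):
    """C as the average of the 0/1 indicator W over all joined pairs."""
    w = [1 if pair in speaks else 0
         for pair in joined_pairs(citizen, official_language)]
    return mean(w)


if __name__ == "__main__":
    print(confidence_by_count(citizen, official_language, speaks))
    print(confidence_by_avg(citizen, official_language, speaks))
```

Both functions return the same value because summing a 0/1 indicator counts the pairs for which speaks holds, and dividing by the number of pairs is exactly what the average does.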
Since averages can be estimated from a sample of the input, C can be computed using online aggregation.
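
As a rough illustration of how such an estimate might be refined progressively, the sketch below scans the W values in random order and reports a running average with an approximate confidence interval. It is an assumption-laden sketch, not the method of the source: the function name online_average, its parameters, the normal-approximation interval (z = 1.96) and the omission of a finite-population correction are all choices made here for illustration.

```python
import math
import random


def online_average(values, seed=0, report_every=100, z=1.96):
    """Yield (n, running_mean, half_width) while scanning values in random order.

    half_width is an approximate confidence half-interval based on the
    normal approximation; it ignores the finite-population correction.
    """
    order = list(values)
    random.Random(seed).shuffle(order)  # random scan order, so prefixes are samples
    n = 0
    running_mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's update)
    for w in order:
        n += 1
        delta = w - running_mean
        running_mean += delta / n
        m2 += delta * (w - running_mean)
        if n % report_every == 0 or n == len(order):
            variance = m2 / (n - 1) if n > 1 else 0.0
            yield n, running_mean, z * math.sqrt(variance / n)


if __name__ == "__main__":
    # Hypothetical W values: 1 where speaks holds for a pair, 0 otherwise.
    w_values = [1] * 800 + [0] * 200
    for n, estimate, half_width in online_average(w_values):
        print(f"after {n} pairs: C ~ {estimate:.3f} +/- {half_width:.3f}")
```

Each reported estimate uses only the pairs seen so far, so a user could stop the scan once the interval is narrow enough; after all pairs have been seen, the running average equals avg<W> exactly (up to floating-point rounding).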