Mining for Associations

Data mining evolved as a set of applications concerned with discovering relevant hidden information in databases, especially association rules [AS94,MPC96]. One such application is basket data analysis, where data on purchased products (referred to as items) is collected and associations between items are computed. An association rule is of the form $X \Rightarrow Y$, where X and Y are two sets of items. The frequency of such an observation in the data gives the strength of the rule. As an example, for a probable rule:

$\mathit{citizen}(X, Ct) \wedge \mathit{official\_language}(Ct, L) \Rightarrow \mathit{speaks}(X, L)$

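the relevance could be measured by:

$$C = \frac{\mathrm{count}\big(\mathit{citizen}(X, Ct) \wedge \mathit{official\_language}(Ct, L) \wedge \mathit{speaks}(X, L)\big)}{\mathrm{count}\big(\mathit{citizen}(X, Ct) \wedge \mathit{official\_language}(Ct, L)\big)}$$

The closer C is to 1, the stronger the confidence in the rule. This could be computed in LDL++ using avg rather than count.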

The LDL++ code for this rule defines a function overlap that computes the average value of W. W is assigned 0 or 1 for every pair of tuples (X,Ct) from table citizen and (Ct,L) from table official_language: 1 if speaks(X,L) is true for that pair, and 0 otherwise. C is the ratio of the number of such pairs for which speaks is true to the number of all pairs, which in terms of W is simply avg<W>. Hence the code computes C as an average. Since averages can be estimated from a sample of the data, C can be computed using online aggregation.
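The equivalence between the count ratio and avg<W> can be checked with a small Python sketch. This is not the LDL++ program described above: the table contents are invented toy data, and the function names (indicator_values, confidence_by_count, confidence_by_avg, running_estimates) are hypothetical. The last function only illustrates the idea behind online aggregation, a running average over pairs seen in random order, without the confidence intervals a real system would report.

```python
import random

# Hypothetical toy tables; the names mirror the predicates in the text,
# the rows are invented for illustration only.
citizen = [("ann", "canada"), ("bob", "canada"),
           ("eva", "india"), ("raj", "india")]
official_language = [("canada", "english"), ("canada", "french"),
                     ("india", "hindi"), ("india", "english")]
speaks = {("ann", "english"), ("ann", "french"), ("bob", "english"),
          ("eva", "english"), ("raj", "hindi"), ("raj", "english")}


def indicator_values(citizen, official_language, speaks):
    """Return W (0 or 1) for every joined pair (X, Ct), (Ct, L)."""
    return [1 if (x, lang) in speaks else 0
            for (x, ct) in citizen
            for (ct2, lang) in official_language
            if ct == ct2]


def confidence_by_count(w):
    """C as a ratio of two counts: pairs where speaks holds / all pairs."""
    return sum(1 for v in w if v == 1) / len(w)


def confidence_by_avg(w):
    """C as the plain average of W."""
    return sum(w) / len(w)


def running_estimates(w, seed=0):
    """Running average of W over a random order of the pairs.

    Each prefix average is an estimate of C; the final one is exact.
    """
    order = list(w)
    random.Random(seed).shuffle(order)
    total = 0
    estimates = []
    for n, v in enumerate(order, start=1):
        total += v
        estimates.append(total / n)
    return estimates


w = indicator_values(citizen, official_language, speaks)
print(confidence_by_count(w), confidence_by_avg(w))  # both 0.75 for this toy data
print(running_estimates(w))
```

On this toy data six of the eight joined pairs satisfy speaks, so both formulations give C = 0.75, and the running estimates end at that same value once every pair has been seen.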

Punit Bhargava
Wed Mar 11 18:50:53 PST 1998