The single and multi rules

User-Defined Aggregates

newaggr

single(newaggr, Y, NV) and multi(newaggr, Y, OV, NV).

where

single is the computation to be performed on a singleton set having Y as its only element.
multi denotes how to compute the aggregate value NV for a set S' obtained by adding a new element Y to the set S, where the value of the aggregate for S is OV.

max

    single(max, Y, Y).
    multi(max, Y, MO, MN) <-  Y > MO, MN=Y.
    multi(max, Y, MO, MN) <-  Y <=  MO, MN=MO.

count

sum

   single(count,  Y, 1).
   multi(count, Y, Old, New) <- New= Old+1.

   single(sum, Y, Y).
   multi(sum,  Y, Old, New) <- New= Old+Y.
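
The interplay of single and multi can be pictured as a left fold over the elements of the set. The Python sketch below only illustrates the semantics described above; it is not how the LDL++ system evaluates aggregates, and the names aggregate, RULES and run are assumptions made for this illustration.

```python
def aggregate(single, multi, items):
    """Fold items: `single` seeds the value from the first element,
    `multi` combines each further element Y with the old value."""
    it = iter(items)
    try:
        value = single(next(it))
    except StopIteration:
        return None  # empty input: neither single nor multi applies
    for y in it:
        value = multi(y, value)
    return value


# Python counterparts of the max, count and sum rules shown above
# (hypothetical encoding: one (single, multi) pair per aggregate name).
RULES = {
    "max": (lambda y: y, lambda y, old: y if y > old else old),
    "count": (lambda y: 1, lambda y, old: old + 1),
    "sum": (lambda y: y, lambda y, old: old + y),
}


def run(name, items):
    single, multi = RULES[name]
    return aggregate(single, multi, items)
```

For instance, run("sum", [3, 9, 4]) folds 3, then 3+9, then 12+4.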

count

sum

count_all

sum_all

Old

Y

User-defined aggregates can also be called by means of aggr goals. In this case, when the aggregate is applied to the empty set, the compiler searches for an empty rule defining the behavior of that particular aggregate on an empty set. For instance, our built-in aggregates behave as if they were defined by the following rules:

empty(sum, 0).
empty(count, 0).
empty(max, 0) <- false.

aggr

0

Several new aggregates can be defined using the single and multi rules. For instance, in SQL, after the maximum is found, a second sub-query is needed to return all the values associated with the maximum. In LDL++, if sppp denotes a supplier-part-price relation, then to find, for each supplier, its most expensive items and the common price of these items, we can write:

findmax(S, mymax<(Itm,Pric)>) <- sppp(S, Itm, Pric). 

single(mymax, (Item, Pr), (Item, Pr)).

multi(mymax, (Sit,Sp),(Oit,Op), (Sit, Sp)) <- Sp >= Op.
multi(mymax, (Sit,Sp),(Oit,Op), (Oit, Op)) <- Sp < Op.

Aggregates can return full tuples, such as the pair (Item, Pr) produced by mymax.
More than one value/tuple can be returned from the computation of an aggregate.

Return Rules

Values are not returned until the end of the computation. To express on-line aggregates, temporal aggregates on time series, and a number of aggregates required for data mining, values must be returned while the computation is still progressing.
The aggregate returns values in one column. Often we want to return values in several columns.

ereturn

freturn

ereturn

multi

ereturn(newaggr, NewY, OldV, VR) <- ...

multi

ereturn

multi

Early Returns

Find suppliers who supply more than 7 items

   select(Sup) <- allcounts(Sup, CC), CC>7 .  
   allcounts(Sup, cntol<Itm>) <- sppp(Sup, Itm, Price).

cntol

   single(cntol, _, 1).
   multi(cntol, S, Old, New) <- New= Old+1. 
   ereturn(cntol, S, Old, Value) <- Old ~= nil, Value=Old+1.
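
To see what the ereturn rule of cntol produces, the three rules can be mimicked with a Python generator. This is a sketch under one assumption: that ereturn is evaluated on the old value each time a new element arrives, with the old value being nil (None here) for the first element. The function name is chosen for this illustration only.

```python
def cntol(items):
    """Yield the early-return values of cntol, one per qualifying element."""
    old = None                      # nil: nothing has been aggregated yet
    for _ in items:
        if old is not None:         # ereturn: Old ~= nil, Value = Old+1
            yield old + 1
        # single for the first element, multi for every later one
        old = 1 if old is None else old + 1
```

Under this reading a consumer such as the select rule can stop as soon as a value above 7 appears, without waiting for the end of the group.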

Old

nil

Old+1,

Old ~= nil

mychoice:

  single(mychoice, Y, Y).
  multi(mychoice,Y, nil, nil) <- fail.
  ereturn(mychoice, Y, nil, Y).

   p(X, Y) <- q(X, Y), choice((X), Y).
   p(X, mychoice<Y>) <- q(X, Y).

nil

The aggregate avg could have been defined as follows:

single(avg, X, (X,1)).
multi(avg,  X, (OS,OC), (NS,NC)) <- NS=OS+X, NC=OC+1.
freturn(avg, nil,(OS,OC), Avg) <- Avg= OS/OC.
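
The role of freturn can be sketched in Python as follows: the running state is the pair (sum, count), and a value is produced only once, after the last element. This is an illustration of the three rules above rather than the system's implementation; returning None for an empty input is an assumption of the sketch.

```python
def avg(items):
    """Average computed the way the avg rules do: pair state, final return."""
    state = None
    for x in items:
        if state is None:
            state = (x, 1)                  # single(avg, X, (X,1))
        else:
            old_sum, old_cnt = state
            state = (old_sum + x, old_cnt + 1)   # multi: NS=OS+X, NC=OC+1
    if state is None:
        return None                         # assumption: no result on empty input
    old_sum, old_cnt = state
    return old_sum / old_cnt                # freturn: Avg = OS/OC
```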

nil

Using ereturn rules, an assortment of very useful aggregates, e.g., those used for data mining applications, can be defined. Moving window aggregates, for instance, are of common usage in time-series analysis.

Example. Moving time window aggregation: average the prices of IBM stocks over the last five days.

   p(mw5avg<A>) <- stock_closing('IBM',A).

   single(mw5avg, X, [X]).
   multi(mw5avg, X, OL, NL) <- if(OL= [X1, X2, X3, X4, X5]
                               then  L = [X1, X2, X3, X4] 
                               else  L= OL), NL = [X | L].
   ereturn(mw5avg, _,[X5,X4,X3,X2,X1],Avg) <- 
                                  Avg= (X1+X2+X3+X4+X5)/5.
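
The mw5avg rules can be traced with a small Python generator. The sketch assumes, as in the other early-return examples, that ereturn is applied to the old value when a new price arrives; under that reading each returned average covers the five prices seen before the current one. The function name and the use of None for nil are assumptions of this illustration.

```python
def mw5avg(prices):
    """Yield a five-day average each time the old window holds five prices."""
    old = None                              # nil before the first price
    for x in prices:
        # ereturn: the pattern matches only a list of exactly five prices
        if old is not None and len(old) == 5:
            yield sum(old) / 5
        if old is None:
            old = [x]                       # single(mw5avg, X, [X])
        else:
            # multi: drop the oldest price once the window is full,
            # then put the new price at the front
            kept = old[:4] if len(old) == 5 else old
            old = [x] + kept
```

With the prices 1 through 7, the window fills after the fifth price, so values are returned only when the sixth and seventh prices arrive.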

Arbitrary Number of Columns

   findmax(S, maxtwo<(Itm,Pric)>) <- sppp(S, Itm, Pric). 

   single(maxtwo, (Item, Pr), (Item, Pr)).

   multi(maxtwo, (Sit,Sp),(Oit,Op), (Sit, Sp)) <- Sp >= Op.
   multi(maxtwo, (Sit,Sp),(Oit,Op), (Oit, Op)) <- Sp <= Op.

   ereturn(maxtwo, nil, (Sit, Sp), Sit, Sp).

Any additional argument is returned in a separate column

findmax

nil

When the heads of the return rules have only three arguments, the aggregate is a boolean aggregate: it returns no value and simply succeeds or fails.

   single(count7, _, 1).
   multi(count7, _, Old, New) <- Old<7,  New=Old+1.
   ereturn(count7, _, Old)  <- Old=7.

   select(Sup, count7<Itm>) <- sppp(Sup, Itm, Price).

Multiple Aggregates in the Head

p(K1,K2,...,Km, aggr1<A1>, aggr2<A2>, ..., aggrN<An>) <- Rule Body.

The arguments in the head, that is, K1, ...,Km and A1,...,An must all appear in the body of the rule.
aggr1,...,aggrN can either be builtin aggregates or user-defined aggregates.
Each of aggr1<A1>, aggr2<A2>, ..., aggrN<An> is grouped by K1,K2,...,Km, where m denotes a non-negative integer (thus an empty group-by list is also allowed).
The cartesian product of the results of aggr1, ...,aggrN is returned for each new value X. Thus, if any of the N aggregates fails for a given X, no value is returned at that point.

Monotone Aggregation

nil

However, aggregates that have early return rules and no final return rules are monotonic. These aggregates can be used in recursive programs without restrictions. This leads to the simpler expression of complex algorithms.

Suppose we define a count-like predicate mcount as follows:

   single(mcount, Y,1).
   multi(mcount, Y, Old, New) <-  New=Old+1.
   ereturn(mcount, Y, Old, New) <- if(Old=nil then New=1 
                                     else New=Old+1).

Old=nil

p

   q(mcount<X>) <- p(X).

Program rules with mcount define monotone deterministic mappings.

mcount

Join the Party: Some people will come to the party no matter what, and their names are stored in a sure(Person) relation. But many other persons will join only after they know that at least K of their friends will be there. Here, friend(A, B) denotes that A views B as a friend.

   willcome(P)<-  sure(P).
   willcome(P)<-  c_friends(P, K), K >= 3.
   c_friends(P, mcount<F>) <-  willcome(F), friend(P, F).
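
The fixpoint that these three rules compute can be imitated in Python. The sketch below recounts friends from scratch on every pass, so it shows only the result of the recursion, not the incremental way mcount delivers counts; the function name, the set-of-pairs encoding of friend and the sample names are assumptions made for this illustration.

```python
def willcome(sure, friend, k=3):
    """sure: set of people; friend: set of (A, B) pairs, A views B as a friend."""
    coming = set(sure)                       # willcome(P) <- sure(P)
    changed = True
    while changed:                           # repeat until nobody new joins
        changed = False
        undecided = {a for a, _ in friend} - coming
        for p in undecided:
            # c_friends(P, mcount<F>): friends of P already known to come
            n = sum(1 for a, b in friend if a == p and b in coming)
            if n >= k:                       # willcome(P) <- c_friends(P,K), K >= 3
                coming.add(p)
                changed = True
    return coming


friends = {
    ("dee", "ann"), ("dee", "bob"), ("dee", "cy"),
    ("eve", "ann"), ("eve", "bob"), ("eve", "dee"),
    ("fay", "ann"),
}
guests = willcome({"ann", "bob", "cy"}, friends)
```

Here eve joins only because dee has joined first, which is the recursive step the rules express.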

K=3

By specializing the count aggregate, we can further improve the efficiency of the computation. Let us define an aggregate kcount as follows:

   single(kcount,(K,Y),1).
   multi(kcount,(K,Y),Old,New) <- Old<K, New=Old+1. 
   ereturn(kcount,(K,Y),K1,yes) <- K1+1=K.

ereturn

K

k>1

multi

   wllcm(F,yes) <- sure(F).
   wllcm(X,kcount<(3,F)>) <- wllcm(F,_), friend(X,F).

c_friends

friend

wllcm

yes

  single(zcount, (K,X), 1).
  multi(zcount,  (K,X), Old, New) <- Old < K, New=Old+1.
  ereturn(zcount, (K,X), K1) <- K1~=nil, K=K1+1.

  wllcom(F) <- sure(F).
  wllcom(X, zcount<(3,F)>) <- wllcom(F), friend(X, F).

msum

mmin

For msum we have:

   single(msum, Y, Y).
   multi(msum, Y, Old, New) <-  New = Old + Y.
   ereturn(msum, Y, Old, New)  <- if(Old = nil then New=Y 
                                    else New=Old+Y).

For mmin, we return the latest value only if it is a new minimum.

   single(mmin, Y,Y).
   multi(mmin, Y, Old,New)  <-  if(Y < Old  then  New=Y  
                                   else  New=Old).
   ereturn(mmin, Y, Old, Y)  <-  if(Old ~= nil then Y < Old).

Least-Distance Connections:

g(X,Y, C)

C

X

Y

   ld(X, Y, mmin<C>) <-  g(X,Y, C).
   ld(X, Y, mmin<C>) <-  ld(X,Z, C1), 
                         ld(Z, Y, C2), C= C1+C2. 
   least_dist(X, Y, min<C>) <- ld(X, Y, C).
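
The effect of the recursive ld rules can be approximated in Python by a fixpoint that keeps improving the best known cost for each pair of nodes. The sketch stores only the current minimum per pair, whereas ld with mmin also contains the intermediate, non-minimal costs that were minima when first derived; positive arc costs, the function name and the dictionary encoding are assumptions of this illustration.

```python
def least_dist(edges):
    """edges: iterable of (x, y, cost) arcs with positive costs.
    Returns a dict mapping (x, y) to the least cost of a path from x to y."""
    best = {}
    for x, y, c in edges:                    # ld(X,Y,mmin<C>) <- g(X,Y,C)
        if c < best.get((x, y), float("inf")):
            best[(x, y)] = c
    changed = True
    while changed:                           # iterate the recursive rule
        changed = False
        for (x, z), c1 in list(best.items()):
            for (z2, y), c2 in list(best.items()):
                # ld(X,Z,C1), ld(Z,Y,C2), C = C1+C2, kept only if a new minimum
                if z2 == z and c1 + c2 < best.get((x, y), float("inf")):
                    best[(x, y)] = c1 + c2
                    changed = True
    return best
```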

ld(X, Y, C)

X

Y

least_dist

C

ld

least_dist

Company Control: Another interesting example is transitive ownership and control of corporations. Say that owns(C1, C2, Per) denotes that corporation C1 owns a percentage Per of the shares of corporation C2. Then, C1 controls C2 if it owns more than, say, 49% of its shares. In general, to decide whether C1 controls C3 we must also add the shares owned by corporations such as C2 that are controlled by C1. This yields the transitive control predicate defined as follows:

   control(C, C) <- owns(C, _, _).
   control(C1, C2) <- twons(C1, C2, Per), Per>49.
   twons(C1, C3, msum<Per>) <- control(C1, C2), 
                                  owns(C2, C3, Per).
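
The transitive control computation can be imitated in Python as a fixpoint. The sketch recomputes the share totals on every pass instead of accumulating them incrementally as msum does, so it illustrates the result rather than the evaluation strategy; the function name, the dictionary encoding of owns and the sample percentages are assumptions made for this illustration.

```python
def control(owns, threshold=49):
    """owns: dict mapping (c1, c2) to the percentage of c2 owned by c1.
    Returns the set of (c1, c2) pairs such that c1 controls c2."""
    companies = {c for pair in owns for c in pair}
    # control(C, C) <- owns(C, _, _): every owner controls itself
    ctrl = {(c, c) for (c, _) in owns}
    changed = True
    while changed:
        changed = False
        for c1 in companies:
            for c3 in companies:
                if (c1, c3) in ctrl:
                    continue
                # sum the shares of c3 held by companies that c1 controls
                total = sum(per for (c2, target), per in owns.items()
                            if target == c3 and (c1, c2) in ctrl)
                if total > threshold:        # Per > 49
                    ctrl.add((c1, c3))
                    changed = True
    return ctrl


holdings = {("a", "b"): 60, ("a", "c"): 30, ("b", "c"): 25, ("c", "d"): 40}
result = control(holdings)
```

In this sample a controls c only through the combined shares it holds directly (30) and through b (25).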

C1

C2

twons

msum

(C2,C3)

control

C2

C1

C3

   single(sum49, Y, Y).  
   multi(sum49, Y, Old, Z) <-  Old<49, Z= Old+Y. 
   ereturn(sum49, Y, Old) <- if(Old=nil then Y>49 
                               else Old+Y>49).

   cntrl(C1, C2) <- owns(C1, C2, Per), Per >49.
   cntrl(C1, C3,sum49<Per>) <- cntrl(C1,C2), 
                                 owns(C2,C3,Per).

sum49

kcount

Bill-of-Materials (BoM) Applications: BoM applications represent an important application area that requires aggregates in recursive rules. Say, for instance, that psb(P1, P2, QT) denotes that P1 contains part P2 in quantity QT. We also have elementary parts that are purchasable for a price and will be delivered in a certain number of days: these are described by the relation basic(P, Price, Days). Then, the following program computes the cost of a part as the sum of the costs of the basic parts it contains.

  part_cost(Part, 0, Cst) <-  basic(Part, Cst).
  part_cost(Part, mcount<Sb>, msum<MCst>) <-
                 part_cost(Sb,ChC,Cst), prolfc(Sb,ChC),
                 psb(Part,Sb,Mult), MCst=Cst*Mult.

Sb

part_cost

Sb

mcount

psb

prolfc

   prolfc(P1, 0) <-        basic(P1, _).
   prolfc(P1, count<P2>)<- psb(P1, P2, _).

zcount

   pcost(Part, Cost)  <-  basic(Part, Cost).
   pcost(Part, zcount<(K,Sb)>, msum<MCst>) <-
                               pcost(Sb, yes, Cst), 
                               psb(Part, Sb, Mult),
                               prolfc(Part, K), 
                               MCst=Cst*Mult.

prolfc

Part

least_dist

Carlo Zaniolo, 1998