Navigation

2015/02/19

Pattern Discovery - ECLAT

ECLAT: Equivalence Class Transformation

This mining method transforms a horizontal database mapping transaction IDs to itemsets, into a vertical database mapping items to the TIDs that item appears in. This format makes calculating the support for a particular itemset easy, as one simply counts the number of TIDs associated with that itemset.

It is very similar to the Apriori algorithm in that one starts by generating a list of frequent 1-itemsets and then iteratively using the list of frequent k-itemsets to calculate the list of frequent (k+1)-itemsets, resulting a dataset scan for each k, except that instead of retaining the support for each k-itemset one retains the TID list.

TID List Calculation for k-itemset

To calculate this list, generate the intersection (not union) of the TID lists for each pair of (k-1)-itemsets and discard any whose resulting TID list is less than minsup:

t(X) = {1, 2, 3, 4} ; t(Y) = {1, 2, 5, 6} ; Z = X ∪ Y ; t(Z) = t(X) ∩ t(Y) = {1, 2}

If minsup = 2, then Z is frequent; if minsup = 3, not frequent.

Beneficial Properties of TID Lists

If t(X) = t(Y) then X and Y are correlated

If t(X) ⊃ t(Y) then X always implies Y

TID Diffsets

Under certain circumstances, rather than storing the full TID lists for each itemset, one can store the delta between lists:

t(X) = {1, 2, 3, 4} ; t(Y) = {1, 2, 5, 6} ; ds(X, Y) = {5, 6} ; ds(Y, X) = {3, 4}

Note that ds(X, Y) != ds(Y, X) (unless both are ∅, haha)