Navigation

2015/02/19

Pattern Discovery - Compressed Representations

When mining patterns from a large dataset, the number of distinct patterns grows in a combinatorial explosion, and can quickly overwhelm storage and processing capacity.

So, instead of storing all patterns, we can instead store a single pattern which implies a whole collection of sub-patterns.

Specifically, if pattern X is frequent, then all patterns Y ⊂ X (pattern Y is a subset of X) are also frequent.

Closed Pattern

Pattern X is closed when it is frequent and there exists no subpattern Y whose support is the same as X. That is, no itemset Y ⊂ X where s(X) = s(Y).

This is a lossless compression method, as subpatterns with differing support are preserved as separate patterns.

Max-Pattern

Pattern X is maximal when it is frequent and there exists no frequent pattern Y ⊂ X. It is similar to a closed pattern except the relative support of subpatterns is ignored.

This is a lossy compression method, as the differing support of subpatterns is lost. One uses max-pattern compression when one does not care about specific supports, only that the patterns are frequent.