DB Scan Reduction
- partitioning (see Partitioning section)
- dynamic itemset counting
- pruning via support lower bounding (see DHP section)
- tree projection
- H-miner (h-tree)
- hypercube decomposition
This involves splitting the dataset into partitions. Traditionally, a partition is sized such that it fits in main memory, and then processing partitions in series, but in a modern context one can also process partitions in parallel on multiple machines.
This method uses the property that any frequent itemset in the whole dataset must also be frequent in at least one partition. The property of frequency is relative to dataset size, so as the dataset under consideration shrinks, the absolute support required to be frequent also shrinks.
- Split the dataset into partitions
- Mine each partition for candidate frequent itemsets
- Scan full database to confirm frequency of itemsets
Direct Hashing and Pruning
Here, one maintains a hash table whose buckets contain candidate itemsets, mapped to the sum of the supports of the bucket's contents. After processing the dataset, if a bucket's total support is not frequent, then no itemset within the bucket can be frequent.