**Apriori**

The name derives from the Latin phrase "a priori" ("from the earlier"), because the algorithm uses knowledge derived from the earlier (smaller) frequent itemsets to evaluate larger candidate itemsets more efficiently.

**Downward Closure Property**

The efficiency derives from the property that, for an itemset X to be frequent, all of its subsets Y __must also__ be frequent. That is, for every pattern Y ⊂ X, s(Y) ≥ s(X). Inverted: an itemset X __cannot__ be frequent if any pattern Y ⊂ X is not frequent, so such an X can be discarded without counting its support. This is also known as the anti-monotone property of support.
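As a quick illustration, support can only shrink (or stay equal) as items are added to an itemset. A minimal sketch in Python, using a made-up four-transaction database (the items and counts are illustrative only):

```python
# Toy transaction database; each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "eggs"},
    {"milk", "eggs"},
]

def support(itemset, db):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in db) / len(db)

X = {"bread", "milk"}
Y = {"bread"}  # Y ⊂ X

# Downward closure: s(Y) >= s(X) for every subset Y of X.
assert support(Y, transactions) >= support(X, transactions)
```

Here `support(Y)` is 0.75 (three of four transactions contain bread) while `support(X)` is 0.5, so the inequality holds as expected.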

**Algorithm Definition**

The core of the Apriori algorithm is to use the list of frequent k-itemsets to explore potentially frequent (k+1)-itemsets. This involves a full scan of the database for each value of k until no more frequent itemsets are found.

- Set k = 1
- Set p = ∅, c = ∅
- Mine set of frequent 1-itemsets as f
- p := f
- Loop:
    - Generate list of candidate (k+1)-itemsets as c by joining f with itself (calculate all supersets of length k+1 composed of pairs of itemsets in f)
    - Examine each (k+1)-itemset in c and discard any containing a non-frequent k-subset
    - If c = ∅, break
    - Calculate actual support for each candidate itemset in c by scanning the dataset
    - Discard any non-frequent candidate itemsets
    - p += c
    - f := c
    - k += 1
- Return p as the list of all frequent itemsets
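The loop above can be sketched in Python. This is an illustrative, unoptimized version: the join step naively unions all pairs of frequent k-itemsets, and support is counted with one full database scan per level, as described. `min_support` is assumed here to be an absolute count:

```python
from itertools import combinations

def apriori(db, min_support):
    """Sketch of the Apriori loop. db: list of item sets; min_support: absolute count."""
    # Mine frequent 1-itemsets.
    counts = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    f = {s for s, n in counts.items() if n >= min_support}
    p = set(f)  # accumulate all frequent itemsets, starting with the 1-itemsets
    k = 1
    while f:
        # Join step: union pairs of frequent k-itemsets into (k+1)-item candidates.
        c = {a | b for a in f for b in f if len(a | b) == k + 1}
        # Prune step: discard candidates containing any non-frequent k-subset.
        c = {s for s in c if all(frozenset(sub) in f for sub in combinations(s, k))}
        if not c:
            break
        # Count actual support for the survivors with one scan of the database.
        freq = {s for s in c if sum(s <= t for t in db) >= min_support}
        p |= freq
        f = freq
        k += 1
    return p
```

For example, with `db = [{"a","b"}, {"a","b","c"}, {"a","c"}, {"b","c"}]` and `min_support=2`, all three 1-itemsets and all three 2-itemsets are frequent, while `{a,b,c}` appears only once and is discarded after the level-3 scan.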

**Efficient Candidate Generation**

This section uses Python list slicing notation to denote extracting a specific subset from an itemset.

A quick method to generate the set of candidate (k+1)-itemsets C from the set of frequent k-itemsets F:

- Sort all itemsets lexically (and the items within each itemset)
- For each itemset A in F:
    - For each itemset B in F:
        - If A[:-1] = B[:-1] and A[-1] < B[-1]:
            - P = A ∪ B (preserving lexical sort order)
            - If P[1:] is in F, add P to C
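The steps above translate almost directly into Python when each itemset is represented as a lexically sorted tuple (a sketch; the function name is illustrative). Note that `A` (which equals `P[:-1]`), `B`, and `P[1:]` cover only some of the k-subsets of `P`; any candidate with another infrequent subset is still eliminated by the prune and counting steps of the main loop.

```python
def generate_candidates(F):
    """Join frequent k-itemsets (lexically sorted tuples) into (k+1)-candidates."""
    F_set = set(F)
    C = []
    for A in sorted(F):
        for B in sorted(F):
            # Join itemsets that agree on everything but their last item.
            if A[:-1] == B[:-1] and A[-1] < B[-1]:
                P = A + (B[-1],)  # A ∪ B, preserving lexical sort order
                # Quick prune: P's suffix must itself be a frequent k-itemset.
                if P[1:] in F_set:
                    C.append(P)
    return C
```

For example, given the frequent 2-itemsets `("a","b")`, `("a","c")`, and `("b","c")`, only `("a","b")` and `("a","c")` share a 1-item prefix, so the single candidate produced is `("a","b","c")`, whose suffix `("b","c")` is confirmed frequent.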