[Trigger warning: police brutality and a little cold calculus]

Now that we've entered the age of ubiquitous fast video cameras, we are discovering that the police are getting up to pretty much all the shenanigans we've long suspected of them but could never prove. While each individual incident is terrible, in aggregate this increased visibility can only be good for the body politic.

A colder, more cynical side of me looks at the vague undercurrent of fear that these mostly-random outbursts of unwarranted violence create in citizen interactions with the police, and wonders if a little of it might have ended up being helpful for maintaining civil order with our relatively, ahem, fractious population. Actual crime rates are crazy low, whatever other impression the media strives to give, so one assumes the cops are doing something right at least.

Obviously, prior to all these cameras the bad behavior was more anecdotal than explicit, but it still created a little frisson of fear to keep the rabble in line. Nowadays, however, a little goes a lot further when it all ends up on the national news. It's gotta be a really hard adjustment for cops out there to rein it in. All the rules have changed suddenly, a tectonic shift rather than the gentler ones that prior technological changes have caused.

I can imagine how this situation suddenly appearing, with cops going to jail who previously would have easily lied their way out, can only make cops more paranoid and adversarial, potentially setting off a self-reinforcing feedback loop. That would really suck.

I remember watching a computer simulation of the honesty of public officials, which demonstrated how just a few individuals switching from relatively corrupt to honest can cause sudden sea changes in the vast majority of the population. One wonders if these cameras might serve as that nudge that will suddenly shift policing in America back to generally being in league with the community instead of acting as embattled foes.


Pattern Discovery - FPGrowth Database Projection

Handling Large FP-Trees

Sometimes the FP-Trees generated by FPGrowth are too large to fit in memory. There are two methods of breaking the work into smaller pieces: one whose pieces can be mined in parallel at the cost of high storage consumption, and one which requires serial processing but uses less extra storage. These techniques are collectively known as database projection.

Both methods start with the standard first pass, in which the frequent 1-itemsets and their supports are collected into an f-list.

The second step is also common to both: for each frequent item, create a projected database associated with that item; the item serves as that database's prefix.

If these methods are used recursively, each child projected database will accumulate a longer prefix by combining its associated item with its parent's prefix, and have an f-list generated solely from the contents of the child database.

Parallel Projection

Perform a second scan. For each transaction:

  • Discard infrequent items from the itemset
  • For each frequent item in the transaction, starting with the least frequent, project a sub-itemset into that item's projection database, composed of all items in the transaction that are more frequent than the item under consideration (see the sketch below)
Mine the projected databases in parallel and combine the results.
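
Here's a minimal Python sketch of parallel projection. It assumes the dataset is a list of transactions (sets of items) and that minsup is an absolute count; the function and variable names are my own illustrations, not from the lecture.

    from collections import Counter

    def parallel_projection(transactions, minsup):
        # First pass: frequent 1-itemsets ordered most to least frequent (the f-list).
        counts = Counter(item for t in transactions for item in t)
        f_list = [item for item, count in counts.most_common() if count >= minsup]
        rank = {item: r for r, item in enumerate(f_list)}   # lower rank = more frequent

        # Second pass: each transaction is projected into the database of every
        # frequent item it contains, keeping only the items more frequent than it.
        projected = {item: [] for item in f_list}
        for t in transactions:
            kept = sorted((i for i in t if i in rank), key=rank.get)  # drop infrequent items
            for pos, item in enumerate(kept):
                if pos > 0:
                    projected[item].append(kept[:pos])
        return f_list, projected

    transactions = [{'a', 'b', 'c'}, {'a', 'b'}, {'a', 'c', 'd'}, {'b', 'c'}]
    f_list, projected = parallel_projection(transactions, minsup=2)
    print(projected)   # each projected[item] can now be mined by a separate worker

Each projected database can then be handed to a separate worker and mined independently, at the cost of storing each transaction once per frequent item it contains.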

Partition Projection

Perform a second scan. For each transaction:
  • Discard infrequent items from the itemset
  • Remove the least frequent item from the itemset
  • Project that transaction's itemset only into the projection database associated with the removed item.
Mine each partition database in series, starting with the database associated with the least frequent item. While processing each transaction in the database under consideration, re-project the transaction's itemset as per the method used in the initial scan.
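
And a matching sketch of partition projection under the same assumptions. Here mine is just a placeholder for whatever per-partition mining step is used (for example, building and mining an FP-Tree from the projected database).

    from collections import Counter

    def partition_projection(transactions, minsup):
        counts = Counter(item for t in transactions for item in t)
        f_list = [item for item, count in counts.most_common() if count >= minsup]
        rank = {item: r for r, item in enumerate(f_list)}

        partitions = {item: [] for item in f_list}
        for t in transactions:
            kept = sorted((i for i in t if i in rank), key=rank.get)
            if kept:
                # project only into the database of the least frequent item present
                partitions[kept[-1]].append(kept[:-1])
        return f_list, partitions

    def process_in_series(f_list, partitions, mine):
        # Work from the least to the most frequent item; after mining a partition,
        # re-project each of its itemsets one step further, as in the initial scan.
        for item in reversed(f_list):
            mine(item, partitions[item])        # placeholder mining step
            for itemset in partitions[item]:
                if itemset:
                    partitions[itemset[-1]].append(itemset[:-1])

    f_list, partitions = partition_projection(
        [{'a', 'b', 'c'}, {'a', 'b'}, {'a', 'c', 'd'}, {'b', 'c'}], minsup=2)
    process_in_series(f_list, partitions, mine=lambda item, db: print(item, db))

Each transaction is stored only once, but the partitions must be processed in order from least to most frequent item so that the re-projections arrive before their target database is mined.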

Pattern Discovery - FPGrowth


FPGrowth - Frequent Pattern Growth

This method is much more complex than Apriori or ECLAT, trading computational time and larger data structures for a vastly reduced number of dataset scans. It works by creating tree data structures called FP-Trees using only two scans of the database; all further mining is performed directly on the trees. FPGrowth outputs the complete set of frequent patterns.


An FP-Tree is a tree structure starting with a root "null" node, whose other nodes each contain an item and its support. Each node may have an arbitrary number of children. Items may appear in multiple branches, but only once in the same branch. Each distinct branch represents an itemset or subset as found in one or more transactions in the database. During processing, the tree is walked both down towards the tips and up towards the root.

Each tree is accompanied by a collection of node lists, one list per distinct item in the dataset. Each list contains all of the associated item's nodes within the tree.

FPGrowth Algorithm

This algorithm is recursive, in that after mining a tree, sub-trees are generated from the mining results and mined in turn until the resulting sub-trees are null or contain only one branch.

To generate an FP-Tree:
  • Generate a list of frequent 1-itemsets in the dataset, and their support. This is called the f-list.

    Different orderings for this list can affect the space-efficiency of the trees. Typical order is by descending support (most to least frequent), and the relative order of items with the same support is irrelevant. Other orderings may be more appropriate for some datasets.

    For the primary dataset, an item's support is obviously incremented by 1 for each transaction it appears in, as usual, but the transactions of sub-datasets can have a larger support, and the item's support is then incremented by that value instead.
  • Initialize the node list collection from the f-list.
  • For each transaction in the dataset:
    • Sort the itemset so its items' relative position matches those of the f-list.
    • Create a branch of tree nodes based on the items, where each node's support is set to that of the transaction. For the primary tree this is 1, but during recursion sub-datasets may contain transactions with higher support.
    • Graft the branch onto the FP-Tree:
      • If the branch shares a prefix with any existing branch, merge the prefix nodes by incrementing each existing node's support by that of the branch node.
      • Graft the remaining, unmatched portion of the branch to the tree as a child of the deepest merged node. If no shared prefix is found, graft it directly onto the root node.
      • For each branch node which was grafted (not merged), add the node to the list associated with its item.
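
Here's a minimal Python sketch of the tree-construction steps above. It assumes transactions arrive as (itemset, support) pairs, so the same code serves the primary dataset (support 1) and the conditional datasets produced during recursion; class and function names are illustrative.

    from collections import Counter, defaultdict

    class FPNode:
        def __init__(self, item, support, parent):
            self.item = item            # None for the root
            self.support = support
            self.parent = parent
            self.children = {}          # item -> FPNode

    def build_fp_tree(transactions, minsup):
        # First pass: build the f-list, ordered most to least frequent.
        counts = Counter()
        for itemset, support in transactions:
            for item in itemset:
                counts[item] += support
        f_list = [item for item, count in counts.most_common() if count >= minsup]
        rank = {item: r for r, item in enumerate(f_list)}

        root = FPNode(None, 0, None)
        node_lists = defaultdict(list)   # item -> every node in the tree holding that item

        # Second pass: sort each transaction to match the f-list and graft it on.
        for itemset, support in transactions:
            node = root
            for item in sorted((i for i in itemset if i in rank), key=rank.get):
                child = node.children.get(item)
                if child is not None:
                    child.support += support          # merge with an existing prefix node
                else:
                    child = FPNode(item, support, node)
                    node.children[item] = child
                    node_lists[item].append(child)    # grafted, so record it in the node list
                node = child
        return root, node_lists, f_list

    transactions = [({'a', 'b', 'c'}, 1), ({'a', 'b'}, 1), ({'a', 'c', 'd'}, 1), ({'b', 'c'}, 1)]
    root, node_lists, f_list = build_fp_tree(transactions, minsup=2)
    print(f_list, [(n.item, n.support) for n in node_lists['c']])
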
Now mine the tree for frequent itemsets. For each item in the f-list, working in reverse order:
  • Create a conditional dataset for that item. The dataset has an itemset prefix, composed of the dataset's associated item and the prefix of its parent dataset. This prefix is obviously of length 1 for datasets derived from the primary tree, as there is no parent.
  • Generate a pool of conditional itemsets for that item, by walking the item's node list. Each node in the list results in an itemset whose support is equal to that of the node. That itemset is generated by walking the tree upwards to the root, adding each node's item to it, starting with the parent of the node under consideration. The conditional dataset's own associated item thus does not appear in the resulting itemsets.
For each conditional dataset, generate an FP-Tree using the same algorithm as the primary tree, treating each itemset as a transaction (remembering that these itemsets may carry a support greater than one). Continue mining each tree recursively in depth-first order, until the resulting sub-trees are empty or contain only one branch.

For each single-branch sub-tree, generate a list of itemsets by taking every combination of the items in the branch and unioning each combination with the dataset's prefix. Each resulting itemset's support is that of the least frequent (deepest) branch node it includes. These itemsets, gathered together across all of the recursive calls, are the output of the algorithm.
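
A sketch of the mining step, building on the build_fp_tree sketch above. For simplicity it recurses on every conditional dataset rather than applying the single-branch shortcut just described; the output (every frequent itemset with its support) is the same either way.

    def fp_growth(transactions, minsup, prefix=frozenset()):
        root, node_lists, f_list = build_fp_tree(transactions, minsup)
        patterns = {}
        for item in reversed(f_list):                    # least frequent first
            support = sum(n.support for n in node_lists[item])
            new_prefix = prefix | {item}
            patterns[new_prefix] = support
            # Conditional dataset: for each node, walk up to the root collecting items.
            conditional = []
            for node in node_lists[item]:
                path, parent = [], node.parent
                while parent is not None and parent.item is not None:
                    path.append(parent.item)
                    parent = parent.parent
                if path:
                    conditional.append((frozenset(path), node.support))
            if conditional:
                patterns.update(fp_growth(conditional, minsup, new_prefix))
        return patterns

    transactions = [({'a', 'b', 'c'}, 1), ({'a', 'b'}, 1), ({'a', 'c', 'd'}, 1), ({'b', 'c'}, 1)]
    print(fp_growth(transactions, minsup=2))             # every frequent itemset with its support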

Pattern Discovery - ECLAT

ECLAT: Equivalence Class Transformation

This mining method transforms a horizontal database mapping transaction IDs to itemsets, into a vertical database mapping items to the TIDs that item appears in. This format makes calculating the support for a particular itemset easy, as one simply counts the number of TIDs associated with that itemset.

It is very similar to the Apriori algorithm in that one starts by generating the list of frequent 1-itemsets and then iteratively uses the list of frequent k-itemsets to derive the list of frequent (k+1)-itemsets, except that instead of rescanning the dataset to count the support of each candidate at every k, one retains and intersects the TID lists.

TID List Calculation for k-itemset

To calculate this list, take the intersection (not union) of the TID lists of the two (k-1)-itemsets being joined, and discard any candidate whose resulting TID list has fewer than minsup entries:

t(X) = {1, 2, 3, 4} ; t(Y) = {1, 2, 5, 6} ; Z = X ∪ Y ; t(Z) = t(X) ∩ t(Y) = {1, 2}

If minsup = 2, then Z is frequent; if minsup = 3, not frequent.
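
The same toy numbers in a minimal Python sketch, including the vertical transformation from a horizontal database; the single items x and y stand in for the itemsets X and Y.

    from collections import defaultdict

    def vertical(horizontal):
        # horizontal database (TID -> itemset)  becomes  vertical database (item -> TID set)
        tid_lists = defaultdict(set)
        for tid, itemset in horizontal.items():
            for item in itemset:
                tid_lists[item].add(tid)
        return tid_lists

    horizontal = {1: {'x', 'y'}, 2: {'x', 'y'}, 3: {'x'}, 4: {'x'}, 5: {'y'}, 6: {'y'}}
    t = vertical(horizontal)
    t_z = t['x'] & t['y']            # t(Z) = t(X) ∩ t(Y)
    print(sorted(t_z), len(t_z))     # [1, 2], support 2: frequent iff minsup <= 2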

Beneficial Properties of TID Lists

If t(X) = t(Y) then X and Y always occur together

If t(X) ⊃ t(Y) then Y always implies X (every transaction containing Y also contains X)

TID Diffsets

Under certain circumstances, rather than storing the full TID lists for each itemset, one can store the delta between lists:

t(X) = {1, 2, 3, 4} ; t(Y) = {1, 2, 5, 6} ; ds(X, Y) = {5, 6} ; ds(Y, X) = {3, 4}

Note that ds(X, Y) != ds(Y, X) (unless both are ∅, haha)
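
A short sketch of the same example using Python set difference. Going by the convention implied above (ds(X, Y) = t(Y) - t(X)), the support of X ∪ Y can still be recovered without keeping full TID lists at every level.

    t_x = {1, 2, 3, 4}
    t_y = {1, 2, 5, 6}
    ds_xy = t_y - t_x                # {5, 6}
    ds_yx = t_x - t_y                # {3, 4}
    print(len(t_x) - len(ds_yx))     # 2, the support of X ∪ Y, recovered from a diffset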

Pattern Discovery - Apriori Refinements

The basic Apriori algorithm is much better than brute force, but can still be made more efficient. First, an outline of some methods, most of which I don't really understand yet; they were included in the slides and could be researched later, assuming Apriori is at all relevant in the modern context.

DB Scan Reduction

  • partitioning (see Partitioning section)
  • dynamic itemset counting

Candidate Reduction

  • hashing
  • sampling
  • pruning via support lower bounding (see DHP section)


Special Data Structures

  • tree projection
  • H-miner (h-tree)
  • hypercube decomposition

Partitioning

This involves splitting the dataset into partitions. Traditionally, a partition is sized such that it fits in main memory, and the partitions are then processed in series, but in a modern context one can also process partitions in parallel on multiple machines.

This method uses the property that any frequent itemset in the whole dataset must also be frequent in at least one partition. The property of frequency is relative to dataset size, so as the dataset under consideration shrinks, the absolute support required to be frequent also shrinks.

The algorithm:

  • Split the dataset into partitions
  • Mine each partition for candidate frequent itemsets
  • Scan full database to confirm frequency of itemsets
Naively, one might wonder why one could not simply sum each itemset's support across the partitions to calculate its global support. This does not work because each partition only reports the support of itemsets that are locally frequent, so the sum undercounts any itemset that falls below the threshold in some partition; only itemsets frequent in ALL partitions would be counted accurately. This is again a consequence of frequency being relative to dataset size, and is why the final confirmation scan is necessary.
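
A minimal Python sketch of the partitioning scheme. It assumes a mine_frequent(transactions, min_sup) helper exists (for example, a local Apriori run returning frequent itemsets as frozensets), and it takes minsup as a ratio so each partition can be mined against a threshold scaled to its own size.

    def partition_mine(transactions, minsup_ratio, n_partitions, mine_frequent):
        size = (len(transactions) + n_partitions - 1) // n_partitions
        candidates = set()
        # Pass 1: mine each partition with a locally scaled minimum support.
        # (These per-partition runs are independent and could run in parallel.)
        for start in range(0, len(transactions), size):
            part = transactions[start:start + size]
            candidates |= set(mine_frequent(part, minsup_ratio * len(part)))
        # Pass 2: one full scan to confirm the global support of every candidate.
        global_min = minsup_ratio * len(transactions)
        result = {}
        for cand in candidates:                      # cand is a frozenset of items
            support = sum(1 for t in transactions if cand <= set(t))
            if support >= global_min:
                result[cand] = support
        return result

    # e.g. with an Apriori-style local miner that returns {frozenset: support}:
    #   partition_mine(transactions, 0.1, 4, lambda part, ms: apriori(part, ms).keys())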

Direct Hashing and Pruning

Here, one maintains a hash table whose buckets accumulate candidate itemsets (typically 2-itemsets, hashed during an earlier scan), mapped to the sum of the supports of the bucket's contents. After processing the dataset, if a bucket's total count is below minsup, then no itemset hashing to that bucket can be frequent, and all such candidates can be pruned.
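
A minimal Python sketch of the bucket-counting idea for candidate 2-itemsets, with an arbitrary bucket count; any pair whose bucket total falls below minsup can be skipped when candidates are generated.

    from itertools import combinations

    def dhp_bucket_counts(transactions, n_buckets=101):
        buckets = [0] * n_buckets
        for t in transactions:
            for pair in combinations(sorted(t), 2):
                buckets[hash(pair) % n_buckets] += 1
        return buckets

    def may_be_frequent(pair, buckets, minsup):
        # If the bucket total is below minsup, nothing hashing here can be frequent.
        return buckets[hash(tuple(sorted(pair))) % len(buckets)] >= minsup

    transactions = [{'a', 'b', 'c'}, {'a', 'b'}, {'a', 'c', 'd'}, {'b', 'c'}]
    buckets = dhp_bucket_counts(transactions)
    print(may_be_frequent(('a', 'b'), buckets, minsup=2))   # True: this pair survives the filter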

Pattern Discovery - Apriori Algorithm

Mining an entire dataset for patterns using a naive brute-force search is extremely time-consuming, with complexity exponential in the number of distinct items (there are 2^n possible itemsets over n items). Various algorithms exist which are far more efficient than brute force. The first under consideration is the Apriori algorithm.


The name derives from the Latin phrase "a priori" ("from the earlier"), because we use knowledge derived from earlier (smaller) itemsets to evaluate each itemset more efficiently.

Downward Closure Property

The efficiency derives from the property that, for an itemset X to be frequent, every subset Y of X must also be frequent. That is, for every pattern Y ⊂ X, s(Y) >= s(X). Inverted: itemset X cannot be frequent if any pattern Y ⊂ X is not frequent, and such an X can be discarded without counting it. For example, if {beer, diapers} is not frequent, then {beer, diapers, nuts} cannot be frequent either and need never be generated.

This is based on the anti-monotonicity of support: adding items to an itemset can never increase its support.

Algorithm Definition

The core of the Apriori algorithm is to use the list of frequent k-itemsets to explore potentially frequent (k+1)-itemsets. This involves a full scan of the database for each value of k until no more frequent itemsets are found.
  • Set k = 1
  • Set p = ∅, f = ∅, c = ∅
  • Mine the set of frequent 1-itemsets as f, and add them to p
  • Loop:
    • Generate the list of candidate (k+1)-itemsets as c by joining f with itself (all unions of itemsets in f whose length is k+1)
    • Examine each (k+1)-itemset in c and discard any containing a non-frequent k-itemset (downward closure)
    • If c = ∅, break
    • Calculate the actual support of each candidate itemset in c by scanning the dataset
    • Discard any non-frequent candidate itemset
    • p += c
    • f := c
    • k += 1
  • Return p as the list of all frequent itemsets
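
A minimal Python sketch of the loop above, with candidate generation kept deliberately naive (the join-based generation is covered in the next section); itemsets are stored as frozensets so they can serve as dictionary keys.

    from itertools import combinations

    def apriori(transactions, minsup):
        transactions = [frozenset(t) for t in transactions]
        items = {i for t in transactions for i in t}
        f = {frozenset([i]): 0 for i in items}     # candidate 1-itemsets
        p = {}                                     # all frequent itemsets found so far
        k = 1
        while f:
            # one scan: count support of the current candidates, keep the frequent ones
            counts = {c: sum(1 for t in transactions if c <= t) for c in f}
            f = {c: s for c, s in counts.items() if s >= minsup}
            p.update(f)
            # generate (k+1)-candidates and prune any with a non-frequent k-subset
            candidates = {a | b for a in f for b in f if len(a | b) == k + 1}
            f = {c: 0 for c in candidates
                 if all(frozenset(sub) in f for sub in combinations(c, k))}
            k += 1
        return p

    print(apriori([{'a', 'b', 'c'}, {'a', 'b'}, {'a', 'c', 'd'}, {'b', 'c'}], minsup=2))
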
Efficient Candidate Generation

This section uses Python list slicing notation to denote extracting a specific subset from an itemset.

A quick method to generate a set of candidate (k+1)-itemsets as C, from a set of k-itemsets F:

  • Sort all itemsets lexically
  • For each itemset in F as A
    • For each itemset in F as B
      • If A[:-1] = B[:-1] and A[-1] < B[-1]
        • P = A ∪ B (preserving lexical sort order)
        • If P[1:] is in F, add P to C
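
A sketch of this generation step, assuming each k-itemset is stored as a lexically sorted tuple so the slicing notation above carries over directly.

    def generate_candidates(F):
        F_sorted = sorted(F)                      # lexical sort of the k-itemsets
        F_set = set(F_sorted)
        C = []
        for A in F_sorted:
            for B in F_sorted:
                if A[:-1] == B[:-1] and A[-1] < B[-1]:
                    P = A + (B[-1],)              # A ∪ B, preserving lexical order
                    if P[1:] in F_set:            # the quick subset check from above
                        C.append(P)
        return C

    F = {('a', 'b', 'c'), ('a', 'b', 'd'), ('a', 'c', 'd'), ('a', 'c', 'e'), ('b', 'c', 'd')}
    print(generate_candidates(F))
    # [('a', 'b', 'c', 'd')]: the join also produces acde, but it is pruned
    # because P[1:] = ('c', 'd', 'e') is not in F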

Pattern Discovery - Compressed Representations

When mining patterns from a large dataset, the number of distinct patterns grows in a combinatorial explosion, and can quickly overwhelm storage and processing capacity.

So, instead of storing all patterns, we can instead store a single pattern which implies a whole collection of sub-patterns.

Specifically, if pattern X is frequent, then all patterns Y ⊂ X (pattern Y is a subset of X) are also frequent.

Closed Pattern

Pattern X is closed when it is frequent and there exists no super-pattern Y ⊃ X whose support is the same as X's. That is, there is no itemset Y ⊃ X where s(Y) = s(X).

This is a lossless compression method: sub-patterns with differing support are preserved as separate closed patterns, so the support of any frequent pattern can be recovered.


Max Pattern

Pattern X is maximal (a max-pattern) when it is frequent and there exists no frequent super-pattern Y ⊃ X. It is similar to a closed pattern except that the supports of super-patterns are ignored; only whether they are frequent matters.

This is a lossy compression method, as the differing supports of sub-patterns are lost. One uses max-pattern compression when one does not care about specific supports, only that the patterns are frequent.
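
A small Python sketch contrasting the two representations, given a dict of all frequent patterns (frozenset -> support) such as the miners sketched earlier in these notes would produce; the numbers are made up for illustration.

    def closed_patterns(freq):
        # keep X only if no frequent super-pattern has the same support
        return {x: s for x, s in freq.items()
                if not any(x < y and s == sy for y, sy in freq.items())}

    def max_patterns(freq):
        # keep X only if no super-pattern is frequent at all
        return {x: s for x, s in freq.items()
                if not any(x < y for y in freq)}

    freq = {frozenset('a'): 3, frozenset('b'): 3, frozenset('ab'): 2}
    print(closed_patterns(freq))   # all three survive, since {a, b} has a different support
    print(max_patterns(freq))      # only {a, b} survives; the higher supports of {a} and {b} are lost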