Frequent Itemsets

Frequent Itemsets

1. Association rule discovery

1.1. Overview

1.2. The market-basket model

1.3. Example applications


2. Definition of a frequent itemsets.

2.1. Definition

2.2. Example of frequent itemsets

2.3. Association rules

2.4. Interesting association rules

2.5. Finding association rules

2.6. Mining association rules

2.7. Example

</details>

<summary>$m \rightarrow b$</summary>

</details>

<summary>$b \rightarrow c$</summary>

</details>

<summary>$c \rightarrow b$</summary>

</details>

<summary>$b \rightarrow c,m$</summary>

</details>

<summary>$b,c \rightarrow m$</summary>

</details>

<summary>$b,m \rightarrow c$</summary>

</details>


3. Finding frequent itemsets

3.1. Computation model

3.2. Computational cost

3.3. Main-memory bottleneck

3.4. Finding frequent pairs

3.5. Naive approach

3.6. Counting pairs in memory

Approach 1: Count all pairs using a matrix
  • Requires 4 bytes per pair
  • If n = total number items
    • Count pair of items {i, j} only if $i \leq j$
    • Keep pair counts in lexicographic order:
      • {1,2}, {1,3},…, {1,n}, {2,3}, {2,4},…,{2,n}, {3,4},…
    • Pair {i, j} is at position (i –1)(n – i/2) + j –1
    • Total number of pairs n(n –1)/2; total bytes= $2n^2$
Approach 2: Keep a table of triples [i, j, c]
  • The count of the pair of items {i, j} is c.
  • If integers and item ids are 4 bytes, we need approximately 12 bytes for pairs with count > 0
  • Plus some additional overhead for the hashtable
  • but only for pairs with count > 0
  • Beats Approach 1 if less than 1/3 of possible pairs actually occur

4. A-Priori algorithm.

4.1. Overview

4.2. A-Priori algorithm

4.3. Frequent triples

4.4. Example


5. PCY (Park-Chen-Yu) Algorithm

5.1. Observation

5.2. PCY Algorithm - First Pass

1
2
3
4
5
6
FOR (each basket) :
    FOR (each item in the basket) :
        add 1 to item’s count;
    FOR (each pair of items) :
        hash the pair to a bucket;
        add 1 to the count for that bucket;

5.3. PCY Algorithm - Second Pass


6. SON: Savasere-Omiecinski-Navathe

6.1. Overview

6.2. SON: MapReduce Phase 1

6.3. SON: MapReduce Phase 2