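- Items: e.g., things sold in a supermarket.
- Baskets: the full set of baskets cannot fit into memory.
- Association rule tricks: e.g., run a sale on diapers and raise the price of beer.
- Goal: find sets of items that appear together in baskets often enough to be called frequent.
- s: support threshold.
- I is a set of items. The support for I is the number of baskets of which I is a subset.
- I is frequent if its support is s or higher.
- Example: s = 3 (itemsets must appear in at least three baskets).
- Association rules: if-then rules about the contents of baskets, written $\{i_1, \dots, i_k\} \rightarrow j$.
- The confidence of this association rule is the probability of having item $j$ in a basket given that the basket already contains items $\{i_1, \dots, i_k\}$.
- The interest of an association rule is the absolute difference between its confidence and the fraction of baskets that contain $j$, e.g., $\lvert 0.5 - 5/8 \rvert = 0.125$.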
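The definitions of support, confidence, and interest can be checked on a small example. The sketch below is only illustrative: the eight baskets in `BASKETS` and the function names are assumptions, not data from these notes, chosen so that the rule {m, b} → c gives a confidence of 0.5 and an interest of 0.125.

```python
# Hypothetical baskets for illustration only; they are not taken from the notes.
BASKETS = [
    {"m", "c", "b"},
    {"m", "p", "j"},
    {"m", "b"},
    {"c", "j"},
    {"m", "p", "b"},
    {"m", "c", "b", "j"},
    {"c", "b", "j"},
    {"b", "c"},
]


def support(itemset, baskets):
    """Number of baskets that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= basket)


def confidence(lhs, j, baskets):
    """Estimate of P(j in basket | lhs is a subset of the basket).

    Assumes lhs occurs in at least one basket (otherwise this divides by zero).
    """
    return support(set(lhs) | {j}, baskets) / support(lhs, baskets)


def interest(lhs, j, baskets):
    """Absolute difference between confidence and the fraction of baskets with j."""
    return abs(confidence(lhs, j, baskets) - support({j}, baskets) / len(baskets))
```

For instance, `confidence({"m", "b"}, "c", BASKETS)` divides the support of {m, b, c} by the support of {m, b}.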
For every subset A of I, generate a rule $A \rightarrow I \setminus A$.
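Rule generation from one frequent itemset can be sketched as follows. The function name `rules_from_itemset` is hypothetical, and restricting A to non-empty proper subsets (so that both sides of the rule are non-empty) is an assumption of this sketch.

```python
from itertools import combinations


def rules_from_itemset(itemset):
    """Yield (A, I minus A) for every non-empty proper subset A of the itemset I."""
    items = sorted(itemset)
    for size in range(1, len(items)):
        for lhs in combinations(items, size):
            # Right-hand side: everything in I that is not on the left-hand side.
            rhs = tuple(x for x in items if x not in lhs)
            yield lhs, rhs


# A 3-item itemset yields six candidate rules (three with |A| = 1, three with |A| = 2).
rules = list(rules_from_itemset({"b", "c", "m"}))
```

Each candidate rule would then be kept or dropped according to its confidence.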
Based on these itemsets, we can generate the following rules:
</details>
Interest = $\lvert 0.8 - 0.75 \rvert = 0.05$ (not very interesting!)
</details>
Interest = $\lvert 0.6667 - 0.5 \rvert = 0.1667$ (not very interesting!)
</details>
Interest = $\lvert 0.8 - 0.75 \rvert = 0.05$ (not very interesting!)
</details>
Interest = $\lvert 0.3333 - 0.25 \rvert = 0.08$ (not very interesting!)
</details>
Interest = $\lvert 0.5 - 0.625 \rvert = 0.125$ (not very interesting!)
</details>
Interest = $\lvert 0.5 - 0.5 \rvert = 0$ (not very interesting!)
</details>
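- Monotonicity: if a set of items I appears at least s times, so does every subset J of I.
- Contrapositive for pairs: if item i does not appear in s baskets, then no pair containing i can appear in s baskets.
- Pass 1: count the occurrences of each item. Items that appear at least s times are the frequent items.
- Pass 2: count only those pairs in which both items are frequent.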
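The way monotonicity prunes the pair counts can be sketched as a two-pass function. This is a minimal in-memory illustration, restricted to pairs; the name `apriori_pairs` is hypothetical, and a real run would stream the baskets from disk in each pass rather than hold them in a list.

```python
from collections import Counter
from itertools import combinations


def apriori_pairs(baskets, s):
    """Return {pair: count} for all pairs of items with support >= s."""
    # Pass 1: count individual items and keep those seen in at least s baskets.
    item_counts = Counter(item for basket in baskets for item in set(basket))
    frequent_items = {item for item, n in item_counts.items() if n >= s}

    # Pass 2: count only pairs whose two members are both frequent.
    pair_counts = Counter()
    for basket in baskets:
        kept = sorted(set(basket) & frequent_items)
        pair_counts.update(combinations(kept, 2))

    return {pair: n for pair, n in pair_counts.items() if n >= s}
```

Sorting the kept items gives every pair one canonical tuple, so (b, c) and (c, b) share a counter.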
In addition to the item counts, pass 1 keeps a count for each bucket into which pairs of items are hashed:
    FOR (each basket) :
        FOR (each item in the basket) :
            add 1 to item’s count;
        FOR (each pair of items) :
            hash the pair to a bucket;
            add 1 to the count for that bucket;
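The pseudocode above can be written as a runnable sketch. This appears to be the first pass of the PCY algorithm, although the notes do not name it here. The table size `NUM_BUCKETS`, the helper `bucket_of`, and the use of `zlib.crc32` as the hash function are assumptions made for illustration; any function that maps pairs to buckets would serve.

```python
import zlib
from collections import Counter
from itertools import combinations

NUM_BUCKETS = 11  # hypothetical table size; a real run uses as many buckets as fit in memory


def bucket_of(pair, num_buckets=NUM_BUCKETS):
    """Map an item pair to a bucket index with a deterministic hash."""
    a, b = sorted(pair)
    return zlib.crc32(f"{a}|{b}".encode("utf-8")) % num_buckets


def pcy_pass1(baskets, num_buckets=NUM_BUCKETS):
    """Return (item counts, bucket counts) after one scan of the baskets."""
    item_counts = Counter()
    bucket_counts = [0] * num_buckets
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)              # add 1 to each item's count
        for pair in combinations(items, 2):    # each pair of items in the basket
            bucket_counts[bucket_of(pair, num_buckets)] += 1
    return item_counts, bucket_counts
```

Only the bucket counts are stored, not the pairs themselves, which is what keeps the memory use bounded by the table size.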
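Between the passes, the bucket counts are replaced by a bit vector: 1 means the bucket was hit at least s (support) times (call it a frequent bucket); 0 means it did not.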
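The bit vector and its use in the second pass can be sketched as below. The names `to_bitmap` and `pcy_pass2` are hypothetical, and the `bucket_of` argument stands for whatever hash function was used in the first pass. Treating "count at least s" as the test for a frequent bucket is an assumption that matches the definition of a frequent itemset used earlier.

```python
from collections import Counter
from itertools import combinations


def to_bitmap(bucket_counts, s):
    """1 marks a frequent bucket (count >= s), 0 marks an infrequent one."""
    return [1 if count >= s else 0 for count in bucket_counts]


def pcy_pass2(baskets, frequent_items, bitmap, bucket_of, s):
    """Count a pair only if both items are frequent and its bucket bit is 1."""
    pair_counts = Counter()
    for basket in baskets:
        kept = sorted(set(basket) & set(frequent_items))
        for pair in combinations(kept, 2):
            if bitmap[bucket_of(pair)]:
                pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= s}
```

A pair that hashes to a bucket with bit 0 cannot be frequent, because the bucket count is an upper bound on the count of every pair hashed to it.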
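Split the baskets into subsets and run A-Priori on these subsets, using a support that is equal to the main support divided by the total number of subsets. Every itemset that is frequent in at least one subset becomes a candidate. In the second pass, the counts of each candidate are summed over all subsets: candidates whose total count is at least s are frequent in the whole dataset, so the Reduce task outputs these itemsets.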
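The two phases described above can be simulated in a single process. This is a rough sketch under several assumptions: the function names are hypothetical, the chunks are processed in a loop rather than by real Map and Reduce tasks, and `frequent_itemsets` is a brute-force counter standing in for A-Priori, limited to itemsets of at most `max_size` items.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(baskets, s, max_size=2):
    """Brute-force stand-in for A-Priori: itemsets up to max_size with support >= s."""
    counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    return {itemset for itemset, n in counts.items() if n >= s}


def son(baskets, s, num_chunks, max_size=2):
    """Return {itemset: count} for itemsets frequent in the whole dataset."""
    chunks = [baskets[i::num_chunks] for i in range(num_chunks)]
    local_s = s / num_chunks  # main support divided by the number of subsets

    # Phase 1: candidates are the itemsets frequent in at least one chunk.
    candidates = set()
    for chunk in chunks:
        candidates |= frequent_itemsets(chunk, local_s, max_size)

    # Phase 2: count every candidate in every chunk, then sum and filter by s.
    totals = Counter()
    for chunk in chunks:
        for basket in chunk:
            for cand in candidates:
                if set(cand) <= set(basket):
                    totals[cand] += 1
    return {cand: n for cand, n in totals.items() if n >= s}
```

Phase 1 can produce false positives but no false negatives: an itemset with total support at least s must reach s divided by the number of chunks in at least one chunk, so phase 2 only needs to remove candidates.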