Recommendation Systems

Recommendation Systems

1. Recommendations

1.1. Overview

1.2. Types of recommendations

1.3. Formal Model

  Avatar LOTR Matrix Pirates
Alice 1   0.2  
Bob   0.5   0.3
Carol 0.2   1  
David       0.4

???question “Key Problems” - Gathering known ratings for matrix - How to collect the data in the utility matrix - Extrapolate unknown ratings from the known ones - Mainly interested in high unknown ratings - We are not interested in knowing what you don’t like but what you like - Evaluating extrapolation methods - How to measure success/performance of recommendation methods

???info “Gathering ratings” - Explicit - Ask people to rate items - Doesn’t work well in practice: people can’t be bothered - Implicit - Learn ratings from user actions - E.g., purchase implies high rating - What about low ratings?

???info “Extrapolating utilities” - Key problem: Utility matrix U is sparse - Most people have not rated most items - Cold start: - New items have no ratings - New users have no history - Three approaches to recommender systems: - Content-based - Collaborative - Latent factor based


2. Content-based Recommender Systems

2.1. Main Idea

2.2. Item profiles

2.3. User profiles and prediction

2.4. Pros and cons


3. Collaborative Filtering

3.1. Overview

3.2. How do we find similar users

3.3. Example differences of similarity metrics

  HP1 HP2 HP3 TW SW1 SW2 SW3
A 4     5 1    
B 5 5 4        
C       2 4 5  
D   3         3

???info “Immediate intuition” A is more similar to B than A is to C

???failure “Jaccard similarity” - $r_A={1,0,0,1,1,0,0}$ - $r_B={1,1,1,0,0,0,0}$ - $r_C={0,0,0,1,1,1,0}$ - $sim(A,B) = \frac{1}{5}$ - $sim(A,C) = \frac{2}{4}$ - A is more similar to C than to B!!!

???warning “Cosine similarity” - $r_A={4,0,0,5,1,0,0}$ - $r_B={5,5,4,0,0,0,0}$ - $r_C={0,0,0,2,4,5,0}$

1
2
3
4
5
6
7
8
9
10
11
12
13
14
$$
    sim(x,y)=cos(r_x,r_y)=\frac{r_x \cdot r_y}{||r_x|| \cdot ||r_y||}
$$

$$
    sim(A,B)=\frac{r_A \cdot r_B}{||r_A|| \cdot ||r_B||}=\frac{4 \times 5 + 0 \times 5 + 0 \times 4 + 5 \times 0 + 1 \times 0 + 0 \times 0 + 0 \times 0 }{\sqrt{4^2+5^2+1^2} \times \sqrt{5^2+5^2+4^2}}=0.380
$$

$$
    sim(A,C)=\frac{r_A \cdot r_C}{||r_A|| \cdot ||r_C||}=\frac{5 \times 2 + 1 \times 4}{\sqrt{4^2+5^2+1^2} \times \sqrt{2^2+4^2+5^2}}=0.322
$$

- A is more similar to B than to C!
- What else?

???success “Normalization” - Sutract the row mean - This normalizes data and turns cosine similarity into Pearson correlation

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
$$
    sim(x,y)=cos(r_x,r_y)=\frac{r_x \cdot r_y}{||r_x|| \cdot ||r_y||}
$$

$$
    sim(x,y) = \frac{\sum_{s \in S_{xy}}(r_{xs} - \overline{r_x})(r_{ys} - \overline{r_y})}{\sqrt{\sum_{s \in S_{xy}}(r_{xs} - \overline{r_x})^2}\sqrt{\sum_{s \in S_{xy}}(r_{ys} - \overline{r_y})^2}}
$$

- $r_A={4,0,0,5,1,0,0}$ and $\overline{r_A}=\frac{10}{3}$
- $r_B={5,5,4,0,0,0,0}$ and $\overline{r_B}=\frac{14}{3}$
- $r_C={0,0,0,2,4,5,0}$ and $\overline{r_C}=\frac{11}{3}$
- $r_D={0,3,0,0,0,0,3}$ and $\overline{r_D}=3$


|   | HP1 | HP2 | HP3  | TW   | SW1  | SW2 | SW3 |
| - | --- | --- | ---- | ---- | ---- | --- | --- | 
| A | 2/3 |     |      | 5/3  | -7/3 |     |     |
| B | 1/3 | 1/3 | -2/3 |      |      |     |     |
| C |     |     |      | -5/3 | 1/3  | 4/3 |     |
| D |     | 0   |      |      |      |     |  0  |

- sim(A,B)=0.092
- sim(A,C)=-0.559

- A is more similar to B than to C!
- Because the value is also the correlation value:
    - A is very similar to B
    - A is very different from C

3.4. Rating Predictions

\[r_{xi} = \frac{1}{k}\sum_{y \in N}r_{yi}\] \[r_{xi} = \frac{\sum_{y \in N}s_{xy} \cdot r_{yi}}{\sum_{y \in N}s_{xy}}\]

3.5. Item-item collaborative filtering

\[r_{xi} = \frac{\sum_{j \in N(i;x)}s_{ij} \cdot r_{xj}}{\sum_{j \in N(i;x)}s_{ij}}\]

3.6. Example: item to item

  $u_1$ $u_2$ $u_3$ $u_4$ $u_5$ $u_6$ $u_7$ $u_8$ $u_9$ $u_{10}$ $u_{11}$ $u_{12}$
$m_1$ 1   3     5     5   4  
$m_2$     5 4     4     2 1 3
$m_3$ 2 4   1 2   3   4 3 5  
$m_4$   2 4   5     4     2  
$m_5$     4 3 4 2         2 5
$m_6$ 1   3   3     2     4  

???info “Estimate rating of movie 1 by user 5”

1
2
3
4
5
6
7
8
|       | $u_1$ | $u_2$ | $u_3$ | $u_4$ | $u_5$ | $u_6$ | $u_7$ | $u_8$ | $u_9$ | $u_{10}$ | $u_{11}$ | $u_{12}$ | 
| ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | -------- | -------- | -------- | 
| $m_1$ |   1   |       |   3   |       |   ??? |   5   |       |       |   5   |          |     4    |          |
| $m_2$ |       |       |   5   |    4  |       |       |   4   |       |       |     2    |     1    |     3    |
| $m_3$ |   2   |   4   |       |    1  |   2   |       |   3   |       |   4   |     3    |     5    |          | 
| $m_4$ |       |   2   |   4   |       |   5   |       |       |   4   |       |          |     2    |          |
| $m_5$ |       |       |   4   |    3  |   4   |   2   |       |       |       |          |     2    |     5    | 
| $m_6$ |   1   |       |   3   |       |   3   |       |       |   2   |       |          |     4    |          |

???info “Identify movies similar to movie 1, rated by user 5” - Pearson correlation for similarity calculation: - Movie 3 - Movie 5

1
2
3
4
5
6
7
8
|       | $u_1$ | $u_2$ | $u_3$ | $u_4$ | $u_5$ | $u_6$ | $u_7$ | $u_8$ | $u_9$ | $u_{10}$ | $u_{11}$ | $u_{12}$ | $sim(1,m_i)$ |
| ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | -------- | -------- | -------- | ------------ |
| $m_1$ |   1   |       |   3   |       |   ??? |   5   |       |       |   5   |          |     4    |          |       1.0    |
| $m_2$ |       |       |   5   |    4  |       |       |   4   |       |       |     2    |     1    |     3    |     -0.18    |
| $m_3$ |   2   |   4   |       |    1  |   2   |       |   3   |       |   4   |     3    |     5    |          |      0.41    |  
| $m_4$ |       |   2   |   4   |       |   5   |       |       |   4   |       |          |     2    |          |     -0.10    |
| $m_5$ |       |       |   4   |    3  |   4   |   2   |       |       |       |          |     2    |     5    |     -0.31    |
| $m_6$ |   1   |       |   3   |       |   3   |       |       |   2   |       |          |     4    |          |      0.59    |

???info “Similarity weights” - $s_{1,3} = 0.41$ - $s_{1,6} = 0.59$

1
2
3
4
5
6
7
8
|       | $u_1$ | $u_2$ | $u_3$ | $u_4$ | $u_5$ | $u_6$ | $u_7$ | $u_8$ | $u_9$ | $u_{10}$ | $u_{11}$ | $u_{12}$ | $sim(1,m_i)$ |
| ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | -------- | -------- | -------- | ------------ |
| $m_1$ |   1   |       |   3   |       |   ??? |   5   |       |       |   5   |          |     4    |          |       1.0    |
| $m_2$ |       |       |   5   |    4  |       |       |   4   |       |       |     2    |     1    |     3    |     -0.18    |
| $m_3$ |   2   |   4   |       |    1  |   2   |       |   3   |       |   4   |     3    |     5    |          |      0.41    |  
| $m_4$ |       |   2   |   4   |       |   5   |       |       |   4   |       |          |     2    |          |     -0.10    |
| $m_5$ |       |       |   4   |    3  |   4   |   2   |       |       |       |          |     2    |     5    |     -0.31    |
| $m_6$ |   1   |       |   3   |       |   3   |       |       |   2   |       |          |     4    |          |      0.59    |

???info “Predict rating” - $s_{1,3} = 0.41$ - $s_{1,6} = 0.59$ - $r_{1,5} = \frac{0.41 * 2 + 0.59 * 3}{0.41 + 0.59} = 2.6$

1
2
3
4
5
6
7
8
|       | $u_1$ | $u_2$ | $u_3$ | $u_4$ | $u_5$ | $u_6$ | $u_7$ | $u_8$ | $u_9$ | $u_{10}$ | $u_{11}$ | $u_{12}$ | $sim(1,m_i)$ |
| ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | -------- | -------- | -------- | ------------ |
| $m_1$ |   1   |       |   3   |       |   2.6 |   5   |       |       |   5   |          |     4    |          |       1.0    |
| $m_2$ |       |       |   5   |    4  |       |       |   4   |       |       |     2    |     1    |     3    |     -0.18    |
| $m_3$ |   2   |   4   |       |    1  |   2   |       |   3   |       |   4   |     3    |     5    |          |      0.41    |  
| $m_4$ |       |   2   |   4   |       |   5   |       |       |   4   |       |          |     2    |          |     -0.10    |
| $m_5$ |       |       |   4   |    3  |   4   |   2   |       |       |       |          |     2    |     5    |     -0.31    |
| $m_6$ |   1   |       |   3   |       |   3   |       |       |   2   |       |          |     4    |          |      0.59    |

3.7. CF: Common Practice

\[r_{xi} = b_{xi} + \frac{\sum_{j \in N(i;x)}s_{ij} \cdot (r_{xj} - b_{xj})}{\sum_{j \in N(i;x)}s_{ij}}\]

3.8. CF: Pros and cons

3.9. Evaluation

???tip “Add data” - Leverage all the data - Don’t try to reduce data size in an effort to make fancy algorithms work - Simple methods on large data do best

1
2
3
4
- Add more data
    - e.g., add IMDB data on genres

- [More data beats better algorithms](http://anand.typepad.com/datawocky/2008/03/more-data-usual.html)

4. The Netflix Prize

4.1. Training data

4.2. Winner: BellKor Recommender System

???example “Example: modeling local and global effects” - Global: - Mean movie rating: 3.7 stars - The Sixth Sense is 0.5 stars above avg. - Joe rates 0.2 stars below avg. - Baseline estimation: Joe will rate The Sixth Sense 4 stars - Local neighborhood (CF/NN): - Joe didn’t like related movie Signs - Final estimate: Joe will rate The Sixth Sense 3.8 stars

4.3. Interpolation weights

\(r_{xi} = b_{xi} + \frac{\sum_{j \in N(i;x)}s_{ij} \cdot (r_{xj} - b_{xj})}{\sum_{j \in N(i;x)}s_{ij}}\)

???question “Problems” - Similarity measures are “arbitrary” - Pairwise similarities neglect interdependencies among users - aking a weighted average can be restricting

???success “Solution” Instead of $s_{ij}$ use $w_{ij}$ that we estimate directly from data

1
2
3
4
5
6
7
8
9
10
11
$$
r_{xi} = b_{xi} + \sum_{j \in N(i;x)}w_{ij}(r_{xj} - b_{xj})
$$

- $N(i;x)$ is set of movies rated by user x that are similar to movie i
- $w_{ij}$ is the interpolation weight (some real number)
- $w_{ij}$ models interaction between pairs of movies (it does not depend on user $x$)
- Goals:
    - Find $w_{ij}$ that minimize SSE (sum of squared errors) on training data!
        - Models relationships between item i and its neighbors j
    - $w_{ij}$ can be learned/estimated based on x and all other users that rated i

4.4. Recommendations via Optimization

\[J(w)=\sum_{x,i} \left( \left[ b_{xi} + \sum_{j \in N(i;x)}w_{ij}(r_{xj} - b_{xj}) \right] - r_{xi} \right)^2\]

4.5. Performance

4.6. Latent factor models (singular value decomposition - SVD)”

  HP1 HP2 HP3 TW SW1 SW2 SW3
A 4     5 1    
B 5 5 4        
C       2 4 5  
D   3         3

4.7. Bringing everything together

\[r_{xi} = \mu + b_x + b_i + q_i \cdot p_x\]

4.8. Additional details

\[r_{xi} = \mu + b_x(t) + b_i(t) + q_i \cdot p_x\]

4.9. Still not there yet