

graph TD
B(("B(38.4)")):::bWide --> C(("C(34.3)")):::cWide
C --> B
D(("D(3.9)")):::dWide --> A(("A(3.3)")):::aWide
D --> B
E(("E(8.1)")):::eWide --> D
E --> B
E --> F(("F(3.9)")):::dWide
F --> B
F --> E
G(("1.6")):::wide5 --> B
H(("1.6")):::wide5 --> C
I(("1.6")):::wide5 --> B
I --> E
K(("1.6")):::wide5 --> B
classDef aWide padding:10px
classDef bWide padding:38px
classDef cWide padding:34px
classDef aWide padding:10px
classDef eWide padding:15px
classDef wide5 padding: 5px
style A fill:#FFA500
style B fill:#00FFFF
style C fill:#8B8000
style D fill:#f198b3
style E fill:#ADEBB3
style F fill:#f198b3
graph TD
k((k)) -->|$$r_k/4$$| j((j))
i((i)) -->|$$r_i/3$$| j((j))
s1((" ")) --> i
i --> s2((" "))
i --> s3((" "))
s4((" ")) --> k
k --> s5((" "))
k --> s6((" "))
k --> s7((" "))
j -->|$$r_j/3$$| s8((" "))
j -->|$$r_j/3$$| s9((" "))
j -->|$$r_j/3$$| s10((" "))
s8 --> s81((" "))
s8 --> s82((" "))
s8 --> s83((" "))
s9 --> s91((" "))
s9 --> s92((" "))
s9 --> s93((" "))
s10 --> s101((" "))
s10 --> s102((" "))
s10 --> s103((" "))
style j fill:#FF0000
style i fill:#ADEBB3
style k fill:#ADEBB3
style s1 fill:#FFFFFF00, stroke:#FFFFFF00;
style s2 fill:#FFFFFF00, stroke:#FFFFFF00;
style s3 fill:#FFFFFF00, stroke:#FFFFFF00;
style s4 fill:#FFFFFF00, stroke:#FFFFFF00;
style s5 fill:#FFFFFF00, stroke:#FFFFFF00;
style s6 fill:#FFFFFF00, stroke:#FFFFFF00;
style s7 fill:#FFFFFF00, stroke:#FFFFFF00;
style s81 fill:#FFFFFF00, stroke:#FFFFFF00;
style s82 fill:#FFFFFF00, stroke:#FFFFFF00;
style s83 fill:#FFFFFF00, stroke:#FFFFFF00;
style s91 fill:#FFFFFF00, stroke:#FFFFFF00;
style s92 fill:#FFFFFF00, stroke:#FFFFFF00;
style s93 fill:#FFFFFF00, stroke:#FFFFFF00;
style s101 fill:#FFFFFF00, stroke:#FFFFFF00;
style s102 fill:#FFFFFF00, stroke:#FFFFFF00;
style s103 fill:#FFFFFF00, stroke:#FFFFFF00;
\[r_j = \frac{r_i}{3} + \frac{r_k}{4}\] vote from an important page is worth more.flowchart TD
Y((y)) -->|y/2| Y((y));
Y -->|y/2| A((a));
A -->|a/2| Y;
A -->|a/2| M((m));
M -->|m| A;
???tip “General equation for rank calculation” - The above example demonstrates the idea of a general equation for calculating rank $r_{j}$ for page j:
1
2
3
$$r_j = \sum\limits_{i\rightarrow j}^{n} \frac{r_i}{d_i}$$
- $d_i$ is the number of out-degree (outgoing links) of node $i$
???tip “Visualization” - Suppose page i has importance $r_{i}$ and has outgoing links to three other pages, including page j.
1

N web pages on the Internet.1/N.???example “Iteration 0” - Initial rank vector r at iteration 0: $r^{(0)} = [1/3,1/3,1/3]^{T}$
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
$$
\left[
\begin{array}{c}
y \\
a \\
m
\end{array}
\right]
=
\left[
\begin{array}{c}
1/3 \\
1/3 \\
1/3
\end{array}
\right]
$$
???example “Iteration 1”
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
$$
\left[
\begin{array}{c}
y \\
a \\
m
\end{array}
\right]
=
\left[
\begin{array}{ccc}
1/2 & 1/2 & 0 \\
1/2 & 0 & 1 \\
0 & 1/2 & 0
\end{array}
\right]
\left[
\begin{array}{c}
1/3 \\
1/3 \\
1/3
\end{array}
\right]
=
\left[
\begin{array}{c}
1/3 \\
3/6 \\
1/6
\end{array}
\right]
$$
???example “Iteration 2”
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
$$
\left[
\begin{array}{c}
y \\
a \\
m
\end{array}
\right]
=
\left[
\begin{array}{ccc}
1/2 & 1/2 & 0 \\
1/2 & 0 & 1 \\
0 & 1/2 & 0
\end{array}
\right]
\left[
\begin{array}{c}
1/3 \\
3/6 \\
1/6
\end{array}
\right]
=
\left[
\begin{array}{c}
5/12 \\
1/3 \\
3/12
\end{array}
\right]
$$
???example “Iteration 3”
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
$$
\left[
\begin{array}{c}
y \\
a \\
m
\end{array}
\right]
=
\left[
\begin{array}{ccc}
1/2 & 1/2 & 0 \\
1/2 & 0 & 1 \\
0 & 1/2 & 0
\end{array}
\right]
\left[
\begin{array}{c}
5/12 \\
1/3 \\
3/12
\end{array}
\right]
=
\left[
\begin{array}{c}
9/24 \\
11/24 \\
1/6
\end{array}
\right]
$$
…
???example “Final iteration”
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
$$
\left[
\begin{array}{c}
y \\
a \\
m
\end{array}
\right]
=
\left[
\begin{array}{ccc}
1/2 & 1/2 & 0 \\
1/2 & 0 & 1 \\
0 & 1/2 & 0
\end{array}
\right]
\left[
\begin{array}{c}
... \\
... \\
...
\end{array}
\right]
=
\left[
\begin{array}{c}
6/15 \\
6/15 \\
3/15
\end{array}
\right]
$$
spider trap problema and b.graph LR
A((a)) --> B((b));
B -->A;
???failure “Solution”
1
2
3
4
5
6
7
8
9
10
11
12
13
$$
\left[
\begin{array}{c}
r_a \\
r_b
\end{array}
\right]
=
\begin{array}{cccc}
1 & 0 & 1 & 0\\
0 & 1 & 0 & 1
\end{array}
$$
dead end problema and b.graph LR
A((a)) --> B((b));
???failure “Solution”
1
2
3
4
5
6
7
8
9
10
11
12
13
$$
\left[
\begin{array}{c}
r_a \\
r_b
\end{array}
\right]
=
\begin{array}{cccc}
1 & 0 & 0 & 0\\
0 & 1 & 0 & 0
\end{array}
$$
flowchart LR
subgraph ORIGINAL
direction TB
Y1((y)) --> Y1((y));
Y1 --> A1((a));
A1 --> Y1;
A1 --> M1((m));
end
subgraph RANDOM_TELEPORT
direction TB
Y2((y)) --> Y2((y));
Y2 --> A2((a));
A2 --> Y2;
A2 --> M2((m));
M2 .-> STOP[ ]
end
ORIGINAL --> RANDOM_TELEPORT
style STOP fill:#FFFFFF00, stroke:#FFFFFF00;
to
\[\left[ \begin{array}{ccc} 1/2 & 1/2 & 0 \\ 1/2 & 0 & 1 \\ \textbf{1/3} & \textbf{1/3} & \textbf{1/3} \end{array} \right]\]where $\left[ \frac{1 - \beta}{N}\right]_{N}$ is a vector with all $N$ entries have the same value $\frac{1-\beta}{N}$.
```python linenums=”1” %%file small_graph.dat y y y a a y a m m a
1
2
3
4
5
6
- Load and view the data
```python linenums="1"
graph_data = sc.textFile("small_graph.dat")
graph_data.take(5)
```python linenums=”1” links = graph_data.map(lambda line: (line.split(“ “)[0], line.split(“ “)[1])) print(links.take(10)) links = links.groupByKey() print(links.take(10)) links = links.mapValues(list) print(links.take(10))
1
2
3
4
5
6
7
8
- Setup initial weight
```python linenums="1"
N = links.count()
ranks = links.map(lambda line: (line[0], 1/N))
ranks.take(3)
```python linenums=”1” def calculateVotes(t): res = [] for item in t[1][1]: count = len(t[1][1]) res.append((item, t[1][0] / count)) return res ############################### calculateVotes((‘y’, (0.3333333333333333, [‘y’, ‘a’])))
1
2
3
4
5
6
- Vote distribution
```python linenums="1"
votes = ranks.join(links)
votes.take(10)
```python linenums=”1” votes = votes.flatMap(calculateVotes) print(votes.take(10))
1
2
3
4
5
6
- Vote tally
```python linenums="1"
ranks = votes.reduceByKey(lambda x,y: x + y)
ranks.take(10)
```python linenums=”1” %%time for i in range(10): votes = ranks.join(links).flatMap(calculateVotes) ranks = votes.reduceByKey(lambda x,y: x + y) print(ranks.take(3))
1
2
3
4
5
6
7
8
9
10
11
12
13
14
- Iterating with a stopping condition
```python linenums="1"
%%time
sum = 1
while sum > 0.01:
old_ranks = ranks
votes = ranks.join(links).flatMap(calculateVotes)
ranks = votes.reduceByKey(lambda x,y: x + y)
errors = old_ranks.join(ranks).mapValues(lambda v: abs(v[0] - v[1]))
sum = errors.values().sum()
print(sum)
print(ranks.take(3))
authorities).hub).n pages on the Web flowchart LR
A((A)) --> B((B))
A --> C((C))
A ---> D((D))
B ---> A
B ---> D
C --> E((E))
D --> B
D --> C
hubbiness of a page is proportional to the sum of the authority of its successor. authority), then the value of that hub (hubbiness) will be high.authority of a page is proportional to the sum of the hubbiness of its predecessor. hubbiness), then that page must be of high quality (high authority).???example “Iteration 0” - Initialize $h$:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
$$
h=
\left[
\begin{array}{c}
1 \\
1 \\
1 \\
1 \\
1
\end{array}
\right]
$$
- Compute $a = L^{T}h$
$$
a =
\left[
\begin{array}{ccccc}
0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 \\
1 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0
\end{array}
\right]
\left[
\begin{array}{c}
1 \\
1 \\
1 \\
1 \\
1
\end{array}
\right]
=
\left[
\begin{array}{c}
1 \\
2 \\
2 \\
2 \\
1
\end{array}
\right]
$$
- Scale so that the largest component of $a$ is 1
- Since max value in $a$ is 2, we divide all values in $a$ by 2.
$$
a=
\left[
\begin{array}{c}
1/2 \\
1 \\
1 \\
1 \\
1/2
\end{array}
\right]
$$
- Compute $h = La$
$$
h =
\left[
\begin{array}{ccccc}
0 & 1 & 1 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{array}
\right]
\left[
\begin{array}{c}
1/2 \\
1 \\
1 \\
1 \\
1/2
\end{array}
\right]
=
\left[
\begin{array}{c}
3 \\
3/2 \\
1/2 \\
2 \\
0
\end{array}
\right]
$$
- Scale so that the largest component of $h$ is 1
- Since max value in $h$ is 3, we divide all values in $h$ by 3.
$$
h=
\left[
\begin{array}{c}
1 \\
1/2 \\
1/6 \\
2/3 \\
0
\end{array}
\right]
$$
???example “Iteration 1” - This is $h$ value from previous iteration:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
$$
h=
\left[
\begin{array}{c}
1 \\
1/2 \\
1/6 \\
2/3 \\
0
\end{array}
\right]
$$
- Compute $a = L^{T}h$
$$
a =
\left[
\begin{array}{ccccc}
0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 \\
1 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0
\end{array}
\right]
\left[
\begin{array}{c}
1 \\
1/2 \\
1/6 \\
2/3 \\
0
\end{array}
\right]
=
\left[
\begin{array}{c}
1/2 \\
5/3 \\
5/3 \\
3/2 \\
1/6
\end{array}
\right]
$$
- Scale so that the largest component of $a$ is 1
- Since max value in $a$ is $5/3$, we divide all values in $a$ by $5/3$.
$$
a=
\left[
\begin{array}{c}
3/10 \\
1 \\
1 \\
9/10 \\
1/10
\end{array}
\right]
$$
- Compute $h = La$
$$
h =
\left[
\begin{array}{ccccc}
0 & 1 & 1 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{array}
\right]
\left[
\begin{array}{c}
3/10 \\
1 \\
1 \\
9/10 \\
1/10
\end{array}
\right]
=
\left[
\begin{array}{c}
29/10 \\
6/5 \\
1/10 \\
2 \\
0
\end{array}
\right]
$$
- Scale so that the largest component of $h$ is 1
- Since max value in $h$ is $29/10$, we divide all values in $h$ by $29/10$.
$$
a=
\left[
\begin{array}{c}
1 \\
12/29 \\
1/29 \\
20/29 \\
0
\end{array}
\right]
$$
python linenums="1" %%file small_web.dat A B A D A C B A B D C E D C D B
hubbiness in the Hollins site.