graph TD
B(("B(38.4)")):::bWide --> C(("C(34.3)")):::cWide
C --> B
D(("D(3.9)")):::dWide --> A(("A(3.3)")):::aWide
D --> B
E(("E(8.1)")):::eWide --> D
E --> B
E --> F(("F(3.9)")):::dWide
F --> B
F --> E
G(("1.6")):::wide5 --> B
H(("1.6")):::wide5 --> C
I(("1.6")):::wide5 --> B
I --> E
K(("1.6")):::wide5 --> B
classDef aWide padding:10px
classDef bWide padding:38px
classDef cWide padding:34px
classDef aWide padding:10px
classDef eWide padding:15px
classDef wide5 padding: 5px
style A fill:#FFA500
style B fill:#00FFFF
style C fill:#8B8000
style D fill:#f198b3
style E fill:#ADEBB3
style F fill:#f198b3
graph TD
k((k)) -->|$$r_k/4$$| j((j))
i((i)) -->|$$r_i/3$$| j((j))
s1((" ")) --> i
i --> s2((" "))
i --> s3((" "))
s4((" ")) --> k
k --> s5((" "))
k --> s6((" "))
k --> s7((" "))
j -->|$$r_j/3$$| s8((" "))
j -->|$$r_j/3$$| s9((" "))
j -->|$$r_j/3$$| s10((" "))
s8 --> s81((" "))
s8 --> s82((" "))
s8 --> s83((" "))
s9 --> s91((" "))
s9 --> s92((" "))
s9 --> s93((" "))
s10 --> s101((" "))
s10 --> s102((" "))
s10 --> s103((" "))
style j fill:#FF0000
style i fill:#ADEBB3
style k fill:#ADEBB3
style s1 fill:#FFFFFF00, stroke:#FFFFFF00;
style s2 fill:#FFFFFF00, stroke:#FFFFFF00;
style s3 fill:#FFFFFF00, stroke:#FFFFFF00;
style s4 fill:#FFFFFF00, stroke:#FFFFFF00;
style s5 fill:#FFFFFF00, stroke:#FFFFFF00;
style s6 fill:#FFFFFF00, stroke:#FFFFFF00;
style s7 fill:#FFFFFF00, stroke:#FFFFFF00;
style s81 fill:#FFFFFF00, stroke:#FFFFFF00;
style s82 fill:#FFFFFF00, stroke:#FFFFFF00;
style s83 fill:#FFFFFF00, stroke:#FFFFFF00;
style s91 fill:#FFFFFF00, stroke:#FFFFFF00;
style s92 fill:#FFFFFF00, stroke:#FFFFFF00;
style s93 fill:#FFFFFF00, stroke:#FFFFFF00;
style s101 fill:#FFFFFF00, stroke:#FFFFFF00;
style s102 fill:#FFFFFF00, stroke:#FFFFFF00;
style s103 fill:#FFFFFF00, stroke:#FFFFFF00;
\[r_j = \frac{r_i}{3} + \frac{r_k}{4}\] vote from an important page is worth more.flowchart TD
Y((y)) -->|y/2| Y((y));
Y -->|y/2| A((a));
A -->|a/2| Y;
A -->|a/2| M((m));
M -->|m| A;
N web pages on the Internet.1/N.…
spider trap problema and b.graph LR
A((a)) --> B((b));
B -->A;
dead end problema and b.graph LR
A((a)) --> B((b));
flowchart LR
subgraph ORIGINAL
direction TB
Y1((y)) --> Y1((y));
Y1 --> A1((a));
A1 --> Y1;
A1 --> M1((m));
end
subgraph RANDOM_TELEPORT
direction TB
Y2((y)) --> Y2((y));
Y2 --> A2((a));
A2 --> Y2;
A2 --> M2((m));
M2 .-> STOP[ ]
end
ORIGINAL --> RANDOM_TELEPORT
style STOP fill:#FFFFFF00, stroke:#FFFFFF00;
to
\[\left[ \begin{array}{ccc} 1/2 & 1/2 & 0 \\ 1/2 & 0 & 1 \\ \textbf{1/3} & \textbf{1/3} & \textbf{1/3} \end{array} \right]\]where $\left[ \frac{1 - \beta}{N}\right]_{N}$ is a vector with all $N$ entries have the same value $\frac{1-\beta}{N}$.
git pull to update the big-data-engineering repository.link-analysis/data/small_graph.dat and has the following format
1
2
3
4
5
y y
y a
a y
a m
m a
1
2
graph_data = sc.textFile("small_graph.dat")
graph_data.take(5)
1
2
3
4
5
6
links = graph_data.map(lambda line: (line.split(" ")[0], line.split(" ")[1]))
print(links.take(10))
links = links.groupByKey()
print(links.take(10))
links = links.mapValues(list)
print(links.take(10))
1
2
3
N = links.count()
ranks = links.map(lambda line: (line[0], 1/N))
ranks.take(3)
1
2
3
4
5
6
7
8
def calculateVotes(t):
res = []
for item in t[1][1]:
count = len(t[1][1])
res.append((item, t[1][0] / count))
return res
###############################
calculateVotes(('y', (0.3333333333333333, ['y', 'a'])))
1
2
votes = ranks.join(links)
votes.take(10)
1
2
votes = votes.flatMap(calculateVotes)
print(votes.take(10))
1
2
ranks = votes.reduceByKey(lambda x,y: x + y)
ranks.take(10)
1
2
3
4
5
%%time
for i in range(10):
votes = ranks.join(links).flatMap(calculateVotes)
ranks = votes.reduceByKey(lambda x,y: x + y)
print(ranks.take(3))
1
2
3
4
5
6
7
8
9
10
%%time
sum = 1
while sum > 0.01:
old_ranks = ranks
votes = ranks.join(links).flatMap(calculateVotes)
ranks = votes.reduceByKey(lambda x,y: x + y)
errors = old_ranks.join(ranks).mapValues(lambda v: abs(v[0] - v[1]))
sum = errors.values().sum()
print(sum)
print(ranks.take(3))
authorities).hub).n pages on the Web flowchart LR
A((A)) --> B((B))
A --> C((C))
A ---> D((D))
B ---> A
B ---> D
C --> E((E))
D --> B
D --> C
hubbiness of a page is proportional to the sum of the authority of its successor. authority), then the value of that hub (hubbiness) will be high.authority of a page is proportional to the sum of the hubbiness of its predecessor. hubbiness), then that page must be of high quality (high authority).
1
2
3
4
5
6
7
8
9
%%file small_web.dat
A B
A D
A C
B A
B D
C E
D C
D B
hubbiness in the Hollins site.