Distributed Machine Learning with Spark

1. Application

Example: Spam filtering

      viagra   learning   the   dating   nigeria   spam?
X1    1        0          1     0        0         $y_1 = 1$
X2    0        1          1     0        0         $y_2 = -1$
X3    0        0          0     0        1         $y_3 = 1$
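
The rows above are binary bag-of-words vectors: each column is 1 when the word occurs in the message and 0 otherwise. A minimal sketch of that encoding follows. The vocabulary and labels come from the table, while the message texts and the helper name `featurize` are hypothetical, chosen only so that the output matches the three rows.

```python
# Vocabulary in the same column order as the table.
VOCAB = ["viagra", "learning", "the", "dating", "nigeria"]

def featurize(text, vocab=VOCAB):
    """Return a 0/1 vector: 1 if the vocabulary word occurs in the text."""
    words = set(text.lower().split())
    return [1 if w in words else 0 for w in vocab]

# Hypothetical messages (not from the slides); label +1 = spam, -1 = not spam.
emails = [
    ("cheap viagra in the mail", 1),
    ("learning the basics", -1),
    ("transfer from nigeria", 1),
]

X = [featurize(text) for text, _ in emails]
y = [label for _, label in emails]

for row, label in zip(X, y):
    print(row, label)
```

Splitting on whitespace is the simplest possible tokenizer; a real pipeline would also strip punctuation and would typically use a much larger vocabulary.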

2. Linear models for classification

2.1. Overview

\[f(x) = \begin{cases} +1 & \text{if } w_{1}x_{1}+w_{2}x_{2}+\cdots+w_{d}x_{d} \ge \theta \\ -1 & \text{otherwise} \end{cases}\]

\[w_{1}x_{1}+w_{2}x_{2}+\cdots+w_{d}x_{d} = \theta\]
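
The decision rule can be sketched in a few lines of Python. The feature vectors and labels are the three rows of the spam example; the weights `w` and threshold `theta` are hand-picked assumptions for illustration, not values learned by any algorithm in these slides.

```python
def predict(w, x, theta):
    """Linear classifier: +1 if w . x >= theta, otherwise -1."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if score >= theta else -1

# Rows of the spam table (columns: viagra, learning, the, dating, nigeria).
X = [
    [1, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1],
]
y = [1, -1, 1]

# Hypothetical weights and threshold, chosen by hand for this example.
w = [2.0, -1.5, 0.0, 0.5, 1.5]
theta = 1.0

predictions = [predict(w, x, theta) for x in X]
print(predictions)
```

With these assumed values the scores are 2.0, -1.5 and 1.5, so the first and third messages fall on the spam side of the hyperplane and the second does not, which agrees with the labels in the table.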

2.2. Linear classifiers

3. Support Vector Machine

3.1. Overview

3.2. Support Vector Machine: largest margin

3.3. Support Vector Machine: what is the margin?

3.4. Some more math …

3.5. SVM: Non-linearly separable data