
Machine learning 05: nonlinear support vector machines

2022-06-26 05:46:00 SP FA

The main idea


Theorem:
For linearly non-separable data, there always exists a higher-dimensional space in which the data become linearly separable.

Based on this theorem, we can use a support vector machine on linearly non-separable data: map the data set into a high-dimensional feature space, then perform linear classification there.
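As a concrete illustration (a minimal sketch, not from the original post), take points on two concentric circles in the plane: no straight line separates them, but after adding a third coordinate $x_1^2+x_2^2$ a plane does. The feature map `phi` below is hand-picked for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two classes on concentric circles: not linearly separable in 2-D.
theta = rng.uniform(0, 2 * np.pi, 100)
inner = np.c_[1.0 * np.cos(theta), 1.0 * np.sin(theta)]   # class -1, radius 1
outer = np.c_[3.0 * np.cos(theta), 3.0 * np.sin(theta)]   # class +1, radius 3

def phi(X):
    """Lift (x1, x2) -> (x1, x2, x1^2 + x2^2): a hand-picked map to 3-D."""
    return np.c_[X, (X ** 2).sum(axis=1)]

# In the lifted space the plane z = 5 (i.e. x1^2 + x2^2 = 5) separates the classes.
print(phi(inner)[:, 2].max() < 5 < phi(outer)[:, 2].min())  # True
```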




Suppose the original sample point is $\vec x_i$, and let $\phi(\vec x_i)$ denote the sample point after mapping into the new feature space. The separating hyperplane can then be written as

$$f(\vec x)=\vec w\cdot\phi(\vec x)+b$$

The dual problem of the nonlinear support vector machine then becomes:

$$\max\limits_{\vec\alpha}\ \sum\limits^N_{i=1}\alpha_i-\frac{1}{2}\sum\limits^N_{i=1}\sum\limits^N_{j=1}\alpha_i\alpha_j y_i y_j\,\phi(\vec x_i)^T\phi(\vec x_j)\\ \text{s.t.}\ \sum\limits^N_{i=1}\alpha_i y_i=0,\quad 0\le\alpha_i\le C$$

However, the dimension of the feature space reached by mapping from the low-dimensional input space may be very high, which makes computing $\phi(\vec x_i)^T\phi(\vec x_j)$ expensive. We therefore need another way to simplify the computation. This is where the kernel function comes in.
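The kernel trick amounts to replacing every occurrence of $\phi(\vec x_i)^T\phi(\vec x_j)$ in the dual with $K(\vec x_i,\vec x_j)$, so only the $N\times N$ Gram matrix is ever formed. A minimal sketch of evaluating the kernelized dual objective (the RBF kernel, the random data, and the arbitrary multipliers here are illustrative assumptions, not part of the derivation):

```python
import numpy as np

def rbf_gram(X, gamma=0.5):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dists)

def dual_objective(alpha, y, K):
    """SVM dual value: sum(alpha) - 1/2 * sum_ij alpha_i alpha_j y_i y_j K_ij."""
    return alpha.sum() - 0.5 * (alpha * y) @ K @ (alpha * y)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = np.sign(rng.normal(size=20))
alpha = rng.uniform(0, 1.0, size=20)   # some candidate multipliers in [0, C]
print(dual_objective(alpha, y, rbf_gram(X)))
```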


Kernel function


The role of the kernel function

Both the objective function and the classification decision function involve only inner products between samples, so there is no need to specify the nonlinear transformation explicitly; instead, the inner product is replaced by a kernel function. Let $K(x,z)$ be a kernel function (a positive definite kernel). Then there exists a mapping $\phi(\vec x)$ from the input space to the feature space such that for any $\vec x,\vec z$ in the input space,

$$K(\vec x,\vec z)=\phi(\vec x)\cdot\phi(\vec z)$$

Suppose the input space is $m$-dimensional and the feature space is $d$-dimensional. The kernel function then replaces a $d$-dimensional inner-product computation with an $m$-dimensional one. When $d$ is very large or even infinite, the kernel function saves a great deal of computation.
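For example, with $m=2$ the degree-2 polynomial kernel $K(\vec x,\vec z)=(\vec x^T\vec z)^2$ corresponds to the feature map $\phi(x_1,x_2)=(x_1^2,\sqrt2\,x_1x_2,x_2^2)$, so the $d=3$ inner product is obtained from a 2-dimensional computation. A quick numerical check (the specific vectors are arbitrary):

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for 2-D input (d = 3 here)."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

lhs = (x @ z) ** 2           # kernel: computed in the m = 2 input space
rhs = phi(x) @ phi(z)        # explicit inner product in the d = 3 feature space
print(np.isclose(lhs, rhs))  # True
```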


Common kernel functions

  1. Linear kernel: $K(\vec x_i,\vec x_j)=\vec x_i^T\vec x_j+c$

  2. Polynomial kernel: $K(\vec x_i,\vec x_j)=(\alpha\vec x_i^T\vec x_j+c)^d$

  3. Radial basis function (Gaussian kernel): $K(\vec x_i,\vec x_j)=e^{-\gamma\parallel\vec x_i-\vec x_j\parallel^2}$

  4. Sigmoid kernel: $K(\vec x_i,\vec x_j)=\tanh(\alpha\vec x_i^T\vec x_j+c)$
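In practice these kernels do not have to be coded by hand; a library implementation such as scikit-learn's `SVC` exposes them directly. A usage sketch (the data set and hyperparameter values below are placeholders chosen for illustration):

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Concentric-circle data: a classic linearly non-separable case.
X, y = make_circles(n_samples=300, factor=0.3, noise=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Try each of the kernels listed above.
for kernel, params in [("linear", {}),
                       ("poly", {"degree": 3, "coef0": 1.0}),
                       ("rbf", {"gamma": 0.5}),
                       ("sigmoid", {"coef0": 0.0})]:
    clf = SVC(kernel=kernel, C=1.0, **params).fit(X_tr, y_tr)
    print(kernel, clf.score(X_te, y_te))
```

On this kind of data the RBF kernel typically separates the classes well, while the linear kernel cannot, which mirrors the motivation at the start of this post.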





Also published at: SP-FA's blog


Copyright notice
This article was written by [SP FA]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/177/202206260535585535.html