
Machine learning 05: nonlinear support vector machines

2022-06-26 05:46:00 SP FA

The main idea


Theorem:
For linearly non-separable data, there always exists a higher-dimensional space in which the data become linearly separable.

Based on this theorem, we can use a support vector machine on linearly non-separable data: map the data set into a high-dimensional feature space, then perform linear classification there.
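As a concrete illustration (a minimal sketch, not from the original post), take points on two concentric circles in the plane: no straight line separates them, but after adding a third coordinate $x_1^2+x_2^2$ a plane does. The feature map `phi` below is hand-picked for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two classes on concentric circles: not linearly separable in 2-D.
theta = rng.uniform(0, 2 * np.pi, 100)
inner = np.c_[1.0 * np.cos(theta), 1.0 * np.sin(theta)]   # class -1, radius 1
outer = np.c_[3.0 * np.cos(theta), 3.0 * np.sin(theta)]   # class +1, radius 3

def phi(X):
    """Lift (x1, x2) -> (x1, x2, x1^2 + x2^2): a hand-picked map to 3-D."""
    return np.c_[X, (X ** 2).sum(axis=1)]

# In the lifted space the plane z = 5 (i.e. x1^2 + x2^2 = 5) separates the classes.
print(phi(inner)[:, 2].max() < 5 < phi(outer)[:, 2].min())  # True
```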




Suppose the original sample point is $\vec x_i$, and let $\phi(\vec x_i)$ denote the sample point after mapping into the new feature space. The separating hyperplane can then be written as

$$f(\vec x)=\vec w\cdot\phi(\vec x)+b$$

The dual problem of the nonlinear support vector machine then becomes:

$$\max\limits_{\vec\alpha}\ \sum\limits^N_{i=1}\alpha_i-\frac{1}{2}\sum\limits^N_{i=1}\sum\limits^N_{j=1}\alpha_i\alpha_j y_i y_j\,\phi(\vec x_i)^T\phi(\vec x_j)\\ \text{s.t.}\ \sum\limits^N_{i=1}\alpha_i y_i=0,\quad 0\le\alpha_i\le C$$

However, the dimension of the feature space reached by mapping from the low-dimensional input space may be very high, which makes computing $\phi(\vec x_i)^T\phi(\vec x_j)$ expensive. We therefore need another way to simplify the computation. This is where the kernel function comes in.
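The kernel trick amounts to replacing every occurrence of $\phi(\vec x_i)^T\phi(\vec x_j)$ in the dual with $K(\vec x_i,\vec x_j)$, so only the $N\times N$ Gram matrix is ever formed. A minimal sketch of evaluating the kernelized dual objective (the RBF kernel, the random data, and the arbitrary multipliers here are illustrative assumptions, not part of the derivation):

```python
import numpy as np

def rbf_gram(X, gamma=0.5):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dists)

def dual_objective(alpha, y, K):
    """SVM dual value: sum(alpha) - 1/2 * sum_ij alpha_i alpha_j y_i y_j K_ij."""
    return alpha.sum() - 0.5 * (alpha * y) @ K @ (alpha * y)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = np.sign(rng.normal(size=20))
alpha = rng.uniform(0, 1.0, size=20)   # some candidate multipliers in [0, C]
print(dual_objective(alpha, y, rbf_gram(X)))
```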


Kernel function


The role of the kernel function

Both the objective function and the classification decision function involve only inner products between samples, so there is no need to specify the nonlinear transformation explicitly; instead, the inner product is replaced by a kernel function. Let $K(x,z)$ be a kernel function (a positive definite kernel). Then there exists a mapping $\phi(\vec x)$ from the input space to the feature space such that for any $\vec x,\vec z$ in the input space,

$$K(\vec x,\vec z)=\phi(\vec x)\cdot\phi(\vec z)$$

Suppose the input space is $m$-dimensional and the feature space is $d$-dimensional. The kernel function then replaces a $d$-dimensional inner-product computation with an $m$-dimensional one. When $d$ is very large or even infinite, the kernel function saves a great deal of computation.
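For example, with $m=2$ the degree-2 polynomial kernel $K(\vec x,\vec z)=(\vec x^T\vec z)^2$ corresponds to the feature map $\phi(x_1,x_2)=(x_1^2,\sqrt2\,x_1x_2,x_2^2)$, so the $d=3$ inner product is obtained from a 2-dimensional computation. A quick numerical check (the specific vectors are arbitrary):

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for 2-D input (d = 3 here)."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

lhs = (x @ z) ** 2           # kernel: computed in the m = 2 input space
rhs = phi(x) @ phi(z)        # explicit inner product in the d = 3 feature space
print(np.isclose(lhs, rhs))  # True
```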


Common kernel functions

  1. Linear kernel: $K(\vec x_i,\vec x_j)=\vec x_i^T\vec x_j+c$

  2. Polynomial kernel: $K(\vec x_i,\vec x_j)=(\alpha\vec x_i^T\vec x_j+c)^d$

  3. Radial basis function (Gaussian kernel): $K(\vec x_i,\vec x_j)=e^{-\gamma\parallel\vec x_i-\vec x_j\parallel^2}$

  4. Sigmoid kernel: $K(\vec x_i,\vec x_j)=\tanh(\alpha\vec x_i^T\vec x_j+c)$
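In practice these kernels do not have to be coded by hand; a library implementation such as scikit-learn's `SVC` exposes them directly. A usage sketch (the data set and hyperparameter values below are placeholders chosen for illustration):

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Concentric-circle data: a classic linearly non-separable case.
X, y = make_circles(n_samples=300, factor=0.3, noise=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Try each of the kernels listed above.
for kernel, params in [("linear", {}),
                       ("poly", {"degree": 3, "coef0": 1.0}),
                       ("rbf", {"gamma": 0.5}),
                       ("sigmoid", {"coef0": 0.0})]:
    clf = SVC(kernel=kernel, C=1.0, **params).fit(X_tr, y_tr)
    print(kernel, clf.score(X_te, y_te))
```

On this kind of data the RBF kernel typically separates the classes well, while the linear kernel cannot, which mirrors the motivation at the start of this post.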





Also published at: SP-FA's blog


Copyright notice
This article was written by [SP FA]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/177/202206260535585535.html