Machine learning 05: nonlinear support vector machines
2022-06-26 05:46:00 【SP FA】
The main idea
Theorem:
For linearly non-separable points, there always exists a higher-dimensional feature space in which they become linearly separable.
Based on this theorem, we can use support vector machines to handle linearly non-separable data: map the data set into a higher-dimensional feature space and then perform linear classification there.
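For instance (a minimal, made-up sketch: the data, the map $\phi(x)=(x,x^2)$, and the hand-picked hyperplane below are chosen purely for illustration), points that no single threshold can separate on the real line become linearly separable after the mapping:

```python
import numpy as np

# 1-D points: the two classes are interleaved, so no single threshold separates them.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.array([1, -1, -1, -1, 1])          # positive class sits on both outer ends

# Map each point to 2-D with phi(x) = (x, x^2).
phi_x = np.stack([x, x ** 2], axis=1)

# In the new space the hyperplane with w = (0, 1), b = -2.5 separates the classes.
w, b = np.array([0.0, 1.0]), -2.5
print(np.sign(phi_x @ w + b) == y)        # all True: linearly separable after mapping
```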
Suppose the original sample point is $\vec x_i$, and let $\phi(\vec x_i)$ denote the sample point after it is mapped into the new feature space. The separating hyperplane can then be written as
$$f(\vec x)=\vec w\cdot\phi(\vec x)+b$$
The dual problem of the nonlinear support vector machine then becomes:
$$\max_{\vec\alpha}\ \sum_{i=1}^N\alpha_i-\frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N\alpha_i\alpha_j y_i y_j\,\phi(\vec x_i)^T\phi(\vec x_j)$$
$$\text{s.t.}\quad \sum_{i=1}^N\alpha_i y_i=0,\qquad 0\le\alpha_i\le C$$
However, the feature space reached by mapping from the low-dimensional space may have a very high dimension, which makes computing $\phi(\vec x_i)^T\phi(\vec x_j)$ very expensive, so we need another way to simplify the computation. This is where the kernel function comes in.
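In practice the dual problem only needs the Gram matrix of pairwise inner products $\phi(\vec x_i)^T\phi(\vec x_j)$, and each entry can be produced by a kernel without ever constructing $\phi$. A minimal sketch (the toy samples and the choice of the Gaussian kernel with $\gamma=0.5$, one of the kernels listed in the next section, are assumptions for illustration):

```python
import numpy as np

def rbf_kernel(xi, xj, gamma=0.5):
    # Gaussian (RBF) kernel: K(xi, xj) = exp(-gamma * ||xi - xj||^2)
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])   # toy samples

# Gram matrix K[i, j] = K(x_i, x_j): this is all the dual objective needs;
# the (possibly infinite-dimensional) map phi is never built explicitly.
K = np.array([[rbf_kernel(xi, xj) for xj in X] for xi in X])
print(K)
```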
Kernel function
The role of the kernel function
Both the objective function and the classification decision function involve only inner products between samples, so there is no need to specify the nonlinear transformation explicitly; instead, the inner product is replaced by a kernel function. Let $K(x,z)$ be a kernel function (a positive definite kernel). Then there exists a mapping $\phi(\vec x)$ from the input space to the feature space such that for any $\vec x,\vec z$ in the input space,
$$K(\vec x,\vec z)=\phi(\vec x)\cdot\phi(\vec z)$$
Suppose the input space is $m$-dimensional and the feature space is $d$-dimensional. Using the kernel function, a $d$-dimensional inner-product computation is reduced to an $m$-dimensional one. When $d$ is very large or even infinite, the kernel function saves a great deal of computation.
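A concrete check of this equivalence (assuming a 2-dimensional input space and the homogeneous degree-2 polynomial kernel $K(\vec x,\vec z)=(\vec x^T\vec z)^2$, whose explicit feature map is $\phi(x)=(x_1^2,\ \sqrt2\,x_1x_2,\ x_2^2)$; both choices are just for illustration):

```python
import numpy as np

def phi(x):
    # Explicit degree-2 feature map for 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2)
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def poly_kernel(x, z):
    # Homogeneous polynomial kernel of degree 2: K(x, z) = (x^T z)^2
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

print(phi(x) @ phi(z))    # inner product computed in the 3-D feature space
print(poly_kernel(x, z))  # same value, computed directly in the 2-D input space
```

Here the kernel needs only a 2-dimensional dot product, while the explicit map works in 3 dimensions; for higher polynomial degrees or the Gaussian kernel the gap grows much larger.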
Common kernel functions
Linear kernel: $K(\vec x_i,\vec x_j)=\vec x_i^T\vec x_j+c$
Polynomial kernel: $K(\vec x_i,\vec x_j)=(\alpha\vec x_i^T\vec x_j+c)^d$
Radial basis function (Gaussian kernel): $K(\vec x_i,\vec x_j)=e^{-\gamma\|\vec x_i-\vec x_j\|^2}$
Sigmoid kernel: $K(\vec x_i,\vec x_j)=\tanh(\alpha\vec x_i^T\vec x_j+c)$
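As a usage sketch (assuming scikit-learn is available; the concentric-circles data below is generated only for illustration), the four kernels above correspond directly to the `kernel` argument of `sklearn.svm.SVC`:

```python
from sklearn.svm import SVC
from sklearn.datasets import make_circles

# Toy nonlinearly separable data: two concentric circles.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.1, random_state=0)

# 'linear', 'poly', 'rbf' (Gaussian) and 'sigmoid' match the kernels listed above.
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    clf = SVC(kernel=kernel, C=1.0, gamma="scale")
    clf.fit(X, y)
    print(kernel, clf.score(X, y))   # training accuracy; the RBF kernel fits best here
```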
Also updated on: SP-FA's blog