Mpai data science platform: SVM (support vector machine) classification / regression parameter tuning explained
2022-06-25 12:05:00 [Halosec_Wei]
C: Penalty factor. This is the coefficient that weights the loss term, and it plays a role similar to the regularization coefficient in LR (logistic regression). A larger C penalizes the slack variables more heavily and pushes them toward 0. Misclassification is punished harder, so the model tries to classify every training sample correctly. Training accuracy is then very high, but generalization is weak and overfitting becomes likely. A smaller C penalizes misclassification less. Fault tolerance increases and generalization is stronger, but the model may underfit.
Value range: [0, 1]
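The trade-off can be made concrete with the standard soft-margin objective, 0.5·||w||² + C·Σ max(0, 1 − yᵢ(w·xᵢ + b)). The pure-Python sketch below evaluates that textbook objective for two fixed candidate hyperplanes on invented toy data. It is not the Mpai platform's solver. The data, the `wide` / `narrow` candidates and the helper names are all hypothetical. It only shows which candidate a given C would prefer.

```python
def decision_value(w, b, x):
    """Linear decision function w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def soft_margin_objective(w, b, X, y, C):
    """0.5*||w||^2 + C * (sum of hinge losses): the primal soft-margin SVM objective."""
    regularizer = 0.5 * sum(wj * wj for wj in w)
    hinge = sum(max(0.0, 1.0 - yi * decision_value(w, b, xi)) for xi, yi in zip(X, y))
    return regularizer + C * hinge


# Hypothetical 1-D toy data: the point 0.5 is labelled -1 but lies close to the positive point.
X = [[2.0], [-2.0], [0.5]]
y = [1, -1, -1]

wide = ([0.5], 0.0)     # small ||w|| (wide margin), but violates the margin at x = 0.5
narrow = ([2.0], -2.5)  # large ||w|| (narrow margin), every point has margin >= 1

for C in (0.1, 10.0):
    scores = {name: soft_margin_objective(w, b, X, y, C)
              for name, (w, b) in (("wide", wide), ("narrow", narrow))}
    # The candidate with the lower objective is the one this C would prefer.
    print(C, min(scores, key=scores.get))
```

With C = 0.1 the wide-margin candidate wins because its one margin violation is cheap. With C = 10 the narrow candidate that fits every training point wins, which matches the overfitting tendency described above.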
Kernel function: The type of kernel function the algorithm uses. A kernel implicitly maps the data into a higher-dimensional space, so a problem that is not linearly separable in the original space can be handled as a linear one.
RBF kernel: The Gaussian (RBF) kernel can be pictured as picking some points in the feature space and treating them as bases. These points may or may not be sample points. A region extends outward from each base like a circle, with the bandwidth as its radius, and these regions divide the data. In other words, the kernel finds hyperspheres in the feature space and uses them to separate the positive and negative classes.
Linear and polynomial kernels: These two kernels also start by choosing some points in the feature space as bases. The kernel then selects sample points by their length and by their angle relative to a base:
When a sample point is almost perpendicular to a base, its Euclidean length must be very large for the linear kernel value to exceed the decision threshold.
When it points in the same direction as the base, a much shorter length is enough.
When it points in the opposite direction, the kernel value is negative and the point is assigned to the negative class.
In effect, the kernel carves out a spindle-shaped region in space and separates the positive and negative classes by that shape.
Sigmoid kernel: Bases are defined in the same way. The kernel passes a linear kernel through the tanh function, which limits its value to the range -1 to 1.
In short, all of these kernels define a kind of distance. A sample beyond that distance is judged positive, and a sample within it is judged negative. Which kernel to choose depends on how the samples are actually distributed.
Values: RBF, Linear, Poly, Sigmoid
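The intuitions above can be checked numerically. The sketch below implements the four kernels in plain Python using the common formulas, the ones scikit-learn documents for `SVC`: linear x·z, polynomial (γ·x·z + r)^d, RBF exp(−γ·||x − z||²) and sigmoid tanh(γ·x·z + r). The article does not give the exact formulas or default values the Mpai platform uses, so the parameter defaults here (gamma=1.0, coef0, degree=3) and the example points are assumptions for illustration.

```python
import math


def dot(x, z):
    return sum(a * b for a, b in zip(x, z))


def linear_kernel(x, z):
    return dot(x, z)


def poly_kernel(x, z, gamma=1.0, coef0=1.0, degree=3):  # defaults assumed for illustration
    return (gamma * dot(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(x, z, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(x, z) + coef0)


base = [1.0, 0.0]  # a "base" point, in the article's terminology

# RBF: equals 1 at the base itself and decays with distance (a "hypersphere" around base).
print(rbf_kernel(base, base), rbf_kernel(base, [2.0, 0.0]), rbf_kernel(base, [4.0, 0.0]))

# Linear: the value follows the angle to base. A nearly perpendicular vector gives a small
# value even though it is long, and the opposite direction gives a negative value.
print(linear_kernel(base, [3.0, 0.1]), linear_kernel(base, [0.1, 3.0]), linear_kernel(base, [-3.0, 0.0]))

# Polynomial (with the assumed defaults): a long vector opposite to base is also negative here.
print(poly_kernel(base, [3.0, 0.0]), poly_kernel(base, [-3.0, 0.0]))

# Sigmoid: a linear kernel squashed by tanh, so the value always lies between -1 and 1.
print(sigmoid_kernel(base, [100.0, 0.0]), sigmoid_kernel(base, [-100.0, 0.0]))
```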
Coefficient of kernel function (gamma): The kernel coefficient for the rbf, poly and sigmoid kernels. The default is 'auto', which uses the reciprocal of the number of features, 1 / n_features. gamma sets the bandwidth of the kernel, that is, the radius of the hypersphere. For the RBF kernel, gamma = 1 / (2σ²).
The larger gamma is, the smaller σ is. The Gaussian becomes tall and thin, so the model acts only in the immediate neighborhood of the support vectors, which can lead to overfitting.
Conversely, the smaller gamma is, the larger σ is. The Gaussian becomes too smooth, the model fits the training set poorly, and underfitting can result.
'auto': 1 / n_features
'scale': 1 / (n_features * X.var())
Value: 'auto', 'scale', or a number in (0, 1]
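The two named options and the gamma/σ relation can be computed directly. The sketch below follows the formulas quoted above, which match scikit-learn's documented definitions. It takes X.var() to mean the population variance over all entries of X, as NumPy's `ndarray.var()` computes it with no axis given. The data matrix is hypothetical.

```python
import math
import statistics


def gamma_auto(X):
    """'auto': 1 / n_features."""
    n_features = len(X[0])
    return 1.0 / n_features


def gamma_scale(X):
    """'scale': 1 / (n_features * variance of all entries of X)."""
    n_features = len(X[0])
    flat = [v for row in X for v in row]
    return 1.0 / (n_features * statistics.pvariance(flat))


def rbf_bandwidth(gamma):
    """Sigma such that exp(-gamma * d^2) == exp(-d^2 / (2 * sigma^2)), i.e. gamma = 1 / (2 * sigma^2)."""
    return math.sqrt(1.0 / (2.0 * gamma))


X = [[0.0, 2.0], [2.0, 4.0], [4.0, 6.0], [6.0, 8.0]]  # hypothetical data with 2 features
print(gamma_auto(X), gamma_scale(X))

for g in (0.01, 1.0, 100.0):
    print(g, rbf_bandwidth(g))  # larger gamma -> smaller sigma -> narrower, "taller and thinner" kernel
```

'auto' depends only on the number of features. 'scale' also accounts for the spread of the data, so rescaling X changes the resulting gamma.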
Shrinking: Whether to use the shrinking heuristic. If we could predict which variables correspond to support vectors, it would be enough to train on those samples alone and ignore the others. This does not change the training result, but it reduces the size of the problem and speeds up the solver. Going further, if we could predict which variables lie on the bound (i.e. α = C), those variables could be held fixed while only the others are optimized. The problem becomes smaller still and training time drops substantially. This is the shrinking technique. It relies on the observation that support vectors make up only a small fraction of the training samples, and that the Lagrange multipliers of most support vectors equal C.
Value: yes, no
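Shrinking works on the dual variables α, each constrained to [0, C]. The minimal sketch below only shows the bookkeeping such a heuristic relies on. It splits the indices into non-support vectors (α = 0), free support vectors (0 < α < C) and bounded support vectors (α = C). The α values are invented, and `partition_alphas` is a hypothetical helper, not part of any platform API.

```python
def partition_alphas(alphas, C, eps=1e-8):
    """Split dual-variable indices by where alpha_i sits in the box [0, C]."""
    at_zero, free, at_c = [], [], []
    for i, a in enumerate(alphas):
        if a <= eps:
            at_zero.append(i)  # not a support vector
        elif a >= C - eps:
            at_c.append(i)     # bounded support vector (alpha == C)
        else:
            free.append(i)     # free support vector (0 < alpha < C)
    return at_zero, free, at_c


# Hypothetical dual solution for 8 training samples with C = 1.0
alphas = [0.0, 0.0, 1.0, 0.3, 0.0, 1.0, 0.0, 1.0]
at_zero, free, at_c = partition_alphas(alphas, C=1.0)

# A shrinking heuristic would keep optimizing only the `free` indices and hold the
# at-bound ones fixed. Because that prediction can be wrong, a practical solver has to
# check the optimality conditions on the full set again before it stops.
print(at_zero, free, at_c)
```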
Residual convergence condition (tolerance): The default is 0.0001, the same as in LR. Training stops once the optimization error (the residual of the stopping criterion) falls below this value. This is a tolerance on the solver's convergence, not an allowed rate of misclassification.
Value range: [0, +∞)
Maximum number of iterations: No limit by default. This is a hard limit that takes precedence over the residual convergence condition. Once it is reached, training stops whether or not the required accuracy has been achieved.
Value range: [1, +∞)
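The interplay between the two stopping rules can be sketched as a generic loop. The error measure here is a stand-in: real SVM solvers compare a violation of the optimality conditions against tol. The "max_iter = -1 means no limit" convention is borrowed from scikit-learn's `SVC` as an assumption. `iterate` and `halving_error` are hypothetical helpers.

```python
def iterate(step, tol=1e-4, max_iter=-1):
    """Call step(k) for k = 0, 1, 2, ... until its error drops below tol or max_iter is hit.

    max_iter = -1 means "no limit" (assumed convention). Returns
    (iterations run, last error, whether tol was reached).
    """
    it, err = 0, float("inf")
    while max_iter == -1 or it < max_iter:
        err = step(it)
        it += 1
        if err < tol:
            return it, err, True
    return it, err, False  # hard stop: max_iter reached before the error got below tol


def halving_error(k):
    """Hypothetical optimizer whose error halves on every iteration."""
    return 0.5 ** k


print(iterate(halving_error, tol=1e-4))              # stops because the error fell below tol
print(iterate(halving_error, tol=1e-4, max_iter=5))  # stopped early by the hard iteration limit
```

The second call shows the hard-limit behaviour described above: training ends after 5 iterations even though the error (0.0625) is still far above tol.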
Multi-class fusion strategy
SVM is a binary classification algorithm, but because of its strong performance it is also widely used for multi-class problems. ovo and ovr are the two strategies to choose between for multi-class classification.
ovo: one versus one. One classifier is trained for every pair of classes, so K classes require K * (K - 1) / 2 classifiers.
ovr: one versus rest. One classifier is trained for each class against all the others, so K classes require K classifiers.
Value: ovo, ovr
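The classifier counts follow directly from how the binary tasks are enumerated. The sketch below builds the two task lists and combines predictions in the usual ways: majority voting over pairwise winners for ovo, and highest decision score for ovr. The article does not say how the Mpai platform combines results or breaks ties, so both combination rules are assumptions, and the class names, votes and scores are hypothetical.

```python
from collections import Counter
from itertools import combinations


def ovo_pairs(classes):
    """One binary classifier per unordered pair of classes: K * (K - 1) / 2 in total."""
    return list(combinations(classes, 2))


def ovr_tasks(classes):
    """One binary classifier per class (that class vs. all the rest): K in total."""
    return [(c, "rest") for c in classes]


def ovo_vote(pair_winners):
    """Combine ovo predictions by majority vote over the pairwise winners (assumed rule)."""
    return Counter(pair_winners).most_common(1)[0][0]


def ovr_pick(scores):
    """Combine ovr predictions by taking the class with the highest decision score (assumed rule)."""
    return max(scores, key=scores.get)


classes = ["a", "b", "c", "d"]
print(len(ovo_pairs(classes)), len(ovr_tasks(classes)))  # 6 4

# Hypothetical pairwise winners for one sample, in the order produced by ovo_pairs(classes):
# (a,b) (a,c) (a,d) (b,c) (b,d) (c,d)
print(ovo_vote(["a", "a", "d", "b", "d", "d"]))
print(ovr_pick({"a": -0.7, "b": 0.2, "c": 1.3, "d": -0.1}))
```

ovo trains more classifiers, but each one sees only the data of two classes. ovr trains fewer classifiers, each on the full data set.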