Machine learning 3-ridge regression, Lasso, variable selection technique
2022-06-23 06:11:00 【Just a】
One. Ridge regression
1.1 What is ridge regression
Ridge regression is a biased-estimation regression method designed for the analysis of collinear data. It is essentially a modified least squares method: it gives up the unbiasedness of least squares and, at the cost of losing some information and some precision, obtains regression coefficients that are more realistic and more reliable.
The ridge estimate of the regression coefficients is B(k) = (X'X + kI)^{-1} X'Y, which is more stable than the least squares estimate when X'X is nearly singular. B(k) is called the ridge estimate of the regression coefficients. Clearly, when k = 0, B(k) reduces to the least squares estimate, and as k → ∞, B(k) tends to 0. Because the bias grows with k, k should not be too large; we want the smallest k that stabilises the estimates.
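The formula above can be tried out directly. The following is a minimal sketch in base R, not the author's code: the helper name ridge_coef, the choice of five longley predictors and the values of k are illustrative assumptions, and the predictors are standardised so that no intercept is needed.

```r
# Ridge estimate B(k) = (X'X + kI)^(-1) X'Y, computed directly from the formula.
# Assumption: predictors are centred and scaled and the response is centred,
# so the model has no intercept column.
ridge_coef <- function(X, y, k) {
  p <- ncol(X)
  solve(crossprod(X) + k * diag(p), crossprod(X, y))
}

dat <- datasets::longley
X <- scale(as.matrix(dat[, c("GNP", "Unemployed", "Armed.Forces", "Population", "Year")]))
y <- dat$Employed - mean(dat$Employed)

b_ols   <- ridge_coef(X, y, 0)      # k = 0: the least squares estimate
b_small <- ridge_coef(X, y, 0.1)    # small k: mildly shrunken coefficients
b_large <- ridge_coef(X, y, 1e6)    # very large k: coefficients close to 0

print(round(cbind(b_ols, b_small, b_large), 4))
```

The three columns illustrate the two limiting cases stated in the text: k = 0 reproduces least squares, and the coefficients shrink towards 0 as k grows.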

1.2 Ridge trace plot
When X'X is not close to singular, the ridge traces should tend smoothly and gradually towards 0.
By examining how the ridge estimates behave on the ridge trace plot, you can judge which variables should be removed.
1.3 Properties of ridge regression estimation




1.4 Ridge trace analysis

1.5 General selection principle of ridge parameters
Choose the value of k (or lambda) such that:
(1) The ridge estimates of all regression coefficients are basically stable;
(2) Regression coefficients whose sign is unreasonable under least squares estimation have a reasonable sign under ridge estimation;
(3) No regression coefficient has an absolute value that contradicts its practical meaning;
(4) The residual sum of squares does not increase too much.
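Criteria (1) and (4) can be checked numerically by tabulating the coefficients and the residual sum of squares over a grid of k. This is a rough sketch under assumptions: the grid values and the five longley predictors are illustrative choices, and the data are standardised so that the formula B(k) = (X'X + kI)^{-1} X'Y applies without an intercept.

```r
# Tabulate ridge coefficients and the residual sum of squares (RSS) over a grid of k.
dat <- datasets::longley
X <- scale(as.matrix(dat[, c("GNP", "Unemployed", "Armed.Forces", "Population", "Year")]))
y <- dat$Employed - mean(dat$Employed)

k_grid <- c(0, 0.01, 0.05, 0.1, 0.5, 1)   # illustrative grid, not a recommendation
path <- t(sapply(k_grid, function(k) {
  b <- solve(crossprod(X) + k * diag(ncol(X)), crossprod(X, y))
  c(k = k, setNames(as.numeric(b), colnames(X)), RSS = sum((y - X %*% b)^2))
}))

# One row per k: look for the k beyond which the coefficients stop changing much
# while the RSS column has not yet grown substantially.
print(round(path, 4))
```

Reading the table row by row gives a numerical counterpart to the ridge trace plot: the coefficient columns show stability and sign, and the RSS column shows the price paid in fit.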
1.6 Variance inflation factor method
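The content of this section did not survive in the source, so the following is only a sketch of one common textbook formulation and should be read as an assumption: with standardised predictors (X'X equal to the correlation matrix R), the variance inflation factors of the ridge estimator are taken as the diagonal of (R + kI)^{-1} R (R + kI)^{-1}, and k is chosen as the smallest value for which all of them are at most 10. The helper name ridge_vif, the grid and the threshold 10 are illustrative.

```r
# Variance inflation factors (VIFs) of the ridge estimator as a function of k.
# Assumption: predictors are standardised, so X'X is the correlation matrix R.
R <- cor(datasets::longley[, c("GNP", "Unemployed", "Armed.Forces", "Population", "Year")])

ridge_vif <- function(R, k) {
  A <- solve(R + k * diag(nrow(R)))
  diag(A %*% R %*% A)
}

vif_ols <- ridge_vif(R, 0)          # k = 0: the usual VIFs, i.e. diag(solve(R))
k_grid  <- seq(0, 0.5, by = 0.01)   # illustrative grid
max_vif <- sapply(k_grid, function(k) max(ridge_vif(R, k)))

# Smallest k on the grid whose largest VIF is at most 10 (a common rule of thumb)
k_vif <- k_grid[which(max_vif <= 10)[1]]
print(round(vif_ols, 1))
print(k_vif)
```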

1.7 Ridge regression in R
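Code:
library(MASS)   # provides lm.ridge() and its select() method
longley         # built-in macroeconomic data set with highly collinear predictors
# Ordinary least squares fit of Employed on all other variables, for comparison
summary(fm1 <- lm(Employed ~ ., data = longley))
# Rename the first column (GNP.deflator) to "y"; it is the response in the ridge fits below
names(longley)[1] <- "y"
lm.ridge(y ~ ., longley)   # default lambda = 0, i.e. the least squares solution
# Ridge trace plot over a grid of lambda values
plot(lm.ridge(y ~ ., longley, lambda = seq(0,0.1,0.001)))
# Print the lambda values suggested by the HKB, L-W and GCV criteria
select(lm.ridge(y ~ ., longley, lambda = seq(0,0.1,0.001)))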
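As a possible follow-up to the calls above, the fitted object can be stored and the coefficients at one chosen lambda extracted. This is a minimal sketch under assumptions: it uses Employed as the response on a fresh copy of the data (the variable name dat is hypothetical), and picking the GCV-minimising lambda is just one of the criteria reported by select().

```r
library(MASS)

dat <- datasets::longley                 # fresh copy of the data, response Employed
fit <- lm.ridge(Employed ~ ., data = dat, lambda = seq(0, 0.1, 0.001))

best <- which.min(fit$GCV)               # index of the lambda with the smallest GCV
best_lambda <- fit$lambda[best]
best_coef   <- coef(fit)[best, ]         # intercept and coefficients on the original scale

print(best_lambda)
print(best_coef)
```

Whether GCV or another criterion gives the more sensible lambda still has to be judged against the ridge trace and the selection principles in section 1.5.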

Two. Lasso
1.1 Lasso overview
Problems with ridge regression:
- There are too many ways to compute the ridge parameter, and their results differ widely
- Selecting variables from the ridge trace plot is too arbitrary
- The ridge regression model (without variable selection) includes all variables
LASSO
Tibshirani (1996) proposed the Lasso (Least Absolute Shrinkage and Selection Operator) algorithm.
It obtains a more parsimonious model by constructing a first-order (L1) penalty function. It ends up setting the coefficients of some indicators (variables) exactly to zero, which gives the model strong interpretability. (In ridge regression the estimated coefficients are very unlikely to be exactly 0, which makes variable selection difficult.)
It is good at handling data with multicollinearity and, like ridge regression, it is a biased estimator.
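To make the "exactly zero" behaviour concrete, here is a small coordinate-descent sketch in base R. It is an illustration under assumptions, not Tibshirani's original algorithm and not a replacement for a dedicated package: the objective is taken as (1/(2n))||y - Xβ||² + λ·Σ|β_j| on standardised data, and the names soft_threshold and lasso_cd, the value λ = 0.5 and the five longley predictors are hypothetical choices.

```r
# Soft-thresholding operator: shrinks z towards 0 and returns exactly 0 when |z| <= g
soft_threshold <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Cyclic coordinate descent for (1/(2n)) * ||y - X b||^2 + lambda * sum(|b|)
lasso_cd <- function(X, y, lambda, n_iter = 1000) {
  n <- nrow(X); p <- ncol(X)
  beta <- rep(0, p)
  for (it in seq_len(n_iter)) {
    for (j in seq_len(p)) {
      r_j <- y - X[, -j, drop = FALSE] %*% beta[-j]   # partial residual without x_j
      z   <- sum(X[, j] * r_j) / n
      beta[j] <- soft_threshold(z, lambda) / (sum(X[, j]^2) / n)
    }
  }
  setNames(beta, colnames(X))
}

dat <- datasets::longley
X <- scale(as.matrix(dat[, c("GNP", "Unemployed", "Armed.Forces", "Population", "Year")]))
y <- dat$Employed - mean(dat$Employed)

b_lasso <- lasso_cd(X, y, lambda = 0.5)
b_ridge <- solve(crossprod(X) + 0.5 * diag(ncol(X)), crossprod(X, y))
print(round(cbind(lasso = b_lasso, ridge = as.numeric(b_ridge)), 4))
```

The soft-thresholding step is what allows a Lasso coefficient to land exactly on 0, whereas the ridge column is only shrunk; with a sufficiently large lambda every Lasso coefficient becomes 0.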
1.2 Why LASSO can select variables directly
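One standard way to see this, offered here as an illustration rather than as the original content of this section, is the special case of an orthonormal design (X'X = I). Writing b_j for the least squares estimate of the j-th coefficient, and assuming the two penalised objectives below (the symbols β and λ are notation introduced for this sketch), both estimators have closed forms:

```latex
% Ridge: minimise ||Y - X\beta||^2 + k \sum_j \beta_j^2, with X'X = I
\hat\beta_j^{\text{ridge}} = \frac{b_j}{1 + k}

% Lasso: minimise ||Y - X\beta||^2 + \lambda \sum_j |\beta_j|, with X'X = I
\hat\beta_j^{\text{lasso}} = \operatorname{sign}(b_j)\,\Bigl(|b_j| - \tfrac{\lambda}{2}\Bigr)_{+}
```

Ridge rescales every coefficient by the same factor, so a nonzero b_j never becomes exactly 0. Lasso subtracts a constant and truncates at 0, so every coefficient with |b_j| ≤ λ/2 is set exactly to 0 and the corresponding variable drops out of the model.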

1.3 LASSO vs ridge regression


1.4 A more general model


1.5 Elastic net
Zou and Hastie (2005) proposed the elastic net
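For orientation, the (naive) elastic net criterion combines the ridge and Lasso penalties. The notation below (β, λ_1, λ_2) is introduced for this sketch and does not come from the original text:

```latex
\hat\beta = \arg\min_{\beta}\; \lVert Y - X\beta \rVert_2^2
            + \lambda_2 \sum_{j} \beta_j^2
            + \lambda_1 \sum_{j} \lvert \beta_j \rvert
```

Setting λ_2 = 0 gives the Lasso and setting λ_1 = 0 gives ridge regression, so the elastic net keeps the variable selection of the L1 penalty while the L2 term stabilises the estimates for highly correlated predictors.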
