Machine learning 3-ridge regression, Lasso, variable selection technique
2022-06-23 06:11:00 【Just a】
One. Ridge regression
1.1 What is ridge regression
Ridge regression is a biased-estimation regression method designed for analyzing collinear data. It is essentially a modified least squares method: it gives up the unbiasedness of least squares and sacrifices some information and accuracy in exchange for regression coefficients that are more reliable and more consistent with reality.
Here we introduce the ridge regression coefficient formula. B(k) = (X'X + kI)^{-1} X'Y is used as the estimate of the regression coefficients, and it is more stable than the least squares estimate. B(k) is called the ridge estimate of the regression coefficients. Clearly, when k = 0, B(k) reduces to the least squares estimate, and as k → ∞, B(k) tends to 0. Because the bias grows with k, k should not be too large; we want k to be as small as possible while the estimates remain stable.
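The formula above can be sketched directly in base R. This is a minimal illustration, not the article's own code: it assumes the built-in longley data with Employed as the response, standardized predictors, and a centered response (so no intercept is needed). The helper name ridge_coef and the values of k are made up for the example.

```r
# Minimal sketch: ridge estimate B(k) = (X'X + kI)^{-1} X'Y on the built-in longley data.
# Assumption: predictors are standardized and the response is centered,
# so the model needs no intercept.
X <- scale(as.matrix(longley[, names(longley) != "Employed"]))
y <- longley$Employed - mean(longley$Employed)

# ridge_coef is a hypothetical helper name, not part of any package.
ridge_coef <- function(X, y, k) {
  p <- ncol(X)
  drop(solve(crossprod(X) + k * diag(p), crossprod(X, y)))
}

b_ols   <- ridge_coef(X, y, 0)     # k = 0: ordinary least squares
b_small <- ridge_coef(X, y, 0.05)  # small k: mild shrinkage
b_large <- ridge_coef(X, y, 1e6)   # very large k: coefficients approach 0

print(round(cbind(b_ols, b_small, b_large), 4))
```

Comparing the three columns shows the two limiting cases described in the text: the k = 0 column matches least squares, and the very large k column is close to zero.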

1.2 Ridge trace plot
When there is no singularity, the ridge trace should be stable and gradually tend to 0.
By observing the ridge estimates in the ridge trace plot, you can judge which variables should be removed.
1.3 Properties of the ridge estimator




1.4 Ridge trace analysis

1.5 General selection principle of ridge parameters
Choose the value of k (or lambda) so that:
(1) The ridge estimates of all regression coefficients are essentially stable;
(2) Regression coefficients whose signs are unreasonable under least squares estimation take reasonable signs under ridge estimation;
(3) No regression coefficient has an absolute value that conflicts with its practical meaning;
(4) The residual sum of squares does not increase much.
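Criteria (1) and (4) can be checked numerically. The following is a rough sketch under stated assumptions: the longley data with Employed as the response, standardized predictors, a centered response, and an illustrative grid of k values. The names coef_path and rss are invented for the example.

```r
# Sketch: track ridge coefficients and the residual sum of squares over a grid of k.
# Assumptions: longley data, standardized predictors, centered response;
# the grid of k values is illustrative only.
X <- scale(as.matrix(longley[, names(longley) != "Employed"]))
y <- longley$Employed - mean(longley$Employed)
ks <- seq(0, 0.5, by = 0.01)

# One row per k, one column per predictor.
coef_path <- t(sapply(ks, function(k) {
  drop(solve(crossprod(X) + k * diag(ncol(X)), crossprod(X, y)))
}))
rss <- apply(coef_path, 1, function(b) sum((y - X %*% b)^2))

# Criterion (1): look for the smallest k after which the curves flatten out.
matplot(ks, coef_path, type = "l", xlab = "k", ylab = "standardized coefficient")
# Criterion (4): RSS relative to the least squares fit (k = 0) at k = 0, 0.1, 0.5.
print(round(rss[c(1, 11, 51)] / rss[1], 3))
```

A reasonable k is one where the curves have settled while the printed ratio is still close to 1.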
1.6 Variance inflation factor method

1.7 Ridge regression in R
Code :
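library(MASS)  # provides lm.ridge() and select()
longley        # print the built-in Longley macroeconomic data
# Ordinary least squares fit for comparison
summary(fm1 <- lm(Employed ~ ., data = longley))
# Rename the first column (GNP.deflator) to y; it is the response in the ridge fits below
names(longley)[1] <- "y"
lm.ridge(y ~ ., longley)  # default lambda = 0
# Ridge trace over a grid of lambda values
plot(lm.ridge(y ~ ., longley, lambda = seq(0, 0.1, 0.001)))
# Suggested lambda by the HKB, L-W and GCV criteria
select(lm.ridge(y ~ ., longley, lambda = seq(0, 0.1, 0.001)))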

Two. Lasso
1.1 Lasso overview
Problems with ridge regression:
- There are too many methods for computing the ridge parameter, and their results differ widely
- Screening variables by the ridge trace plot is too arbitrary
- A ridge regression model (without variable screening) includes all variables
LASSO
Tibshirani (1996) proposed the Lasso (Least Absolute Shrinkage and Selection Operator) algorithm
By constructing a first-order (L1) penalty function, it yields a more parsimonious model. It ultimately sets the coefficients of some indicators (variables) exactly to zero, which gives the model strong interpretability (ridge estimates are almost never exactly 0, which makes variable screening difficult)
It handles multicollinear data well and, like ridge regression, is a biased estimator
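For comparison, the two penalized least squares problems are commonly written as follows. This is a standard formulation rather than something stated in the article: the symbol β for the coefficient vector and λ for the Lasso penalty weight are notational assumptions, and the ridge minimizer coincides with B(k) = (X'X + kI)^{-1} X'Y from section One.

```latex
\hat{\beta}^{\text{ridge}} = \arg\min_{\beta} \; \|Y - X\beta\|_2^2 + k \sum_{j} \beta_j^2
\qquad
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \; \|Y - X\beta\|_2^2 + \lambda \sum_{j} |\beta_j|
```

The only difference is the penalty: squared coefficients for ridge, absolute values (the first-order penalty) for the Lasso.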
1.2 Why LASSO can select variables directly
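One standard way to see the difference is the special case of an orthonormal design (X'X = I), which is an assumption made here for illustration only. In that case the Lasso solution is a soft-thresholded version of the least squares coefficients, while ridge only rescales them. The coefficient values and lambda below are hypothetical.

```r
# Sketch under an assumed orthonormal design (X'X = I); all values are illustrative.
b_ols  <- c(3.0, -1.5, 0.4, -0.2)  # hypothetical least squares coefficients
lambda <- 0.5

# Lasso, minimizing 0.5 * RSS + lambda * sum(abs(b)): soft-thresholding.
b_lasso <- sign(b_ols) * pmax(abs(b_ols) - lambda, 0)
# Ridge, B(k) = (X'X + kI)^{-1} X'Y with X'X = I and k = lambda: proportional shrinkage.
b_ridge <- b_ols / (1 + lambda)

print(rbind(b_ols, b_lasso, b_ridge))
```

Coefficients smaller than lambda in absolute value become exactly zero under the Lasso, whereas every ridge coefficient stays nonzero however small it gets.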

1.3 LASSO vs. ridge regression


1.4 A more general model


1.5 Elastic net
Zou and Hastie (2005) proposed the elastic net
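The elastic net combines the two penalties discussed above. A common way to write the criterion, with β, λ_1 and λ_2 as notational assumptions not taken from the article, is:

```latex
\hat{\beta}^{\text{enet}} = \arg\min_{\beta} \; \|Y - X\beta\|_2^2 + \lambda_1 \sum_{j} |\beta_j| + \lambda_2 \sum_{j} \beta_j^2
```

Setting λ_2 = 0 gives the Lasso and setting λ_1 = 0 gives ridge regression, so both earlier methods are special cases.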

Reference resources :
- http://www.dataguru.cn/article-4063-1.html
- https://zhuanlan.zhihu.com/p/426162272