Machine Learning 3: Ridge Regression, Lasso, and Variable Selection Techniques
2022-06-23 04:31:00 【只是甲】
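I. Ridge Regression
1.1 What is ridge regression
Ridge regression is a biased-estimation regression method designed for analyzing collinear data. It is essentially a modified least-squares method: it gives up the unbiasedness of least squares and accepts some loss of information and accuracy in exchange for a regression equation that fits slightly less well but is more realistic.
The ridge estimate of the regression coefficients is B(k) = (X'X + kI)^{-1} X'Y, which is more stable than the least-squares estimate. Clearly, when k = 0, B(k) reduces to the least-squares estimate, and as k → ∞, B(k) tends to 0. For this reason k should not be too large; we want to keep k small.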
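The formula B(k) = (X'X + kI)^{-1} X'Y can be checked numerically. The following is a minimal sketch in base R on the built-in `longley` data; the helper name `ridge_coef`, the choice of `Employed` as the response, and the standardization (X'X equal to the correlation matrix, centered response) are assumptions made for illustration, not part of the original text.

```r
# Assumption: predictors are centered and scaled so that X'X is their
# correlation matrix, and the response is centered (no intercept needed).
X <- scale(as.matrix(longley[, names(longley) != "Employed"])) / sqrt(nrow(longley) - 1)
y <- longley$Employed - mean(longley$Employed)

# Ridge estimate B(k) = (X'X + kI)^{-1} X'Y (hypothetical helper)
ridge_coef <- function(X, y, k) {
  p <- ncol(X)
  drop(solve(crossprod(X) + k * diag(p), crossprod(X, y)))
}

b_ols   <- ridge_coef(X, y, 0)     # k = 0: the least-squares estimate
b_small <- ridge_coef(X, y, 0.1)   # moderate shrinkage
b_huge  <- ridge_coef(X, y, 1e6)   # very large k: coefficients close to 0

print(round(cbind(b_ols, b_small, b_huge), 4))
```

Comparing the three columns shows the two limiting cases described above: the k = 0 column matches ordinary least squares, and the coefficients shrink toward 0 as k grows.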

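1.2 Ridge trace plot
When there is no singularity, the ridge trace should approach 0 steadily and gradually.
By examining how the ridge estimates behave in the ridge trace plot, one can judge which variables should be removed.
1.3 Properties of the ridge estimator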




1.4 Ridge trace analysis

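1.5 General principles for choosing the ridge parameter
Choose the value of k (or lambda) such that:
(1) the ridge estimates of all regression coefficients are essentially stable;
(2) coefficients whose signs are unreasonable under least squares take on reasonable signs under the ridge estimate;
(3) no regression coefficient has an absolute value that is out of line with its practical meaning;
(4) the residual sum of squares does not increase too much.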
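One way to apply these criteria is to tabulate the ridge coefficients and the residual sum of squares over a grid of k values and inspect the table. The sketch below uses base R and the built-in `longley` data; the grid `ks`, the response `Employed`, and the standardization are illustrative assumptions rather than recommendations.

```r
# Assumption: X is scaled so that X'X is the correlation matrix; y is centered.
X <- scale(as.matrix(longley[, names(longley) != "Employed"])) / sqrt(nrow(longley) - 1)
y <- longley$Employed - mean(longley$Employed)

ks <- c(0, 0.01, 0.05, 0.1, 0.5, 1)   # illustrative grid of ridge parameters

# One row per k: the ridge coefficients followed by the residual sum of squares
path <- t(sapply(ks, function(k) {
  b <- drop(solve(crossprod(X) + k * diag(ncol(X)), crossprod(X, y)))
  c(b, RSS = sum((y - X %*% b)^2))
}))
rownames(path) <- paste0("k=", ks)
print(round(path, 3))
```

Reading down each column shows where a coefficient stabilizes or changes sign, and the `RSS` column shows how much fit is given up at each k.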
1.6 Variance inflation factor method
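As a hedged illustration of one common textbook formulation of this method (an assumption, since the original content of this subsection is not available): with standardized predictors, the diagonal of c(k) = (X'X + kI)^{-1} X'X (X'X + kI)^{-1} gives the variance inflation factors of the ridge estimate, and k is chosen so that all of them fall below a threshold, often 10. The helper name `ridge_vif`, the grid, and the threshold are illustrative choices.

```r
# Assumption: X is scaled so that R = X'X is the correlation matrix of the predictors.
X <- scale(as.matrix(longley[, names(longley) != "Employed"])) / sqrt(nrow(longley) - 1)
R <- crossprod(X)

# Diagonal of c(k) = (R + kI)^{-1} R (R + kI)^{-1}; at k = 0 these are the usual VIFs
ridge_vif <- function(R, k) {
  A <- solve(R + k * diag(nrow(R)))
  diag(A %*% R %*% A)
}

ks <- seq(0, 0.5, by = 0.001)
max_vif <- sapply(ks, function(k) max(ridge_vif(R, k)))

# Smallest grid value of k for which every VIF is at most 10 (rule-of-thumb threshold)
k_pick <- ks[which(max_vif <= 10)[1]]
print(k_pick)
print(round(ridge_vif(R, k_pick), 3))
```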

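1.7 Ridge regression in R
Code:
library(MASS)  # provides lm.ridge() and its select() method
longley        # print the built-in Longley macroeconomic data set
# Ordinary least-squares fit for comparison, with Employed as the response
summary(fm1 <- lm(Employed ~ ., data = longley))
# Rename the first column (GNP.deflator) to "y"; the ridge fits below use it as the response
names(longley)[1] <- "y"
lm.ridge(y ~ ., longley)  # lambda defaults to 0, i.e. the least-squares fit
# Ridge trace over a grid of lambda values
plot(lm.ridge(y ~ ., longley, lambda = seq(0, 0.1, 0.001)))
# Report the lambda values suggested by the HKB, L-W, and GCV criteria
select(lm.ridge(y ~ ., longley, lambda = seq(0, 0.1, 0.001)))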

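II. Lasso
1.1 Overview of the Lasso
Problems with ridge regression:
- There are too many ways to compute the ridge parameter, and their results differ too much.
- Selecting variables from the ridge trace plot is too arbitrary.
- The model returned by ridge regression (if no variable screening has been done) contains all the variables.
LASSO
Tibshirani (1996) proposed the Lasso (Least Absolute Shrinkage and Selection Operator) algorithm.
It obtains a parsimonious model by constructing a first-order penalty function. It ends up setting the coefficients of some indicators (variables) exactly to zero, whereas ridge estimates have almost no chance of being exactly 0, which makes variable screening difficult. The resulting model is therefore highly interpretable.
It handles multicollinear data well and, like ridge regression, is a biased estimator.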
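To make the contrast with ridge concrete, here is a minimal base-R sketch that fits a Lasso by cyclic coordinate descent with soft thresholding. This is a swapped-in textbook technique, not Tibshirani's original algorithm and not the API of any package; the names `soft_threshold` and `lasso_cd`, the objective scaling, and the value `lambda = 5` are assumptions made for illustration.

```r
# Soft-thresholding operator: shrinks z toward 0 by g and clips at exactly 0
soft_threshold <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Minimize 0.5 * sum((y - X b)^2) + lambda * sum(abs(b)) by cyclic coordinate descent
lasso_cd <- function(X, y, lambda, n_iter = 1000, tol = 1e-8) {
  p <- ncol(X)
  b <- numeric(p)
  col_ss <- colSums(X^2)
  for (it in seq_len(n_iter)) {
    b_old <- b
    for (j in seq_len(p)) {
      r_j <- y - X[, -j, drop = FALSE] %*% b[-j]   # partial residual without variable j
      b[j] <- soft_threshold(sum(X[, j] * r_j), lambda) / col_ss[j]
    }
    if (max(abs(b - b_old)) < tol) break           # stop early once updates are tiny
  }
  setNames(b, colnames(X))
}

# Assumption: standardized predictors and a centered response (Employed)
X <- scale(as.matrix(longley[, names(longley) != "Employed"]))
y <- longley$Employed - mean(longley$Employed)

b_lasso <- lasso_cd(X, y, lambda = 5)
b_ridge <- drop(solve(crossprod(X) + 5 * diag(ncol(X)), crossprod(X, y)))
print(round(cbind(lasso = b_lasso, ridge = b_ridge), 4))
print(b_lasso == 0)   # which coefficients the Lasso set exactly to zero
```

The soft-thresholding step is what can produce exact zeros; the ridge solution only rescales the coefficients. With strongly collinear data the descent may need many iterations, so `n_iter` and `tol` are worth adjusting in practice.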
1.2 Why LASSO can select variables directly
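One standard way to see this, offered here as a sketch under the simplifying assumption of an orthonormal design (X'X = I), is to compare the closed-form solutions of the two penalized problems. The symbols below (λ for the penalty weight, the OLS superscript for the least-squares estimate) are notation chosen for this illustration.

```latex
\begin{aligned}
\hat\beta^{\text{ridge}} &= \arg\min_\beta \; \|y - X\beta\|_2^2 + \lambda \sum_j \beta_j^2,
  & \hat\beta_j^{\text{ridge}} &= \frac{\hat\beta_j^{\text{OLS}}}{1+\lambda}, \\
\hat\beta^{\text{lasso}} &= \arg\min_\beta \; \|y - X\beta\|_2^2 + \lambda \sum_j |\beta_j|,
  & \hat\beta_j^{\text{lasso}} &= \operatorname{sign}\bigl(\hat\beta_j^{\text{OLS}}\bigr)
     \Bigl(\bigl|\hat\beta_j^{\text{OLS}}\bigr| - \tfrac{\lambda}{2}\Bigr)_+ .
\end{aligned}
```

In this special case ridge rescales every coefficient by the same factor and so never reaches exactly 0 for a nonzero least-squares estimate, while the Lasso subtracts a constant and sets to 0 any coefficient whose least-squares estimate is smaller than λ/2 in absolute value.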

1.3 LASSO vs. ridge regression


1.4 A more general model


1.5 Elastic net
Zou and Hastie (2005) proposed the elastic net.
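For orientation, the naive elastic net criterion can be written as a combination of the ridge and Lasso penalties. The notation below (λ1 and λ2 as the two penalty weights) is an illustrative sketch of that criterion.

```latex
\hat\beta^{\text{enet}} = \arg\min_\beta \; \|y - X\beta\|_2^2
  + \lambda_2 \sum_j \beta_j^2 + \lambda_1 \sum_j |\beta_j|
```

Setting λ1 = 0 recovers ridge regression, and setting λ2 = 0 recovers the Lasso.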

References:
- http://www.dataguru.cn/article-4063-1.html
- https://zhuanlan.zhihu.com/p/426162272