[Regression analysis] Understanding ridge regression through a worked case
2022-06-25 12:06:00 【Halosec_ Wei】
1、 Purpose
Ridge regression is a biased-estimation regression method for analyzing collinear data. It is essentially an improved least squares estimator: it gives up the unbiasedness of ordinary least squares and accepts some loss of information and precision, and in exchange it produces regression coefficients that are more realistic and more reliable. On ill-conditioned data it fits better than ordinary least squares.
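To make this trade-off concrete, here is a minimal simulation sketch in Python with NumPy. This is an assumption on my part: the article itself works in the SPSSPRO web interface and shows no code. Two predictors are made almost identical, the same model is refit on many random samples, and the spread of the ordinary least squares estimates is compared with that of the ridge estimates. The data, the true coefficients (1, 1) and the penalty lam = 5.0 are all hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge estimate without intercept: solve (X^T X + lam*I) b = X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def simulate(n_reps=500, n=50, lam=5.0, seed=0):
    """Refit OLS (lam = 0) and ridge on many samples of nearly collinear data."""
    rng = np.random.default_rng(seed)
    beta_true = np.array([1.0, 1.0])
    ols_est, ridge_est = [], []
    for _ in range(n_reps):
        x1 = rng.normal(size=n)
        x2 = x1 + 0.05 * rng.normal(size=n)  # x2 is almost a copy of x1
        X = np.column_stack([x1, x2])
        y = X @ beta_true + rng.normal(size=n)
        ols_est.append(ridge_fit(X, y, 0.0))
        ridge_est.append(ridge_fit(X, y, lam))
    return np.array(ols_est), np.array(ridge_est)

ols, ridge = simulate()
print("OLS   mean:", ols.mean(axis=0), "std:", ols.std(axis=0))
print("Ridge mean:", ridge.mean(axis=0), "std:", ridge.std(axis=0))
```

The ridge means should sit slightly below the true value 1.0 (the bias that is accepted), while their standard deviations should be far smaller than the OLS ones (the gain in reliability).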
2、 Input / output description
Input: The independent variables X must be one or more quantitative or categorical variables; the dependent variable Y must be quantitative (if Y is categorical, use logistic regression instead).
Output: Goodness-of-fit test results for the model, the linear relationship between the independent variables and the dependent variable, etc.
3、 Learning website
SPSSPRO - a free, professional online data analysis platform
4、 Case example
Case: Use the independent variables (floor area, floor, unit price, whether there is an elevator, number of schools nearby, distance from the subway station) to fit and predict the dependent variable (total house price). Strong collinearity has been found between the unit price and the floor, with VIF values above 20, so ordinary least squares (OLS) regression is not appropriate and a ridge regression model is needed.
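The VIF figure quoted above can be computed directly: VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on all the other predictors. The sketch below is one straightforward NumPy implementation run on synthetic data. The article's real dataset is not reproduced here, so the column names area, floor and unit_price, and the way unit_price is built to track floor, are hypothetical.

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2) for each column j of X (n samples x p predictors)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Regress column j on an intercept plus all other columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Hypothetical synthetic data: unit_price is built to track floor closely.
rng = np.random.default_rng(1)
n = 200
floor = rng.integers(1, 30, size=n).astype(float)
unit_price = 1000 * floor + rng.normal(scale=1500, size=n)
area = rng.uniform(40, 150, size=n)

vifs = vif(np.column_stack([area, floor, unit_price]))
for name, v in zip(["area", "floor", "unit_price"], vifs):
    print(f"{name}: VIF = {v:.1f}")
```

With this construction floor and unit_price should both show VIFs well above 10, while area should stay close to 1.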
5、 Case data
Ridge regression case data
6、 Operation steps
Step1: Create a new analysis;
Step2: Upload the data;
Step3: Select the data, open it to preview, and click Start analysis after confirming;
Step4: Choose 【Ridge regression (Ridge)】;
Step5: Check the required data format: 【Ridge regression (Ridge)】 requires the independent variables X to be one or more quantitative or categorical variables, and the dependent variable Y to be quantitative;
Step6: Click 【Start analysis】 to complete the operation.
7、 Interpreting the output
Output 1: Ridge trace plot
Chart description: The ridge trace plot is used to determine the value of K. The usual rule is to choose the smallest K at which the standardized regression coefficients of all independent variables have become stable. However, choosing the ridge parameter k by ridge trace analysis is somewhat subjective, so SPSSPRO uses the variance inflation factor (VIF) method to determine K automatically, giving K = 0.162.
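SPSSPRO does not document its exact procedure here, so the following is only a sketch of one common textbook approach. It standardizes the data (correlation form), traces the standardized ridge coefficients (R + kI)⁻¹r over a grid of k, computes the ridge VIFs as the diagonal of (R + kI)⁻¹R(R + kI)⁻¹, and takes the smallest k at which every VIF falls below 10. The grid, the threshold of 10 and the synthetic data are assumptions, so the k it returns will not match the article's K = 0.162.

```python
import numpy as np

def standardize(a):
    """Center each column and scale it to unit (population) variance."""
    return (a - a.mean(axis=0)) / a.std(axis=0)

def ridge_trace_and_vif(X, y, ks):
    """Standardized ridge coefficients and ridge VIFs for each k (correlation form)."""
    n, p = X.shape
    Z, t = standardize(X), standardize(y)
    R = Z.T @ Z / n          # correlation matrix of the predictors
    r = Z.T @ t / n          # correlation of each predictor with y
    coefs, vifs = [], []
    for k in ks:
        A = np.linalg.inv(R + k * np.eye(p))
        coefs.append(A @ r)               # one point on the ridge trace
        vifs.append(np.diag(A @ R @ A))   # ridge VIFs; at k = 0 these are the ordinary VIFs
    return np.array(coefs), np.array(vifs)

def choose_k(ks, vifs, threshold=10.0):
    """Smallest k whose ridge VIFs are all below the threshold (one common rule)."""
    for k, v in zip(ks, vifs):
        if np.all(v < threshold):
            return k
    return None

# Hypothetical synthetic data: x1 and x2 are strongly collinear.
rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 2 * x1 + x2 - x3 + rng.normal(size=n)

ks = np.round(np.arange(0.0, 1.001, 0.01), 2)
coefs, ridge_vifs = ridge_trace_and_vif(X, y, ks)
print("VIFs at k = 0:", ridge_vifs[0])
print("chosen k:", choose_k(ks, ridge_vifs))
```

Plotting each column of coefs against ks gives the ridge trace itself.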
Output 2: Ridge regression results
Note: *p<0.05, **p<0.01, ***p<0.001
Chart description: The ridge regression results show that, for the regression model built on area, floor, unit price, number of schools nearby (1 km), distance from the subway station (km) and elevator, the significance P value is 0.000***, which is significant, so the null hypothesis is rejected: there is a regression relationship between the independent variables and the dependent variable. In addition, the model's goodness of fit R² is 0.956, which is quite good, so the model basically meets the requirements.
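For reference, the goodness of fit R² reported above is conventionally defined as 1 - SS_res / SS_tot. The small function below implements that textbook definition on made-up numbers; it is not taken from SPSSPRO, which may compute or adjust R² differently for ridge models.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)       # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)    # total sum of squares around the mean
    return 1.0 - ss_res / ss_tot

# Made-up observed and fitted values, purely for illustration.
y_obs = [3.0, 5.0, 7.0, 9.0]
y_fit = [2.5, 5.5, 7.0, 9.0]
print(r_squared(y_obs, y_fit))
```

Here SS_tot = 20 and SS_res = 0.5, so R² = 0.975.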
The model formula:
Total price = -64.72 + 0.987 × area - 0.043 × floor + 0.008 × unit price - 0.447 × number of schools nearby (1 km) - 4.198 × distance from subway station (km) - 3.674 × elevator
Output 3: Model path diagram
Chart description: The figure above presents the model results as a path diagram. It mainly shows the model coefficients and can be used to analyze the model formula.
Output 4: Model result plot
Chart description: The figure above visualizes the original data of the model together with the model's fitted values.
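To see how the model formula given under Output 2 turns into a prediction, here is a minimal sketch that plugs values into it. It assumes the formula is on the original (unstandardized) scale and that elevator is coded 1 for yes and 0 for no; the article states neither the units nor the coding. The dictionary keys and the example house are hypothetical.

```python
# Coefficients copied from the model formula; the key names are hypothetical labels.
INTERCEPT = -64.72
COEFS = {
    "area": 0.987,
    "floor": -0.043,
    "unit_price": 0.008,
    "schools_1km": -0.447,
    "subway_km": -4.198,
    "elevator": -3.674,   # assumed coding: 1 = has elevator, 0 = no elevator
}

def predict_total_price(house):
    """Apply the linear formula: intercept + sum of coefficient * value."""
    return INTERCEPT + sum(coef * house[name] for name, coef in COEFS.items())

# A made-up house, only to show the arithmetic.
example = {"area": 100, "floor": 10, "unit_price": 1000,
           "schools_1km": 2, "subway_km": 1, "elevator": 1}
print(round(predict_total_price(example), 3))
```

For this input the terms are 98.7 - 0.43 + 8 - 0.894 - 4.198 - 3.674 = 97.504, plus the intercept -64.72, giving 32.784.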
8、 Notes
- Before fitting a ridge regression, it is common to first run an ordinary linear (least squares) regression; if an independent variable's VIF (variance inflation factor, a measure of collinearity) is too large, exceeding 10, switch to ridge regression;
- SPSSPRO determines K automatically with the variance inflation factor method;
- General principles for choosing k:
  - The ridge estimates of all regression coefficients are basically stable;
  - Coefficients whose least squares estimates have an unreasonable sign take a reasonable sign in the ridge estimate;
  - No regression coefficient has an absolute value that contradicts its economic meaning;
  - The residual sum of squares does not increase much.
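The last criterion, that the residual sum of squares should not increase much, is easy to check numerically. This sketch (NumPy, synthetic data, hypothetical penalty values) fits ridge with an unpenalized intercept and reports each fit's RSS as a multiple of the OLS RSS. Note that lam here is on the scale of the centered, unstandardized data, not the standardized K used by SPSSPRO.

```python
import numpy as np

def ridge_with_intercept(X, y, lam):
    """Ridge on centered data; the intercept is recovered afterwards and not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return y_mean - x_mean @ beta, beta

def rss(X, y, intercept, beta):
    """Residual sum of squares of a fitted linear model."""
    resid = y - (intercept + X @ beta)
    return resid @ resid

# Hypothetical data with two strongly collinear predictors.
rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = 1.0 + x1 + x2 + rng.normal(size=n)

base = rss(X, y, *ridge_with_intercept(X, y, 0.0))  # lam = 0 is ordinary least squares
ratios = {}
for lam in [0.1, 1.0, 10.0]:
    ratios[lam] = rss(X, y, *ridge_with_intercept(X, y, lam)) / base
    print(f"lam = {lam}: RSS is {ratios[lam]:.3f} times the OLS RSS")
```

Because OLS minimizes the RSS by construction, every ratio is at least 1 and it grows with lam; under this rule a k is acceptable while the ratio stays close to 1.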
9、 Model theory
Ridge regression is a regression method from statistics. In machine learning it is also called weight decay, and it is also known as Tikhonov regularization. Ridge regression mainly addresses two problems: first, the number of predictor variables (features) exceeds the number of observations (samples); second, the data set exhibits multicollinearity, i.e. the predictor variables are correlated with one another.
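The first problem, more predictors than observations, shows up clearly in a small NumPy sketch with random (hypothetical) data: XᵀX cannot be full rank when p > n, so the least squares normal equations have no unique solution, while XᵀX + λI is invertible for any λ > 0.

```python
import numpy as np

# Hypothetical random data with more predictors (p) than observations (n).
rng = np.random.default_rng(4)
n, p = 5, 8
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

gram = X.T @ X                      # p x p matrix, but its rank is at most n
print("rank of X^T X:", np.linalg.matrix_rank(gram), "out of", p)

lam = 0.5
A = gram + lam * np.eye(p)          # every eigenvalue is now at least lam > 0
print("rank of X^T X + lam*I:", np.linalg.matrix_rank(A), "out of", p)

beta_ridge = np.linalg.solve(A, X.T @ y)   # the unique ridge solution
print(beta_ridge)
```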
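In general, the regression model in matrix form is

$$y = X\beta + \varepsilon$$

where y is the n×1 vector of observations of the dependent variable, X is the n×p matrix of independent variables, β is the p×1 vector of regression coefficients and ε is the error term.

Ordinary least squares solves this regression problem by minimizing the residual sum of squares:

$$\hat{\beta}_{\mathrm{OLS}} = \arg\min_{\beta} \lVert y - X\beta \rVert_2^2$$

Ridge regression adds a penalty term to this objective:

$$\hat{\beta}_{\mathrm{ridge}} = \arg\min_{\beta} \left( \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2 \right)$$

Here λ ≥ 0 is an additional parameter to be determined; it plays the role of the ridge parameter k discussed above, and λ = 0 recovers ordinary least squares. In other words, ridge regression is least squares regression with an L2-norm penalty.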
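Setting the gradient of the ridge objective to zero, -2Xᵀ(y - Xβ) + 2λβ = 0, gives the closed form β̂ = (XᵀX + λI)⁻¹Xᵀy. The sketch below checks this numerically against an equivalent formulation: ordinary least squares on data augmented with the rows √λ·I (with targets 0). It omits the intercept, so it assumes centered data; the data and lam = 2.0 are hypothetical.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Minimizer of ||y - X b||^2 + lam * ||b||^2: b = (X^T X + lam I)^{-1} X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def ridge_via_augmented_ols(X, y, lam):
    """Same estimate via ordinary least squares on rows [X; sqrt(lam) I] and targets [y; 0]."""
    p = X.shape[1]
    X_aug = np.vstack([X, np.sqrt(lam) * np.eye(p)])
    y_aug = np.concatenate([y, np.zeros(p)])
    b, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return b

# Hypothetical data.
rng = np.random.default_rng(5)
X = rng.normal(size=(30, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=30)
lam = 2.0

b1 = ridge_closed_form(X, y, lam)
b2 = ridge_via_augmented_ols(X, y, lam)
print(b1)
print(np.allclose(b1, b2))
```

The augmented form also makes the shrinkage visible: the extra rows pull every coefficient toward 0, so the ridge coefficient vector is shorter than the OLS one.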