
[Regression analysis] Understanding ridge regression through a worked case

2022-06-25 12:06:00 Halosec_ Wei

1、 Purpose

Ridge regression is a biased-estimation regression method designed for collinear data. In essence it is a modified least squares estimator: it gives up the unbiasedness of ordinary least squares and accepts some loss of information and accuracy in exchange for regression coefficients that are more stable and more realistic. It handles ill-conditioned data better than ordinary least squares.

2、 Input / output description

Input: one or more independent variables X, which may be quantitative or categorical; the dependent variable Y must be quantitative (if Y is categorical, use logistic regression instead).
Output: goodness-of-fit test results for the model, the estimated linear relationship between the independent variables and the dependent variable, and so on.

3、 Learning Websites

SPSSPRO- Free professional online data analysis platform

4、 Case example

Case study: predict the dependent variable (total house price) from the independent variables (floor area, floor, unit price, whether there is an elevator, number of nearby schools, distance from the subway station). Unit price and floor turn out to be strongly collinear, with VIF values above 20, so ordinary least squares (OLS) regression is not suitable and a ridge regression model is used instead.
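The collinearity check mentioned above can be reproduced outside SPSSPRO. The following is a minimal sketch, assuming NumPy is available; the function name `vif` and the synthetic data are illustrative only and are not the case data. It relies on the fact that the VIFs are the diagonal entries of the inverse of the predictors' correlation matrix.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    # The diagonal of the inverse correlation matrix holds 1 / (1 - R_j^2),
    # where R_j^2 comes from regressing column j on the other columns.
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Synthetic illustration (NOT the article's housing data):
# x2 is almost a copy of x1, x3 is unrelated to both.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)
x3 = rng.normal(size=200)
vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)  # the first two values are large, the third is close to 1
```

A VIF above 10 is the threshold this article uses for switching to ridge regression.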

5、 Case data

Ridge regression case data

6、 Case operation

Step1: Create a new analysis;
Step2: Upload the data;
Step3: Select the uploaded data to open and preview it, then click Start Analysis after confirming;

Step4: Select 【Ridge regression (Ridge)】;
Step5: Check the data format: 【Ridge regression (Ridge)】 requires one or more independent variables X, quantitative or categorical, and a quantitative dependent variable Y;
Step6: Click 【Start analysis】 to complete the operation.

7、 Output result analysis

Output results 1: Ridge trace plot


Chart description: The ridge trace plot is used to determine the K value. The rule is to choose the smallest K at which the standardized regression coefficients of all independent variables have stabilized. Because choosing the ridge parameter K from the ridge trace is somewhat subjective, SPSSPRO determines it automatically with the variance inflation factor method, which gives K=0.162 here.
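To make the ridge trace concrete, here is a minimal sketch of how the standardized coefficients can be computed for a grid of K values. It assumes NumPy; `ridge_trace` and the synthetic data are hypothetical names and values used only for illustration, and this is not necessarily how SPSSPRO computes its plot.

```python
import numpy as np

def ridge_trace(X, y, ks):
    """Standardized ridge coefficients for each ridge parameter k in ks."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # Scale so that Z.T @ Z is the correlation matrix of the predictors
    # and Z.T @ t is the vector of predictor-response correlations.
    Z = (X - X.mean(axis=0)) / (X.std(axis=0, ddof=1) * np.sqrt(n - 1))
    t = (y - y.mean()) / (y.std(ddof=1) * np.sqrt(n - 1))
    R, r = Z.T @ Z, Z.T @ t
    # One row of coefficients per k: solve (R + k I) b = r.
    return np.array([np.linalg.solve(R + k * np.eye(p), r) for k in ks])

# Synthetic illustration (NOT the case data): two nearly collinear predictors.
rng = np.random.default_rng(1)
a = rng.normal(size=100)
X_demo = np.column_stack([a, a + 0.05 * rng.normal(size=100), rng.normal(size=100)])
y_demo = 2.0 * X_demo[:, 0] + X_demo[:, 2] + rng.normal(size=100)

ks = np.linspace(0.0, 1.0, 21)
trace = ridge_trace(X_demo, y_demo, ks)
print(trace[0], trace[-1])  # coefficients at k=0 (OLS) and at k=1
```

Plotting each column of `trace` against `ks` gives the ridge trace; K is then read off as the smallest value where the curves flatten out.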

Output results 2: Results of ridge regression analysis

* p<0.05, ** p<0.01, *** p<0.001
Chart description: The ridge regression results show that the regression model based on area, floor, unit price, number of schools nearby (1km), distance from subway station (km), and supporting elevator has a significance p-value of 0.000***. This is significant, so the null hypothesis is rejected: there is a regression relationship between the independent variables and the dependent variable. The model's goodness of fit R² is 0.956, which indicates a good fit, so the model basically meets the requirements.

The formula of the model:
Total price = -64.72 + 0.987 × area - 0.043 × floor + 0.008 × unit price - 0.447 × number of schools nearby (1km) - 4.198 × distance from subway station (km) - 3.674 × supporting elevator

Output results 3: Model path diagram


Chart description: The figure above presents the model results as a path diagram. It mainly shows the model coefficients, from which the model formula can be read off.
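The fitted formula reported under Output results 2 can be evaluated directly. The sketch below copies the coefficients from that formula; the variable names, the 0/1 coding of the elevator variable, and the example house are assumptions made for illustration, since the article does not state the units or the coding.

```python
# Coefficients copied from the fitted formula reported in this article.
INTERCEPT = -64.72
COEFFICIENTS = {
    "area": 0.987,
    "floor": -0.043,
    "unit_price": 0.008,
    "schools_within_1km": -0.447,
    "subway_distance_km": -4.198,
    "has_elevator": -3.674,  # assumed coding: 1 = elevator, 0 = none
}

def predict_total_price(house):
    """Evaluate the fitted ridge formula for one house given as a dict."""
    return INTERCEPT + sum(COEFFICIENTS[name] * house[name] for name in COEFFICIENTS)

# Hypothetical house; the values are made up and are not from the case data.
example = {
    "area": 100,
    "floor": 10,
    "unit_price": 10000,
    "schools_within_1km": 2,
    "subway_distance_km": 1.0,
    "has_elevator": 1,
}
print(round(predict_total_price(example), 3))
```

Keeping the coefficients in a dictionary keyed by variable name makes it harder to pair a coefficient with the wrong input than positional arguments would.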

Output results 4: Model result diagram


Chart description: The figure above plots the original data together with the model's fitted values.

8、 Notes

  • Before running ridge regression, usually fit an ordinary linear (least squares) regression first; if an independent variable's VIF (a measure of collinearity) is too large, i.e. above 10, switch to ridge regression;
  • SPSSPRO uses the variance inflation factor method to find the K value automatically;
  • The general principles for choosing K are:
    • The ridge estimates of all regression coefficients are basically stable
    • Regression coefficients whose signs were unreasonable under least squares have reasonable signs under ridge estimation
    • No regression coefficient has an absolute value that contradicts its economic meaning
    • The residual sum of squares does not increase much
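The article does not describe how SPSSPRO implements the variance inflation factor method. One common formulation, shown here only as an assumption, computes the VIFs of the ridge estimator as the diagonal of (R + kI)⁻¹ R (R + kI)⁻¹, where R is the correlation matrix of the predictors, and picks the smallest k for which every VIF falls below 10. The function names, the grid, and the synthetic data are illustrative.

```python
import numpy as np

def ridge_vifs(R, k):
    """VIFs of the ridge estimator for correlation matrix R and ridge parameter k."""
    inv = np.linalg.inv(R + k * np.eye(R.shape[0]))
    # At k = 0 this reduces to the ordinary VIFs, diag(R^-1).
    return np.diag(inv @ R @ inv)

def smallest_k_by_vif(X, threshold=10.0, ks=None):
    """Smallest k on the grid whose ridge VIFs are all below threshold, else None."""
    if ks is None:
        ks = np.linspace(0.0, 1.0, 1001)  # assumed search grid, step 0.001
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    for k in ks:
        if ridge_vifs(R, k).max() < threshold:
            return float(k)
    return None

# Synthetic illustration (NOT the case data): x2 is almost a copy of x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)
x3 = rng.normal(size=200)
X_demo = np.column_stack([x1, x2, x3])
R_demo = np.corrcoef(X_demo, rowvar=False)

best_k = smallest_k_by_vif(X_demo)
print(best_k, ridge_vifs(R_demo, best_k))
```

Because each ridge VIF decreases as k grows, scanning the grid from small to large k returns the first value that meets the threshold.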

9、 Model theory

Ridge regression is a regression method from statistics. In machine learning it is also known as weight decay, and it is sometimes called Tikhonov regularization. It mainly addresses two problems: first, the case where the number of predictor variables (features) exceeds the number of observations; second, multicollinearity in the data set, that is, correlation among the predictor variables.
In general, the regression model in matrix form is:

$$y = X\beta + \varepsilon$$

Ordinary least squares solves this regression problem by minimizing:

$$\min_{\beta} \; \lVert y - X\beta \rVert_2^2$$

Ridge regression adds a penalty term to this objective:

$$\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2$$

Here λ is a parameter that must also be chosen; it plays the role of the ridge parameter K discussed earlier. In other words, ridge regression is least squares regression with an L2-norm penalty.
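With a penalty of the form λ‖β‖², the minimizer has the closed form β = (XᵀX + λI)⁻¹Xᵀy. The sketch below, assuming NumPy, implements that formula on centered data so that the intercept is not penalized; `ridge_fit` and the synthetic data are illustrative and are not taken from SPSSPRO.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution on centered data; the intercept is not penalized."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Solve (Xc'Xc + lam * I) beta = Xc'yc instead of forming the inverse.
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta

# Synthetic illustration (NOT the case data).
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(50, 3))
y_demo = 1.0 + X_demo @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=50)
print(ridge_fit(X_demo, y_demo, 0.0))   # lam = 0 reproduces ordinary least squares
print(ridge_fit(X_demo, y_demo, 10.0))  # larger lam shrinks the coefficients

# More predictors than observations: X'X is singular, but adding lam * I
# with lam > 0 makes the system solvable.
X_wide = rng.normal(size=(5, 10))
y_wide = rng.normal(size=5)
print(ridge_fit(X_wide, y_wide, 1.0)[1])
```

The last call illustrates the first problem ridge regression addresses: with more predictors than observations, least squares has no unique solution, while the penalized problem does.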
 

