
Regression analysis - basic content

2022-06-21 10:33:00 I'm afraid I'm not retarded

Regression analysis

Two variables X and Y may be closely related without being tied by a strict functional relationship; such a relationship is non-deterministic.

Regression: a statistical method for handling the quantitative relationship between two or more variables. The relationship between the variables is not a deterministic function; it is described by a probability distribution.

Classification of regression

Linear and nonlinear

Strictly defined, linearity is a property of a mapping: the mapping must satisfy additivity and homogeneity. Informally, two variables (the dependent and the independent variable) are linearly related when one is a first-degree function of the other, so the relationship appears as a straight line in the plane coordinate system.
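Written out, the two conditions for a mapping to be linear are as follows. The symbols f, u, v and a are introduced here purely for illustration:

```latex
f(u + v) = f(u) + f(v) \quad \text{(additivity)}, \qquad
f(a\,u) = a\,f(u) \quad \text{(homogeneity)}
```

Under this strict definition a straight line y = c + kx with c ≠ 0 is an affine map rather than a linear one; regression texts still call it linear because y is a first-degree function of x.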

A nonlinear relationship is any relationship that is not linear.

Linear regression

Linear regression: in regression analysis, if the relationship between the independent variable(s) and the dependent variable is linear, the analysis is called linear regression.

With one dependent variable and one independent variable it is called univariate (simple) linear regression; with one dependent variable and more than one independent variable it is called multiple regression.

The regression model

General form of the regression model: y = f(x1, x2, x3, ..., xp) + E

f(x1, x2, x3, ..., xp): the deterministic part

E: the random error (disturbance term), which arises from: 1. influencing factors left out of the model, 2. observation or measurement error, 3. other random error
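To make the split between the deterministic part and the random error concrete, here is a minimal simulation sketch. It assumes a straight-line f and normally distributed E; the function name `simulate` and the values of `beta0`, `beta1` and `sigma` are illustrative choices, not taken from the article.

```python
import random


def simulate(n, beta0=2.0, beta1=0.5, sigma=1.0, seed=0):
    """Draw n points from y = beta0 + beta1 * x + E with E ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    # Deterministic part f(x) plus a random error term E
    ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


xs, ys = simulate(200)
# Recover the error terms by subtracting the (known) deterministic part
errors = [y - (2.0 + 0.5 * x) for x, y in zip(xs, ys)]
mean_error = sum(errors) / len(errors)
print(f"mean of the random errors: {mean_error:.3f}")
```

The same x can produce different y values on different draws, which is what "non-deterministic relationship" means here; the errors average out close to zero.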

The process of building a regression model

  1. Requirements analysis: identify the variables

    Understand the actual requirement, define the scenario, and pin down the indicator to be explained (the dependent variable). Then use relevant business knowledge to select the related variables as explanatory variables (the independent variables).

  2. Data processing

    For the explanatory variables identified in the previous step, collect the relevant data (time-series data, cross-sectional data, etc.), clean and process it, adjust the explanatory variables in light of the data, and judge whether the basic assumptions are met.

    Check whether the data and the explanatory variables satisfy the basic assumptions:

    • The explanatory variables are non-random; their observed values are treated as fixed constants
    • There is no exact linear relationship among the explanatory variables
    • The number of samples is greater than the number of explanatory variables
    • The random error has zero mean, constant variance (homoscedasticity), no correlation between errors, and a normal distribution
  3. Determine the regression model

    Get to know the dataset: draw a scatter plot of the sample with a plotting tool, or use other analysis tools to examine the relationship between the variables, and choose the regression model accordingly, e.g. a linear regression model or an exponential regression model.

  4. Estimation of model parameters

    Once the model is chosen, estimate its parameters from the collected and cleaned sample data. The most common method is least squares; when the basic assumptions are not met, alternatives such as ridge regression, principal component regression, or partial least squares are used.

    • Least squares: finds the function that best fits the data by minimizing the sum of squared errors.
  5. Model checking and optimization

    Once the parameters are estimated, the model is obtained. It must then be tested statistically, including a significance test of the regression equation, significance tests of the regression coefficients, a goodness-of-fit test, a heteroscedasticity test, and a multicollinearity test. The model should also be checked against the actual scenario to decide whether it is practically meaningful.

  6. Model deployment and application

    A model that passes the tests can be used for analysis and application, including analysis of influencing factors, control, and prediction.
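Some of the checks in step 2 can be automated. The sketch below assumes the explanatory variables are held as plain lists keyed by name and that no column is constant; the helpers `pearson` and `basic_checks` are hypothetical names. Note that a pairwise correlation test only catches an exact linear relationship between two variables, not one that involves three or more.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def basic_checks(columns, tol=1e-9):
    """columns maps a variable name to its list of observations."""
    names = list(columns)
    n_samples = len(columns[names[0]])
    problems = []
    # Assumption: more samples than explanatory variables
    if n_samples <= len(names):
        problems.append("sample size must exceed the number of explanatory variables")
    # Assumption: no exact (pairwise) linear relationship
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(abs(pearson(columns[a], columns[b])) - 1.0) < tol:
                problems.append(f"{a} and {b} are exactly linearly related")
    return problems


data = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 10],  # exactly 2 * x1
    "x3": [5, 3, 8, 1, 7],
}
problems = basic_checks(data)
print(problems)
```

The error-term assumptions (zero mean, constant variance, no correlation, normality) can only be examined after a model has been fitted, by looking at its residuals.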

Characteristics of regression model

Regression models are widely used in many fields and have the following advantages:

  • The model is simple; modeling and application are relatively easy
  • Backed by solid statistical theory
  • Quantifies the relationship between variables
  • Error analysis gives a precise picture of how accurate the model's predictions are

They also have some disadvantages:

  • Many relatively strict assumptions (they must be verified before the model is used)
  • Variable selection strongly affects the model (many factors influence the result, and choosing suitable ones as independent variables is hard)

Summary

  1. Understand the characteristics and usage scenarios of linear regression
  2. Understand the modeling method of linear regression
    • Six steps; check along the way whether the basic assumptions hold
  3. Understand the advantages and disadvantages of linear regression

Univariate linear regression

Outline

  1. Parameter estimation of univariate linear regression
  2. Significance test of univariate linear regression
  3. Residual analysis of univariate linear regression
  4. Application of univariate linear regression model

Objectives

  1. Use MLE (maximum likelihood estimation) and OLS (ordinary least squares) for parameter estimation
  2. Be able to use hypothesis tests to test the regression model
  3. Understand and be able to carry out residual analysis of the regression model
  4. Use the linear regression model for prediction and control

Univariate linear regression model

When studying a phenomenon, we are mainly concerned with the relationship between it and the main factor affecting it. When the two are closely related but neither variable uniquely determines the other, a univariate linear regression model can be used.

Univariate linear regression equation: y = β0 + β1x

The regression equation expresses, in the sense of the mean, the statistical regularity between y and x.

The main task of regression analysis is to estimate β0 and β1 from n observed sample pairs and so obtain the final equation.

Parameter estimation: least squares estimation

From the observed data, find estimates β̂0 and β̂1 of the parameters β0 and β1 that minimize the sum of squared deviations between the observed values and the values predicted by the regression. The estimates β̂0 and β̂1 are called the least squares estimates of the regression parameters β0 and β1.
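In symbols, with (x_i, y_i) denoting the n observed pairs and x̄, ȳ the sample means (notation introduced here, not in the original), the quantity being minimized and the resulting closed-form estimates are the standard ones:

```latex
Q(\beta_0, \beta_1) = \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2

\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
```

These follow from setting the partial derivatives of Q with respect to β0 and β1 to zero; they require the x_i not to be all equal.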

Given two sets of data x and y, fit the relationship between them with the univariate linear regression model y = β0 + β1x: estimate the coefficients β0 and β1 by least squares to obtain the final regression equation.
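A minimal sketch of this fit in plain Python, using the standard closed-form least squares solution. The function name `fit_ols` and the sample data are made up for illustration.

```python
def fit_ols(xs, ys):
    """Least squares estimates (beta0, beta1) for y = beta0 + beta1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of squares of x and cross-product of x and y around their means
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta1 = sxy / sxx              # slope
    beta0 = y_bar - beta1 * x_bar  # intercept
    return beta0, beta1


# Illustrative data that lies close to the line y = 2x
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
beta0, beta1 = fit_ols(xs, ys)
print(f"y = {beta0:.3f} + {beta1:.3f} x")
```

For this data the intercept comes out near 0.05 and the slope near 1.99. In practice a library routine would normally be used instead of hand-written sums.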

Parameter estimation: maximum likelihood estimation

A method that computes estimators of unknown parameters from the expression of the population's density or probability distribution together with the information in the sample.

The basic idea of maximum likelihood estimation: the sample is known to follow a certain distribution, but the specific parameters of that distribution are unknown, so they are estimated from the experimental data. The idea is: if a certain set of parameter values maximizes the probability of the observed sample, take those values as the final estimate.

Maximum likelihood estimation addresses the case where "the model is fixed, the parameters are unknown": it uses the observed sample to infer the most likely parameter values for the given model.
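Applied to univariate linear regression, and assuming the errors are independent and normally distributed with mean 0 and variance σ² (the basic assumptions listed earlier), the log-likelihood of n observed pairs (x_i, y_i) can be written as below. The index notation is introduced here for illustration:

```latex
\ln L(\beta_0, \beta_1, \sigma^2)
  = -\frac{n}{2} \ln\!\left( 2\pi\sigma^2 \right)
    - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2

\hat{\sigma}^2_{\mathrm{MLE}}
  = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right)^2
```

Only the last term of the log-likelihood depends on β0 and β1, so maximizing it is the same as minimizing the sum of squared deviations. Under the normality assumption the maximum likelihood estimates of β0 and β1 therefore coincide with the least squares estimates.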

Parameter estimation: biased and unbiased estimation

Unbiased estimate: when a sample statistic is used to estimate a population parameter, the estimator is unbiased if its mathematical expectation equals the true value of the parameter being estimated. In other words, for different samples the estimate may come out larger or smaller than the true value, but averaged over repeated sampling the deviation from the true value is 0. Otherwise the estimator is biased.

An unbiased estimator has no systematic error; a biased estimator does.
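A small simulation can show the difference. The sketch below uses a textbook case that is not taken from the article: estimating a population variance with divisor n (biased) versus divisor n - 1 (unbiased). The distribution, sample size, number of trials and seed are arbitrary choices.

```python
import random


def variance_estimates(sample):
    """Return (divide-by-n, divide-by-(n-1)) estimates of the variance."""
    n = len(sample)
    mean = sum(sample) / n
    ss = sum((v - mean) ** 2 for v in sample)
    return ss / n, ss / (n - 1)


rng = random.Random(42)
true_var = 4.0          # samples are drawn from N(0, 2^2)
trials, n = 20000, 5    # many repeated small samples

biased_sum = unbiased_sum = 0.0
for _ in range(trials):
    sample = [rng.gauss(0.0, 2.0) for _ in range(n)]
    b, u = variance_estimates(sample)
    biased_sum += b
    unbiased_sum += u

# Average each estimator over all repeated samples
biased_avg = biased_sum / trials
unbiased_avg = unbiased_sum / trials
print(f"true variance:      {true_var}")
print(f"divide by n:        {biased_avg:.3f}")
print(f"divide by (n - 1):  {unbiased_avg:.3f}")
```

Individual estimates scatter above and below the true value for both estimators, but only the n - 1 version averages out to it; the divide-by-n version is systematically too small, by a factor of (n - 1)/n in expectation.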

Significance test of regression model

Whether the regression coefficient is significant: t test

To judge whether there is a linear relationship between the dependent variable y and the independent variable x, i.e. whether β1 equals 0, use a t test.

State the hypotheses: we collect data to look for evidence against the null hypothesis, so the null hypothesis is H0: β1 = 0 and the alternative hypothesis is H1: β1 ≠ 0

Choose the significance level: take the usual α = 0.05

Construct the test statistic
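The article does not show the statistic itself. The usual choice for this test in univariate linear regression, written with hatted symbols for the least squares estimates and with the notation S_xx and σ̂ introduced here, is:

```latex
t = \frac{\hat{\beta}_1}{\hat{\sigma} / \sqrt{S_{xx}}},
\qquad
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2,
\qquad
\hat{\sigma}^2 = \frac{1}{n - 2} \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right)^2
```

When H0 is true and the basic assumptions on the random error hold, t follows a t distribution with n - 2 degrees of freedom, which is where the p-value comes from.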

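Compare the p-value with α

Draw a conclusion: if the p-value is greater than α, the null hypothesis cannot be rejected. That is, the sample obtained in this round of sampling does not show that the null hypothesis is false, so the linear relationship is not significant and the model needs to be rebuilt. If the p-value is less than or equal to α, reject H0 and conclude that β1 differs significantly from 0.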
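The whole procedure can be sketched in plain Python. This is an illustrative implementation under stated assumptions, not the author's code: it uses the standard t statistic for the slope with n - 2 degrees of freedom, and it obtains the two-sided p-value by numerically integrating the t density with Simpson's rule, because the standard library has no t distribution. The names `t_pdf`, `two_sided_p` and `slope_t_test` and the sample data are made up, and the data must not lie exactly on a line.

```python
import math


def t_pdf(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log(1 + u * u / df))


def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|), via Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    if b == 0.0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * area)


def slope_t_test(xs, ys, alpha=0.05):
    """Test H0: beta1 = 0 against H1: beta1 != 0; returns (t, p, reject)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    # Residual sum of squares and the error variance estimate (n - 2 df)
    sse = sum((y - beta0 - beta1 * x) ** 2 for x, y in zip(xs, ys))
    sigma2 = sse / (n - 2)
    t = beta1 / math.sqrt(sigma2 / sxx)
    p = two_sided_p(t, n - 2)
    return t, p, p <= alpha


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
t_stat, p_value, reject = slope_t_test(xs, ys)
print(f"t = {t_stat:.2f}, p = {p_value:.2e}, reject H0: {reject}")
```

For data this close to a straight line the p-value falls far below 0.05, so H0 is rejected. A statistics library would normally supply the p-value directly; the numerical integration is only there to keep the sketch dependency-free.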


Copyright notice
This article was written by [I'm afraid I'm not retarded]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/02/202202221440356569.html
