Regression analysis - basic content
2022-06-21 10:33:00 【I'm afraid I'm not retarded】
Regression analysis
Variables X and Y are closely related, but the relationship is not a strict functional one (it is a non-deterministic relationship).
Regression: a statistical method for handling the quantitative relationship between two or more variables when that relationship is not a deterministic function but is described by a probability distribution.
Classification of regression
Linear and nonlinear
Strictly, linearity is a property of a mapping f: it satisfies additivity, f(a + b) = f(a) + f(b), and homogeneity, f(ka) = k·f(a). Informally, the two variables (dependent and independent) are related by a first-degree function, which appears as a straight line in the plane.
Nonlinearity means the relationship lacks these properties; in the plane it is not a straight line.
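A small numerical sketch of the strict definition. The helper `is_linear` and the sample functions are hypothetical illustrations, and checking a few sample points is only a sanity check, not a proof. Note that 3x + 2 plots as a straight line but fails homogeneity because it does not pass through the origin; in the strict mapping sense it is affine, which is why the informal "straight line" reading is looser than the formal one.

```python
# Numerically probe the two defining properties of a linear mapping:
#   additivity:  f(a + b) == f(a) + f(b)
#   homogeneity: f(k * a) == k * f(a)
# Passing on a few sample points is a sanity check, not a proof.
import math


def is_linear(f, samples=(-3.0, -0.5, 0.0, 1.0, 2.5), scalars=(-2.0, 0.5, 3.0)):
    for a in samples:
        for b in samples:
            if not math.isclose(f(a + b), f(a) + f(b), abs_tol=1e-9):
                return False
        for k in scalars:
            if not math.isclose(f(k * a), k * f(a), abs_tol=1e-9):
                return False
    return True


def f(x):
    return 3 * x        # straight line through the origin


def g(x):
    return 3 * x + 2    # straight line with an intercept (affine)


def h(x):
    return x ** 2       # curve


for name, fn in (("3x", f), ("3x + 2", g), ("x^2", h)):
    print(name, is_linear(fn))
```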
Linear regression
Linear regression: in regression analysis, if the independent and dependent variables are linearly related, the model is called linear regression.
With one dependent variable and one independent variable it is called univariate linear regression; with one dependent variable and several independent variables it is called multiple regression.
The regression model
General form of the regression model: y = f(x1, x2, x3, ..., xp) + E
f(x1, x2, x3, ..., xp): the deterministic part
E: the random error (disturbance term), which comes from 1. omitted influencing factors, 2. observation/measurement error, 3. other random errors
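To make the split between the deterministic part and the random error concrete, here is a minimal simulation sketch. The choices f(x) = 1 + 2x and E ~ Normal(0, 1) are assumptions for illustration only. It shows that individual observations at the same x scatter, while their average settles near f(x) because E has zero mean.

```python
# Sketch of y = f(x) + E: a deterministic part plus a random disturbance.
# f(x) = 1 + 2x and E ~ Normal(0, 1) are illustrative assumptions.
import random
import statistics


def f(x):
    """Deterministic part of the model (assumed for illustration)."""
    return 1.0 + 2.0 * x


def draw_y(x, rng, sigma=1.0):
    """One observation: deterministic part plus a zero-mean random error E."""
    return f(x) + rng.gauss(0.0, sigma)


rng = random.Random(42)  # fixed seed so the run is reproducible
ys = [draw_y(2.0, rng) for _ in range(10_000)]

# Individual observations scatter around f(2) = 5 ...
print(min(ys), max(ys))
# ... but their average is close to f(2), because E has mean zero.
print(statistics.mean(ys))
```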
The process of building a regression model
Requirements analysis: identify the variables
Understand the actual need and define the scenario, fix the indicator to be explained (the dependent variable), and use relevant business knowledge to choose related variables as explanatory variables (the independent variables).
Data processing
Based on the explanatory variables identified in the previous step, collect the relevant data (time-series data, cross-sectional data, etc.), clean and process it, adjust the explanatory variables according to the data, and judge whether the basic assumptions are met.
Check whether the data and explanatory variables satisfy the basic assumptions:
- The explanatory variables are non-random; their observed values are treated as constants
- There is no exact linear relationship among the explanatory variables
- The number of observations is larger than the number of explanatory variables
- Random errors: zero mean, homoscedastic (equal variance), uncorrelated, normally distributed
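The two data-side conditions in this list (enough observations, no exact linear relationship among explanatory variables) can be checked before any fitting. Below is a minimal dependency-free sketch; `matrix_rank` and `check_design` are hypothetical helpers, and the rank test includes an intercept column, so a constant explanatory variable is also flagged. The error-term assumptions can only be checked on residuals after a model is fitted.

```python
# Sketch: check two data conditions before fitting.
#   1. more observations than explanatory variables
#   2. no exact linear relationship among the explanatory variables
#      (the design matrix [1, x1, ..., xp] has full column rank)


def matrix_rank(rows, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting (small matrices only)."""
    m = [[float(v) for v in r] for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        # pick the remaining row with the largest entry in this column as pivot
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]), default=None)
        if pivot is None or abs(m[pivot][col]) < tol:
            continue  # no usable pivot: this column depends on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
        if rank == n_rows:
            break
    return rank


def check_design(X):
    """X: list of observations, each a list of p explanatory-variable values."""
    n, p = len(X), len(X[0])
    design = [[1.0] + list(row) for row in X]  # add intercept column
    problems = []
    if n <= p:
        problems.append("need more observations than explanatory variables")
    if matrix_rank(design) < p + 1:
        problems.append("exact linear relationship among explanatory variables")
    return problems


print(check_design([[1, 2], [2, 1], [3, 5], [4, 3], [5, 4]]))  # no problems
print(check_design([[1, 2], [2, 4], [3, 6], [4, 8]]))          # x2 = 2 * x1
```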
Choosing the regression model
Get to know the dataset: draw scatter plots of the variable samples with a plotting tool, or use other analysis tools to study the relationships between variables, and choose the regression model from the results, e.g. a linear regression model, an exponential regression model, etc.
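As a rough numerical companion to eyeballing a scatter plot, one can compare how close y is to a straight line in x with how close ln(y) is. This is a heuristic chosen for illustration, not the article's prescribed method; `pearson` and `suggest_model` are hypothetical helpers, and the log requires all y > 0.

```python
# Sketch of a rough model-selection heuristic (an assumption, not a standard rule):
# compare |corr(x, y)| with |corr(x, ln y)|. If ln(y) is closer to a straight
# line, an exponential model y = a * e^(b * x) is a natural candidate.
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def suggest_model(xs, ys):
    """Return 'linear' or 'exponential' (requires all y > 0 for the log)."""
    r_lin = abs(pearson(xs, ys))
    r_exp = abs(pearson(xs, [math.log(y) for y in ys]))
    return "exponential" if r_exp > r_lin else "linear"


xs = [1, 2, 3, 4, 5, 6, 7, 8]
linear_ys = [3 + 2 * x for x in xs]
expo_ys = [2 * math.exp(0.5 * x) for x in xs]
print(suggest_model(xs, linear_ys))  # linear
print(suggest_model(xs, expo_ys))    # exponential
```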
Estimation of model parameters
Once the model is chosen, estimate its parameters from the collected and cleaned sample data. The most common method is least squares; when the basic assumptions are not met, ridge regression, principal component regression, partial least squares, etc. are used instead.
- Least squares method (also called the method of least squares): finds the function that best matches the data by minimizing the sum of squared errors.
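The sketch below contrasts ordinary least squares with ridge regression, one of the alternatives named above (principal component regression and partial least squares are not shown). For a single explanatory variable with the intercept left unpenalized, minimizing SSE + lam·β1² gives the slope Sxy / (Sxx + lam); lam = 0 is plain least squares. The data and lam = 5.0 are made up for illustration.

```python
# Sketch: least squares vs. ridge regression for one explanatory variable.
# Ridge minimizes SSE + lam * b1**2 with the intercept unpenalized, which
# gives slope = Sxy / (Sxx + lam). lam = 0 reproduces ordinary least squares.


def fit(xs, ys, lam=0.0):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / (sxx + lam)  # slope, shrunk toward 0 when lam > 0
    b0 = my - b1 * mx       # intercept, not penalized
    return b0, b1


def sse(xs, ys, b0, b1):
    """Sum of squared errors between observations and fitted values."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
ols = fit(xs, ys)             # lam = 0: ordinary least squares
ridge = fit(xs, ys, lam=5.0)  # arbitrary penalty for illustration
print("OLS:  ", ols, sse(xs, ys, *ols))
print("ridge:", ridge, sse(xs, ys, *ridge))
```

Ridge trades a slightly larger SSE for a smaller, more stable slope, which helps when the least squares assumptions (for example, no near-collinearity) are strained.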
Model checking and optimization
Once the parameters are estimated, we have a model. It must then be tested in the statistical sense, including a significance test of the regression equation, significance tests of the regression coefficients, a goodness-of-fit test, a heteroscedasticity test, a multicollinearity test, etc. It must also be checked against the actual scenario to judge whether the model is practically meaningful.
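Two of these checks have short closed forms worth seeing in code. This sketch computes the goodness-of-fit measure R² = 1 - SSE/SST for a fitted line, and, for the special case of exactly two explanatory variables, the variance inflation factor VIF = 1 / (1 - r12²) used to detect multicollinearity. The helpers are hypothetical and the data is made up; a commonly quoted rule of thumb treats VIF above about 10 as a warning sign.

```python
# Sketch of two model checks:
#   - goodness of fit: R^2 = 1 - SSE / SST for a fitted univariate line
#   - multicollinearity: with exactly two explanatory variables, each has
#     VIF = 1 / (1 - r12^2), where r12 is their correlation coefficient.
import math


def mean(v):
    return sum(v) / len(v)


def r_squared(xs, ys):
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
    sst = sum((y - my) ** 2 for y in ys)                          # total
    return 1 - sse / sst


def vif_two(x1, x2):
    m1, m2 = mean(x1), mean(x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    r12 = s12 / math.sqrt(s11 * s22)
    return 1 / (1 - r12 ** 2)


print(r_squared([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]))
print(vif_two([1, 2, 3, 4], [1, -1, -1, 1]))                 # uncorrelated
print(vif_two([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.9, 5.1]))   # nearly collinear
```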
Model deployment and application
After the model passes the tests, it can be used for analysis and application, including factor analysis, control, forecasting, etc.
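A minimal sketch of the two most direct uses of a fitted line y = b0 + b1·x. Prediction plugs in a new x0. Control, in the simplified point version shown here, solves for the x that puts the predicted mean of y at a target. A full treatment of control works with intervals that account for the random error, which this sketch does not. The coefficients b0 = 1.0 and b1 = 2.0 are assumed values.

```python
# Sketch: prediction and (point) control with a fitted line y = b0 + b1 * x.
# b0 = 1.0 and b1 = 2.0 are assumed values for illustration.


def predict(b0, b1, x0):
    """Predicted mean of y at x0."""
    return b0 + b1 * x0


def control(b0, b1, y_target):
    """x that puts the predicted mean of y at y_target (ignores error intervals)."""
    if b1 == 0:
        raise ValueError("slope is zero: x has no effect on the predicted y")
    return (y_target - b0) / b1


b0, b1 = 1.0, 2.0
print(predict(b0, b1, 4.0))  # 9.0
print(control(b0, b1, 9.0))  # 4.0
```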
Characteristics of regression models
Regression models are widely used in many fields and have the following advantages:
- The model is simple, so modeling and application are relatively easy
- It is backed by solid statistical theory
- It quantifies the relationships between variables
- The accuracy of its predictions can be assessed through error analysis
It also has some disadvantages:
- It relies on many relatively strict assumptions (they must be verified before use, i.e. check that the conditions are met)
- Variable selection has a large impact on the model (many factors affect the result, and choosing the right ones as independent variables is hard)
Summary
- Understand the characteristics and usage scenarios of linear regression
- Understand the modeling method of linear regression
- Know the six modeling steps and check whether the data meet the basic assumptions
- Understand the advantages and disadvantages of linear regression
Univariate linear regression
Outline
- Parameter estimation for univariate linear regression
- Significance testing for univariate linear regression
- Residual analysis for univariate linear regression
- Applications of the univariate linear regression model
Goals
- Estimate parameters with MLE (maximum likelihood estimation) and OLS (ordinary least squares)
- Use hypothesis tests to test a regression model
- Understand and carry out residual analysis of a regression model
- Use a linear regression model for prediction and control
Univariate linear regression model
When studying a phenomenon, if the main concern is its relationship with one main influencing factor, and the two are closely related but neither variable uniquely determines the other, a univariate linear regression model can be used.
Univariate linear regression equation: y = β0 + β1x
The regression equation expresses, in the mean sense, the statistical regularity between y and x: individual observations scatter around the line because of the random error.
The main task of regression analysis is to estimate β0 and β1 from n observed sample pairs and so obtain the final equation.
Parameter estimation: least squares estimation
From the observed data, find estimates β̂0, β̂1 of the parameters β0, β1 that minimize the sum of squared deviations between the observed values and the values predicted by the regression. β̂0, β̂1 are called the least squares estimates of the regression parameters β0, β1.
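Written out, the least squares criterion and its minimizer follow from setting both partial derivatives to zero (the standard textbook derivation; x̄ and ȳ denote the sample means):

```latex
Q(\beta_0, \beta_1) = \sum_{i=1}^{n} \left(y_i - \beta_0 - \beta_1 x_i\right)^2

\frac{\partial Q}{\partial \beta_0} = -2 \sum_{i=1}^{n} \left(y_i - \beta_0 - \beta_1 x_i\right) = 0,
\qquad
\frac{\partial Q}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i \left(y_i - \beta_0 - \beta_1 x_i\right) = 0

\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2},
\qquad
\hat\beta_0 = \bar y - \hat\beta_1 \bar x
```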
Given two sets of data x and y, fit their relationship with the univariate linear regression model y = β0 + β1x: estimate the coefficients β0, β1 by least squares to obtain the final regression equation.
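A minimal sketch of this fit using the closed-form least squares estimates, with no external libraries. The function name `ols_fit` and the sample data are hypothetical; the guard clauses cover the cases where the slope cannot be estimated.

```python
# Sketch of the closed-form least squares estimates for y = b0 + b1 * x.
# The sample data below is made up for illustration.


def ols_fit(xs, ys):
    """Return (b0_hat, b1_hat) minimizing sum((y - b0 - b1 * x) ** 2)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; slope is not identifiable")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = ols_fit(xs, ys)
print(f"y = {b0:.2f} + {b1:.2f}x")  # y = 0.05 + 1.99x
```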
Parameter estimation: maximum likelihood estimation
Maximum likelihood estimation computes estimators of unknown parameters from the form of the population's probability density (or probability distribution) together with the information provided by the sample.
Basic idea: the sample is known to follow a certain distribution whose specific parameters are unknown, and the parameters are estimated from the observed data. If some set of parameter values makes the observed sample most probable, those values are taken as the estimate.
Maximum likelihood estimation solves the problem "the model is given, the parameters are unknown": from the observed sample results, it infers the most likely parameter values of the given model.
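For the univariate model, MLE needs an assumed error distribution. Assuming independent Normal(0, σ²) errors, the log-likelihood is maximized by the same β̂0, β̂1 as least squares, and the MLE of σ² is SSE / n. The sketch below (hypothetical helper names, made-up data) computes that estimate and checks numerically that nudging any parameter lowers the log-likelihood.

```python
# Sketch: maximum likelihood for y_i = b0 + b1 * x_i + e_i, assuming the errors
# e_i are independent Normal(0, sigma^2). Under that assumption the MLE of
# (b0, b1) equals the least squares estimate and the MLE of sigma^2 is SSE / n.
import math


def log_likelihood(xs, ys, b0, b1, sigma2):
    n = len(xs)
    sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    return -0.5 * n * math.log(2 * math.pi * sigma2) - sse / (2 * sigma2)


def mle_fit(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sigma2 = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys)) / n  # divides by n
    return b0, b1, sigma2


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, s2 = mle_fit(xs, ys)
best = log_likelihood(xs, ys, b0, b1, s2)
# Nudging any parameter away from the MLE lowers the log-likelihood.
for db0, db1, ds2 in [(0.1, 0, 0), (0, 0.05, 0), (0, 0, 0.02), (-0.1, -0.05, -0.01)]:
    assert log_likelihood(xs, ys, b0 + db0, b1 + db1, s2 + ds2) < best
print(b0, b1, s2)
```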
Parameter estimation: biased and unbiased estimation
Unbiased estimation: an estimator of a population parameter built from sample statistics is unbiased if its mathematical expectation equals the true value of the parameter. In other words, for different samples the estimate may come out too large or too small, but averaged over repeated sampling its deviation from the true value is 0. Otherwise the estimate is biased.
An unbiased estimator has no systematic error; a biased estimator has systematic error.
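A classic illustration is the sample variance. Dividing the sum of squared deviations by n (which is also what maximum likelihood gives) is biased low, with expectation (n - 1)/n · σ², while dividing by n - 1 is unbiased. The same reasoning is why regression estimates the error variance with SSE / (n - 2) rather than SSE / n. The sample size, σ, and seed in this simulation are arbitrary choices.

```python
# Sketch: simulate the bias of two variance estimators.
# Draw many samples of size n from Normal(0, sigma^2) and average the estimates.
#   SS / n       -> biased, expectation (n - 1) / n * sigma^2
#   SS / (n - 1) -> unbiased, expectation sigma^2
import random


def variance_estimates(sample):
    n = len(sample)
    m = sum(sample) / n
    ss = sum((v - m) ** 2 for v in sample)
    return ss / n, ss / (n - 1)


rng = random.Random(0)
n, sigma, reps = 5, 2.0, 20_000
biased_total = unbiased_total = 0.0
for _ in range(reps):
    sample = [rng.gauss(0.0, sigma) for _ in range(n)]
    b, u = variance_estimates(sample)
    biased_total += b
    unbiased_total += u

print("true variance:", sigma ** 2)                   # 4.0
print("average of SS/n:", biased_total / reps)        # close to 3.2
print("average of SS/(n-1):", unbiased_total / reps)  # close to 4.0
```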
Significance test of regression model
Significance of the regression coefficient: t test
Whether a linear relationship exists between the dependent variable y and the independent variable x, i.e. whether β1 equals 0, is judged with a t test.
State the hypotheses: we collect data looking for evidence against the null hypothesis H0: β1 = 0; the alternative hypothesis is H1: β1 ≠ 0.
Choose the significance level: usually α = 0.05.
Construct the test statistic: t = β̂1 / se(β̂1), which under H0 follows a t distribution with n - 2 degrees of freedom.
Compute the p-value and compare it with α.
Draw a conclusion: if the p-value is at most α, reject H0 and conclude that β1 is significantly different from 0. If the p-value is greater than α, H0 cannot be rejected: this sample gives no evidence of a linear relationship between y and x (which does not prove that H0 is true), and the model should be reconsidered.
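The steps above can be sketched without external libraries. The estimated error variance is σ̂² = SSE / (n - 2), the standard error is se(β̂1) = sqrt(σ̂² / Sxx), and t = β̂1 / se(β̂1). To stay dependency-free, the sketch compares |t| with a tabulated two-sided critical value for α = 0.05 and 8 degrees of freedom (2.306) instead of computing a p-value; `scipy.stats.t.sf` would give the p-value. The helper name and both datasets are hypothetical, and the statistic is undefined for a perfect fit (SSE = 0).

```python
# Sketch of the t test for H0: beta1 = 0 in y = beta0 + beta1 * x + e.
#   sigma_hat^2 = SSE / (n - 2), se(b1) = sqrt(sigma_hat^2 / Sxx), t = b1 / se(b1)
# Under H0, t follows a t distribution with n - 2 degrees of freedom.
import math

T_CRIT_DF8 = 2.306  # two-sided alpha = 0.05, df = 8, from a standard t table


def slope_t_stat(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    sigma2_hat = sse / (n - 2)  # unbiased estimate of the error variance
    return b1 / math.sqrt(sigma2_hat / sxx)


xs = list(range(1, 11))  # n = 10, so df = 8
related = [2 * x + (0.1 if x % 2 else -0.1) for x in xs]
unrelated = [1 if x % 2 else -1 for x in xs]

for name, ys in (("related", related), ("unrelated", unrelated)):
    t = slope_t_stat(xs, ys)
    decision = "reject H0" if abs(t) > T_CRIT_DF8 else "cannot reject H0"
    print(f"{name}: t = {t:.2f}, {decision}")
```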