[note] linear regression
2022-07-23 16:35:00 【Sprite.Nym】
1. Overview
1.1 Linear regression
Regression is a widely used predictive modeling technique. Its defining characteristic is that the predicted result is a continuous variable.
By contrast, classification algorithms such as decision trees, random forests, and support vector machine classifiers predict categorical labels, typically expressed as {0, 1}, while unsupervised learning algorithms such as PCA and KMeans do not predict labels at all. Keep these distinctions in mind.
Regression algorithms originate in statistical theory and may be among the earliest machine learning algorithms. They are widely used in practice: predicting stock market indices from other economic indicators, predicting regional precipitation from the characteristics of the jet stream, forecasting total sales from a company's advertising spend, or estimating the age of fossils from the residual carbon-14 in organic matter. Whenever we need to predict a continuous variable from a set of features, we can use regression techniques.
Because linear regression is derived from statistical analysis, it is an important algorithm that bridges machine learning and statistics. Broadly speaking, statistics emphasizes prior assumptions while machine learning emphasizes results. In machine learning, therefore, we do not exclude collinearity and other factors that might harm a linear regression model in advance; we build the model first and look at its performance. If the resulting model performs poorly, we then remove the problematic factors under the guidance of statistics. This course explains regression algorithms from the machine learning perspective; if you want the statistical perspective, any statistics textbook can meet that need.
The mathematics of regression algorithms is relatively simple. There are generally two ways to understand linear regression: the matrix view and the algebraic view. Almost all machine learning textbooks take the algebraic view, and systematic treatments of the algorithm in matrix form are comparatively rare. So in this lesson I will use the matrix approach (the way of linear algebra) throughout to show you the regression family.
After this lesson you should have a fairly comprehensive understanding of linear models. In particular, you should know what strengths and weaknesses linear models have, and how to address their problems.
1.2 Linear regression in sklearn
The linear model module in sklearn is linear_model, which we already encountered when studying logistic regression. linear_model contains many classes and functions; the ones related to logistic regression are not listed here.
| Class / function | Description |
|---|---|
| **Ordinary linear regression** | |
| linear_model.LinearRegression | Linear regression using ordinary least squares |
| **Ridge regression** | |
| linear_model.Ridge | Ridge regression: linear least squares with L2 regularization |
| linear_model.RidgeCV | Ridge regression with built-in cross-validation |
| linear_model.RidgeClassifier | Classifier based on ridge regression |
| linear_model.RidgeClassifierCV | Ridge regression classifier with built-in cross-validation |
| linear_model.ridge_regression | [function] Solve ridge regression by the normal equation method |
| **Lasso** | |
| linear_model.Lasso | Linear model trained with L1 regularization |
| linear_model.LassoCV | Lasso with iterative fitting along a regularization path, with cross-validation |
| linear_model.LassoLars | Lasso solved by least angle regression (LARS) |
| linear_model.LassoLarsCV | Lasso solved by least angle regression, with cross-validation |
| linear_model.LassoLarsIC | Lasso solved by least angle regression, with model selection by BIC or AIC |
| linear_model.MultiTaskLasso | Multi-task Lasso trained with a mixed L1/L2 norm as regularizer |
| linear_model.MultiTaskLassoCV | Multi-task Lasso trained with a mixed L1/L2 norm as regularizer, with cross-validation |
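All of these estimators share sklearn's common fit/predict interface, so switching between plain least squares and a regularized variant is a one-line change. Below is a minimal sketch of that shared interface (assuming scikit-learn and NumPy are installed; the random data and alpha values are purely illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

# Illustrative data: 100 samples, 5 features, labels with a known linear signal.
rng = np.random.RandomState(0)
X = rng.rand(100, 5)
y = X @ np.array([1.5, -2.0, 0.0, 3.0, 0.5]) + 0.1 * rng.randn(100)

# All three classes expose the same fit/coef_ interface.
for model in (LinearRegression(), Ridge(alpha=1.0), Lasso(alpha=0.01)):
    model.fit(X, y)
    print(type(model).__name__, model.coef_.round(2))
```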
2. Multiple linear regression
2.1 The basic principle of multiple linear regression
Linear regression is the simplest regression algorithm in machine learning. Multiple linear regression refers to the linear regression problem in which each sample has multiple features. For a sample $i$ with $n$ features, the regression result can be written as the familiar equation:
$$\hat y = \omega_0 + \omega_1 x_{i1} + \omega_2 x_{i2} + \dots + \omega_n x_{in}$$
In this formula, $\omega$ denotes the parameters of the model: $\omega_0$ is the intercept term, and $\omega_1$ through $\omega_n$ are called the regression coefficients (they are sometimes written $\beta$ or $\theta$). The formula is essentially the same as the familiar $y = ax + b$. Here $y$ is the target variable, also called the label, and $x_{i1}$ through $x_{in}$ are the feature values of sample $i$. If we have $m$ samples, the regression equation can be written as:
$$\hat{\mathbf{y}} = \omega_0 + \omega_1\mathbf{x}_{1} + \omega_2\mathbf{x}_{2} + \dots + \omega_n\mathbf{x}_{n}$$
where $\hat{\mathbf{y}}$ is the column vector of the regression results of all $m$ samples (vectors are written in bold). The equation above can be written in the following matrix form:

Simplified version: $\hat{\mathbf{y}} = X\omega$
The task of linear regression is to construct a prediction function that maps the linear relationship between the input feature matrix $X$ and the label values $y$. Different textbooks write this prediction function differently, for example $f(x)$ or $h(x)$, but in essence it is the model we need to build, and the core of building it is to solve for the model's parameter vector $\omega$. But how do we solve for the parameter vector?
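To make the matrix form concrete, here is a tiny NumPy sketch of $\hat{\mathbf{y}} = X\omega$, where a column of ones is prepended to $X$ so that $\omega_0$ acts as the intercept (all numbers are made up for illustration):

```python
import numpy as np

# Three samples, two features each.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Prepend a column of ones so that w[0] plays the role of the intercept w0.
X_design = np.hstack([np.ones((X.shape[0], 1)), X])   # shape (3, 3)

w = np.array([0.5, 1.0, 2.0])   # [w0, w1, w2], illustrative values

y_hat = X_design @ w            # one prediction per sample
print(y_hat)                    # [ 5.5 11.5 17.5]
```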
2.2 Loss function
When we began studying machine learning, we said that we train models on a training set, and that modeling pursues the best possible performance on the test set, which is why model evaluation metrics are usually computed on the test set. In linear regression, however, we must solve the parameters from the training data, and we want the trained model to fit the training data as well as possible, i.e. the model's prediction accuracy on the training set should be as close to 100% as possible.
Therefore, we use an evaluation metric called the **loss function** to measure the information lost when a model with parameters $\omega$ fits the training set, and thereby to measure how good the parameters $\omega$ are. If a model built with a set of parameters performs well on the training set, we say the loss incurred during fitting is small and the value of the loss function is small, so this set of parameters is good. Conversely, if the model performs badly on the training set, the loss function is large, the model is under-trained and ineffective, and this set of parameters is relatively poor.
So when solving for the parameters $\omega$, we want the loss function to be as small as possible; then the model's fit to the training data is best and its accuracy is as high as possible. Solving the parameters is therefore the process of minimizing the loss function.
Note: some models have no parameters to solve and therefore no loss function, for example KNN and decision trees.

The loss function of linear regression is:
$$\sum_{i=1}^{m}(y_i-\hat y_i)^2=\sum_{i=1}^{m}(y_i-X_i\omega)^2$$
where $y_i$ is the true label of sample $i$, and $\hat y_i$ is the predicted label of sample $i$ under a given set of parameters $\omega$.
First, this loss function is the squared L2 norm of the vector $y-\hat y$. The L2 norm is essentially Euclidean distance: the corresponding entries of the two vectors are subtracted, the differences squared, and the squares summed. Here we only take the sum of squares, without the final square root.
Thus we obtain the loss function: the squared L2 norm, i.e. the squared Euclidean distance between the label vector and the prediction vector.
Inside this squared quantity, $y$ and $\hat y$ are the true labels and the predicted values respectively, so the loss function effectively computes the distance between the true labels and the predictions. It therefore measures the gap between the model's predictions and the true labels, and we naturally want that gap to be as small as possible. Our goal can thus be transformed into:
$$\min_{\omega}\|y-X\omega\|_2^2$$
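As a quick sanity check, the two ways of writing the loss, the sum of squared residuals and the squared L2 norm, compute the same number. A small sketch with made-up data:

```python
import numpy as np

# Verify that the loss equals the squared L2 norm of y - X @ w.
rng = np.random.RandomState(42)
X = rng.rand(10, 3)
w = np.array([1.0, -2.0, 0.5])
y = rng.rand(10)

residuals = y - X @ w
rss_sum = np.sum(residuals ** 2)            # elementwise: sum of squared errors
rss_norm = np.linalg.norm(residuals) ** 2   # squared L2 norm, same quantity
print(np.isclose(rss_sum, rss_norm))        # True
```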
2.3 The least squares method
The problem now becomes: solve for the parameter vector $\omega$ that minimizes the RSS (residual sum of squares). This method of solving the parameters by minimizing the squared differences between true and predicted values is called the least squares method. The first step in finding an extremum is usually to take the first derivative and set it equal to 0, and least squares is no exception. So we now differentiate the residual sum of squares RSS with respect to the parameter vector $\omega$.
Derivation:

$$\frac{\partial\,\mathrm{RSS}}{\partial \omega}=\frac{\partial}{\partial \omega}\|y-X\omega\|_2^2=2X^T(X\omega-y)$$

Setting this derivative to zero gives the normal equation $X^TX\omega=X^Ty$, and, assuming $X^TX$ is invertible (i.e. there is no perfect collinearity among the features), the solution is:

$$\omega=(X^TX)^{-1}X^Ty$$
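Under that invertibility assumption, the closed-form solution can be computed directly. A sketch with synthetic data (using np.linalg.solve on the normal equation rather than an explicit matrix inverse, which is numerically safer):

```python
import numpy as np

# Synthetic data: intercept column plus 2 features, known true weights.
rng = np.random.RandomState(0)
X = np.hstack([np.ones((50, 1)), rng.rand(50, 2)])
true_w = np.array([2.0, 1.0, -3.0])
y = X @ true_w + 0.01 * rng.randn(50)

# Solve the normal equation X^T X w = X^T y for w.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(w_hat.round(2))  # close to [ 2.  1. -3.]
```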
2.4 linear_model.LinearRegression
`class sklearn.linear_model.LinearRegression(fit_intercept=True, normalize=False, copy_X=True, n_jobs=None)`
| Parameter | Meaning |
|---|---|
| fit_intercept | Boolean, optional, default True. Whether to calculate the intercept for this model. If set to False, no intercept is calculated. |
| normalize | Boolean, optional, default False. Ignored when fit_intercept is set to False. If True, the feature matrix X has its mean subtracted (centering) and is divided by its L2 norm (scaling) before the regression. If you want standardization instead, use the dedicated class StandardScaler from the preprocessing module on the data before calling fit. |
| copy_X | Boolean, optional, default True. If True, the model operates on a copy (X.copy()); otherwise the original feature matrix X may be overwritten by the regression. |
| n_jobs | Integer or None, optional, default None. The number of jobs used for the computation. Only helpful for multi-label problems on sufficiently large data. None means 1 unless in a joblib.parallel_backend context; -1 means use all CPUs. |
The linear regression class is probably the simplest class we have studied so far: only four parameters describe the complete algorithm, and as you can see, none of them is required. There is no irreplaceable hyperparameter for this model. This tells us that the performance of linear regression often depends on the data itself rather than on our tuning skill, so linear regression places high demands on the data. Fortunately, most continuous variables in the real world are more or less linearly related, so linear regression, though simple, is powerful.
Incidentally, linear regression in sklearn can handle multi-label problems: simply pass a multi-dimensional label array to fit.
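For instance, here is a minimal sketch of a multi-output fit (the data is synthetic; coef_ then holds one row of coefficients per target):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.rand(100, 3)

# Two targets, each a different linear combination of the features.
Y = np.column_stack([X @ np.array([1.0, 2.0, 3.0]),
                     X @ np.array([-1.0, 0.5, 2.0])])   # shape (100, 2)

reg = LinearRegression().fit(X, Y)
print(reg.coef_.shape)     # (2, 3): one coefficient row per target
print(reg.predict(X[:2])) # shape (2, 2): two targets for each of two samples
```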