PyTorch learning: implementing univariate linear regression with gradient descent
2022-07-24 11:09:00 【practical_sharp】
Univariate linear regression
A univariate linear model is very simple. Suppose we have a variable $x_i$ and a target $y_i$, where each $i$ corresponds to one data point, and we want to build the model

$$\hat{y}_i = w x_i + b$$

Here $\hat{y}_i$ is our prediction; we want $\hat{y}_i$ to fit the target $y_i$. In plain terms, we look for the function that fits $y_i$ with the smallest error, that is, we minimize

$$\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2$$
Gradient
Mathematically, a gradient is a derivative; for a multivariate function it is the vector of partial derivatives. For a function $f(x, y)$, its gradient is

$$\left(\frac{\partial f}{\partial x},\ \frac{\partial f}{\partial y}\right)$$

which can be written as grad $f(x, y)$ or $\nabla f(x, y)$. At a specific point $(x_0, y_0)$, the gradient is $\nabla f(x_0, y_0)$.
What is the gradient good for? Geometrically, the gradient at a point gives the direction in which the function changes fastest. Concretely, for the function $f(x, y)$ at the point $(x_0, y_0)$, the function increases fastest along the direction of $\nabla f(x_0, y_0)$, so following the gradient leads us most quickly toward a maximum of the function; conversely, moving in the opposite direction of the gradient leads us most quickly toward a minimum.
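As a quick sanity check of this idea (a minimal sketch, not part of the original post, previewing the autograd machinery used later), PyTorch can compute such a gradient numerically. For $f(x, y) = x^2 + y^2$ the gradient is $(2x,\ 2y)$, so at the point $(1, 2)$ it should be $(2, 4)$:

import torch

# f(x, y) = x^2 + y^2, evaluated at the point (1, 2)
x = torch.tensor(1.0, requires_grad=True)
y = torch.tensor(2.0, requires_grad=True)
f = x ** 2 + y ** 2
f.backward()           # fills x.grad and y.grad with the partial derivatives
print(x.grad, y.grad)  # expected: tensor(2.) tensor(4.)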
Gradient descent method
With this understanding of gradients, the principle of gradient descent is easy to grasp. We want to minimize the error defined above, that is, find its minimum point, and we can reach it by moving in the direction opposite to the gradient.
Here is an intuitive picture. Suppose we are standing somewhere on a mountain and do not know the way down, so we decide to descend one step at a time: at each position we compute the gradient, take one step in the negative gradient direction (currently the steepest way down), then recompute the gradient at the new position and again step along the steepest descent. Step by step, we eventually feel we have reached the foot of the mountain. Of course, proceeding this way we may never reach the true foot of the mountain, only some local low point.
Applied to our problem, we keep changing the values of w and b along the direction opposite to the gradient, until we find the pair of w and b that minimizes the error.
When updating, we also need to decide how large each update should be; in the mountain analogy, this is the length of each downhill step. This length is called the learning rate, denoted $\eta$. The learning rate matters a great deal: different learning rates lead to very different results. A learning rate that is too small makes the descent extremely slow, while one that is too large makes the iterates jump around noticeably.
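The effect of the learning rate is easy to see on a toy one-dimensional function (a hypothetical example added here, not from the original post). For $f(x) = x^2$ the gradient is $2x$, so the update is $x := x - \eta \cdot 2x$; a tiny learning rate barely moves, while one that is too large overshoots and diverges:

def descend(eta, x=10.0, steps=5):
    # gradient descent on f(x) = x^2, whose gradient is 2 * x
    for _ in range(steps):
        x = x - eta * 2 * x
    return x

print(descend(0.01))  # too small: still far from the minimum at 0 (about 9.04)
print(descend(0.5))   # a good step size: reaches 0 immediately
print(descend(1.1))   # too large: the iterate overshoots and blows up (about -24.9)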
Finally, the update rules are

$$w := w - \eta \frac{\partial f(w,\ b)}{\partial w}$$
$$b := b - \eta \frac{\partial f(w,\ b)}{\partial b}$$
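For the squared-error loss defined at the beginning, $f(w, b) = \frac{1}{n}\sum_{i=1}^{n}(w x_i + b - y_i)^2$, these partial derivatives can be worked out explicitly with the chain rule:

$$\frac{\partial f}{\partial w} = \frac{2}{n}\sum_{i=1}^{n}(w x_i + b - y_i)\, x_i, \qquad \frac{\partial f}{\partial b} = \frac{2}{n}\sum_{i=1}^{n}(w x_i + b - y_i)$$

These are exactly the quantities that PyTorch's automatic differentiation computes for us in the code below.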
By iterating these updates, we eventually find an optimal pair of w and b; this is the principle of gradient descent.
Implementing univariate linear regression in PyTorch
Experimental data to be fitted
import torch
import numpy as np
from torch.autograd import Variable
import matplotlib.pyplot as plt

# The training data: 15 (x, y) pairs
x_train = np.array([[3.3], [4.4], [5.5], [6.71], [6.93], [4.168],
                    [9.779], [6.182], [7.59], [2.167], [7.042],
                    [10.791], [5.313], [7.997], [3.1]], dtype=np.float32)
y_train = np.array([[1.7], [2.76], [2.09], [3.19], [1.694], [1.573],
                    [3.366], [2.596], [2.53], [1.221], [2.827],
                    [3.465], [1.65], [2.904], [1.3]], dtype=np.float32)

plt.plot(x_train, y_train, 'bo')
plt.show()
The resulting scatter plot of the training data is displayed.
Model definition and initialization
# Convert to Tensors
x_train = torch.from_numpy(x_train)
y_train = torch.from_numpy(y_train)

# Define the parameters w and b
w = Variable(torch.randn(1), requires_grad=True)  # random initialization
b = Variable(torch.zeros(1), requires_grad=True)  # initialize with zero

# Build the linear regression model
x_train = Variable(x_train)
y_train = Variable(y_train)

def linear_model(x):
    return x * w + b

y_ = linear_model(x_train)
plt.plot(x_train.data.numpy(), y_train.data.numpy(), 'bo', label='real')
plt.plot(x_train.data.numpy(), y_.data.numpy(), 'ro', label='estimated')
plt.show()
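A side note, not in the original code: since PyTorch 0.4, Variable has been merged into Tensor, so on a recent version the same parameters can be set up without it. A roughly equivalent sketch:

# Parameters created directly as tensors that track gradients (PyTorch >= 0.4)
w = torch.randn(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

def linear_model(x):
    return x * w + b

y_ = linear_model(x_train)  # x_train can stay a plain tensor, no Variable wrapper needed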

Next we need to compute the error function, i.e. the mean squared error $\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2$:
# Compute the error
def get_loss(y_, y):
    return torch.mean((y_ - y) ** 2)

loss = get_loss(y_, y_train)
# Print the loss to see how large it is
print(loss)
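Since get_loss is just the mean squared error, it could equally be replaced by PyTorch's built-in loss; a small sketch of that alternative (not what this post uses):

import torch.nn as nn

criterion = nn.MSELoss()       # mean squared error, same as get_loss above
loss = criterion(y_, y_train)  # should give the same value as get_loss(y_, y_train)
print(loss)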

Having defined the error function, we next need to compute the gradients of w and b:
# Automatic differentiation
loss.backward()
# Look at the gradients of w and b
print(w.grad)
print(b.grad)
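These automatically computed gradients should agree with the analytic formulas given earlier; a quick hand check (a sketch added for illustration, not in the original post):

# Analytic gradients of the mean squared error, computed directly from the data
err = (x_train * w + b - y_train).data
dw = 2 * torch.mean(err * x_train.data)  # analytic d loss / d w, should match w.grad
db = 2 * torch.mean(err)                 # analytic d loss / d b, should match b.grad
print(dw, db)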

Update parameters once
# Update parameters once
w.data = w.data - 1e-2 * w.grad.data
b.data = b.data - 1e-2 * b.grad.data
y_ = linear_model(x_train)
plt.plot(x_train.data.numpy(), y_train.data.numpy(), 'bo', label='real')
plt.plot(x_train.data.numpy(), y_.data.numpy(), 'ro', label='estimated')
plt.show()

As the plot shows, after this single update the red line has moved below the blue points and still does not fit the blue ground truth particularly well, so we need to repeat the update several times.
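Using .data for the in-place update is the older idiom shown throughout this post; on current PyTorch the same step is usually wrapped in torch.no_grad() so that the update itself is not tracked by autograd. A minimal sketch of that variant:

with torch.no_grad():  # do not record these in-place updates in the autograd graph
    w -= 1e-2 * w.grad
    b -= 1e-2 * b.grad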
Loop update
for e in range(10):  # perform 10 updates
    y_ = linear_model(x_train)
    loss = get_loss(y_, y_train)

    w.grad.zero_()  # remember to zero the gradients
    b.grad.zero_()  # remember to zero the gradients
    loss.backward()

    w.data = w.data - 1e-2 * w.grad.data  # update w
    b.data = b.data - 1e-2 * b.grad.data  # update b
    print('epoch: {}, loss: {}'.format(e, loss.item()))

y_ = linear_model(x_train)
plt.plot(x_train.data.numpy(), y_train.data.numpy(), 'bo', label='real')
plt.plot(x_train.data.numpy(), y_.data.numpy(), 'ro', label='estimated')
plt.show()
The output of the steps above, in order: the initial loss, the first gradients of w and b, and the loss logged during the 10 updates:

tensor(13.2937, grad_fn=<MeanBackward0>)
tensor([-47.0074])
tensor([-6.9313])
epoch: 0, loss: 0.46915704011917114
epoch: 1, loss: 0.23152294754981995
epoch: 2, loss: 0.22683243453502655
epoch: 3, loss: 0.22645452618598938
epoch: 4, loss: 0.22615781426429749
epoch: 5, loss: 0.2258642166852951
epoch: 6, loss: 0.22557203471660614
epoch: 7, loss: 0.22528137266635895
epoch: 8, loss: 0.22499223053455353
epoch: 9, loss: 0.2247045636177063
After 10 updates, we find that the red predictions fit the blue ground truth much better.
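For reference, the same training loop is usually written with torch.optim.SGD, which handles the zero-grad/step bookkeeping; a sketch of that alternative (not part of the original post), assuming x_train and y_train are the tensors prepared above:

import torch
from torch import nn, optim

model = nn.Linear(1, 1)  # wraps w and b in a single module
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=1e-2)

for e in range(10):
    y_ = model(x_train)
    loss = criterion(y_, y_train)
    optimizer.zero_grad()  # zero the gradients
    loss.backward()        # compute new gradients
    optimizer.step()       # apply w := w - lr * grad
    print('epoch: {}, loss: {}'.format(e, loss.item()))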