
Machine Learning Notes 7: The Power of Neural Network Representation

2022-06-22 05:43:00 Amyniez

1. Neural networks (a great AI idea)

1.1 Motivating the problem

  1. A drawback of linear regression and logistic regression: when there are too many features, the computational cost becomes very large and efficiency drops. Neural networks solve this problem well.
  2. For example: suppose we have 100 features and want to build a nonlinear polynomial model from them. Even if we use only products of pairs of features, $x_1x_2 + x_1x_3 + x_1x_4 + \dots + x_{99}x_{100}$, we end up with $\binom{100}{2} = 4950$, i.e. roughly 5000, combined features (the sketch below verifies this count). That is far too many features for logistic regression to compute efficiently.
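The count above is easy to verify. Here is a minimal Python sketch (standard library only; the variable names are mine, not from the notes) that enumerates the pairwise terms:

```python
from itertools import combinations

n = 100
# every product x_i * x_j with i < j is one combined feature
quadratic_terms = list(combinations(range(1, n + 1), 2))
print(len(quadratic_terms))  # 4950 -- roughly 5000 combined features
```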

1.2 Example: a visual object recognition model (identifying objects in a picture)

  1. Take many pictures of cars and many pictures of non-cars, and use the pixel values of each picture (intensity/brightness) as features. If we restrict ourselves to grayscale images (not RGB), each pixel has a single value. Pick two pixels at two different positions in the picture and train a logistic regression algorithm that uses these two pixel values to decide whether the picture shows a car:
     [Figure: car and non-car training images plotted in the space of the two pixel intensities]
    Even with a small 50×50-pixel picture, treating every pixel as a feature already gives 2500 features. Going further and forming pairwise feature combinations into a polynomial model produces roughly $2500^2/2 \approx 3$ million features (see the sketch below). An ordinary logistic regression model cannot handle that many features effectively, which is why we need neural networks.
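As a rough illustration of treating pixels as features, here is a small numpy sketch; the random array is a stand-in for a real grayscale image, which would normally be loaded from a file:

```python
import numpy as np

image = np.random.rand(50, 50)  # stand-in for a 50x50 grayscale picture
x = image.reshape(-1)           # flatten: every pixel becomes one feature
n = x.size                      # 2500 raw features
print(n * (n - 1) // 2)         # 3123750 pairwise terms, about 2500^2 / 2
```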

1.3 What is a neural network?

   Neural networks arose from the attempt to design algorithms that imitate the brain (the human brain is the best learning machine we know of). The hypothesis: the brain does all of its different tasks without needing thousands of different programs. Instead, the way the brain works may rely on a single learning algorithm. Since the same brain tissue can learn to process visual, auditory, or tactile signals, there may be one learning algorithm (rather than thousands of algorithms) that can handle vision, hearing, and touch alike.

2. Model representation

  1. A neural network model is built from many neurons, each of which is a simple learning model. These neurons (also called activation units) take some features as input and, based on their own model, produce an output. For example, a network modeled on neurons:
     [Figure: a three-layer network of neuron-like units]
     $x_1, x_2, x_3$ are the input units; the raw data are fed into them.
     $a_1, a_2, a_3$ are the intermediate units; they process the data and pass it on to the next layer.
     Finally there is the output unit, which is responsible for computing $h_\theta(x)$.

  2. A neural network model is a network of many logistic units organized into layers, where the output variables of each layer are the input variables of the next layer. The first layer is called the input layer (Input Layer), the middle layers are called hidden layers (Hidden Layers), and the last layer is called the output layer (Output Layer). In addition, a bias unit is added to every layer:
     [Figure: a three-layer network with bias units added to each layer]
    Model notation:
    $a_i^{(j)}$: the activation of unit $i$ in layer $j$.
    $\Theta^{(j)}$: the weight matrix that maps from layer $j$ to layer $j+1$; for example, $\Theta^{(1)}$ is the matrix of weights mapping from the first layer to the second layer. Its size: the number of activation units in layer $j+1$ rows by the number of units in layer $j$ plus one columns. In the network in the figure above, $\Theta^{(1)}$ is therefore 3 × 4.
    The activation units and the output are expressed as:
    $a_1^{(2)} = g(\Theta_{10}^{(1)}x_0 + \Theta_{11}^{(1)}x_1 + \Theta_{12}^{(1)}x_2 + \Theta_{13}^{(1)}x_3)$
    $a_2^{(2)} = g(\Theta_{20}^{(1)}x_0 + \Theta_{21}^{(1)}x_1 + \Theta_{22}^{(1)}x_2 + \Theta_{23}^{(1)}x_3)$
    $a_3^{(2)} = g(\Theta_{30}^{(1)}x_0 + \Theta_{31}^{(1)}x_1 + \Theta_{32}^{(1)}x_2 + \Theta_{33}^{(1)}x_3)$
    $h_\Theta(x) = g(\Theta_{10}^{(2)}a_0^{(2)} + \Theta_{11}^{(2)}a_1^{(2)} + \Theta_{12}^{(2)}a_2^{(2)} + \Theta_{13}^{(2)}a_3^{(2)})$
    where the weight matrix $\Theta^{(1)}$ collects the entries $\Theta_{ij}^{(1)}$ row by row.
    Thus each $a$ is determined by all of the $x$'s in the previous layer together with the weight corresponding to each $x$. This left-to-right computation is therefore called forward propagation (FORWARD PROPAGATION).
    If we represent $x$, $\Theta$, and $a$ each as matrices, the computation becomes $\Theta \cdot X = a$, with the activation function $g$ applied elementwise, as the sketch below illustrates.
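To make this concrete, here is a minimal numpy sketch of one forward step through the 3-input, 3-hidden-unit, single-output network above; the weights are random stand-ins, not learned values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 0.5, -1.2, 0.7])  # [x0 = 1 (bias), x1, x2, x3]
theta1 = np.random.randn(3, 4)       # Theta^(1): 3 hidden units x (3 inputs + bias)
theta2 = np.random.randn(1, 4)       # Theta^(2): 1 output x (3 hidden units + bias)

a2 = sigmoid(theta1 @ x)             # activations a_1..a_3 of layer 2
a2 = np.concatenate(([1.0], a2))     # prepend the bias unit a_0^(2) = 1
h = sigmoid(theta2 @ a2)             # h_theta(x), the network's output
print(h)
```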

3. Vectorized computation in forward propagation

   First, the computation of the second layer of the network:
 $x = \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix}, \qquad z^{(2)} = \begin{bmatrix} z_1^{(2)} \\ z_2^{(2)} \\ z_3^{(2)} \end{bmatrix}$
Let $z^{(2)} = \Theta^{(1)}x$; then $a^{(2)} = g(z^{(2)})$, and after computing it, append $a_0^{(2)} = 1$.

 $z^{(3)} = \Theta^{(2)}a^{(2)}, \qquad h_\Theta(x) = a^{(3)} = g(z^{(3)})$
The forward propagation algorithm, in summary: $a^{(n)} = g(\Theta^{(n-1)}a^{(n-1)})$, where $n$ denotes the $n$-th layer.

   Note: to compute over the entire training set at once, the training-set feature matrix needs to be transposed so that the features of one example lie in one column. For example:

 $z^{(2)} = \Theta^{(1)}X^{T}, \qquad a^{(2)} = g(z^{(2)})$
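Putting this section together, here is a hedged numpy sketch of fully vectorized forward propagation; the 3-4-1 shapes match the example network, and the weights are random stand-ins:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, thetas):
    # X: (n_features, m), one example per column (the transposed training set)
    # thetas[j] is the weight matrix mapping layer j+1 to layer j+2
    A = X
    for theta in thetas:
        A = np.vstack([np.ones((1, A.shape[1])), A])  # add the bias row a_0 = 1
        A = sigmoid(theta @ A)                        # a^(n) = g(Theta^(n-1) a^(n-1))
    return A

m = 5
X = np.random.randn(3, m)  # 3 features, 5 examples, already transposed
thetas = [np.random.randn(3, 4), np.random.randn(1, 4)]
print(forward(X, thetas).shape)  # (1, 5): one prediction per example
```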

4. Characteristics of neural networks

4.1 Features in logistic regression vs. neural networks

   A neural network can learn its own series of new features.

  1. In logistic regression, the model is limited to the original features $x$. Although we can combine these features using polynomial terms (e.g., products of pairs), the model is still constrained by the original features.
  2. In a neural network, the original features form only the input layer. The prediction made at the output layer uses the features of the second layer, not the original features of the input layer. The features in the second layer can be seen as a series of new features that the network has learned in order to predict the output variable.

4.2 Logical operations in a neural network

   The computation of a single layer of neurons (with no hidden layer) can be used to express the logical operations AND and OR; both are verified in the sketch after this list.

  1. Logical AND:
     If we set $\theta_0 = -30,\ \theta_1 = 20,\ \theta_2 = 20$, the output function is $h_\theta(x) = g(-30 + 20x_1 + 20x_2)$.
     When $z$ is a large positive number the output is approximately 1 (true); when $z$ is a large negative number it is approximately 0 (false).
     In summary, this gives the AND function: $h_\theta(x) \approx x_1 \text{ AND } x_2$.

  2. Logical OR:
     Setting $\theta_0 = -10,\ \theta_1 = 20,\ \theta_2 = 20$ gives $h_\theta(x) = g(-10 + 20x_1 + 20x_2) \approx x_1 \text{ OR } x_2$.
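The two weight settings above can be checked directly with a truth table. A minimal sketch (the `neuron` helper is illustrative, not from the notes):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(theta, x1, x2):
    # a single unit: g(theta_0 + theta_1 * x1 + theta_2 * x2)
    return sigmoid(theta[0] + theta[1] * x1 + theta[2] * x2)

AND = (-30, 20, 20)
OR = (-10, 20, 20)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2,
              round(neuron(AND, x1, x2)),  # 1 only when both inputs are 1
              round(neuron(OR, x1, x2)))   # 1 when at least one input is 1
```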

5. Examples of neural networks

   When the input features are Boolean (0 or 1), a single activation unit can act as a logical operator; to represent different operators we only need to choose different weights.

  1. Logical AND: the output is 1 only when both inputs are 1;

  2. Logical OR: the output is 1 as long as at least one input is 1;

  3. Exclusive OR (XOR): the output is 1 when the two inputs differ and 0 when they are the same; unlike AND and OR, XOR cannot be expressed by a single neuron, which is why the next example needs a hidden layer.

  4. Logical XNOR (equivalence): the output is 1 when the inputs are the same and 0 when they differ; i.e. $\text{XNOR} = (x_1\ \text{AND}\ x_2)\ \text{OR}\ ((\text{NOT}\ x_1)\ \text{AND}\ (\text{NOT}\ x_2))$.
     First construct the neuron for the $(\text{NOT}\ x_1)\ \text{AND}\ (\text{NOT}\ x_2)$ part (weights $\theta_0 = 10,\ \theta_1 = -20,\ \theta_2 = -20$), then feed it, together with the AND neuron, into an OR neuron to implement the XNOR operator (see the sketch after this list):
     [Figure: Andrew Ng's slides showing the complete XNOR network and its truth table]
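To see the two-layer construction end to end, here is a hedged sketch that stacks the three units; the helper functions are mine, and the weight triples follow the values quoted above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(theta, x1, x2):
    return sigmoid(theta[0] + theta[1] * x1 + theta[2] * x2)

AND = (-30, 20, 20)    # x1 AND x2
NOR = (10, -20, -20)   # (NOT x1) AND (NOT x2)
OR = (-10, 20, 20)     # x1 OR x2

def xnor(x1, x2):
    a1 = neuron(AND, x1, x2)   # hidden unit 1
    a2 = neuron(NOR, x1, x2)   # hidden unit 2
    return neuron(OR, a1, a2)  # the output unit combines the hidden layer

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(xnor(x1, x2)))  # 1 when x1 == x2, else 0
```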

   In this way we can gradually construct more and more complex functions and obtain more and more powerful features. This is the power of neural networks!

6. Multi-class classification in neural networks

   For example, suppose we train a neural network to recognize pedestrians (pedestrian), cars (car), motorcycles (motorcycle), and trucks (truck). The output layer should then have 4 values: the first value is 1 or 0 according to whether the image shows a pedestrian, the second value indicates whether it is a car, and so on.
   The input vector $x$ has three dimensions, there are two hidden layers, and the output layer has 4 neurons representing the 4 classes. For each example the output layer produces a vector $[a\ b\ c\ d]^T$ in which exactly one of $a, b, c, d$ is 1, indicating the predicted class. The structure of the neural network:

 [Figure: a network with a 3-dimensional input, two hidden layers, and 4 output units]
The output of the neural network is one of four possible one-hot vectors:
 $\begin{bmatrix}1\\0\\0\\0\end{bmatrix}, \begin{bmatrix}0\\1\\0\\0\end{bmatrix}, \begin{bmatrix}0\\0\\1\\0\end{bmatrix}, \begin{bmatrix}0\\0\\0\\1\end{bmatrix}$
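Reading the predicted class off the 4-unit output layer is an argmax over the activations. A small sketch with made-up values:

```python
import numpy as np

# hypothetical output-layer activations for one example
h = np.array([0.05, 0.88, 0.03, 0.02])  # [pedestrian, car, motorcycle, truck]
classes = ["pedestrian", "car", "motorcycle", "truck"]
print(classes[int(np.argmax(h))])       # -> "car"

# the matching one-hot training label y for a car would be:
y = np.array([0, 1, 0, 0])
```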

Copyright notice: this article was written by [Amyniez]. Please include a link to the original when reposting: https://yzsam.com/2022/173/202206220529590296.html