
Day 5 of DL

2022-07-16 07:51:00 The sun is falling

How should we understand backpropagation? Explain how it works.

This question tests your knowledge of how neural networks work. You need to make the following points clear:

  • Forward pass (forward computation): the model propagates the input through each layer's weights to produce an output yp. The loss function is then evaluated; its value measures how well the model is doing. If the loss is not good enough, we need a way to reduce it — in essence, training a neural network means minimizing the loss function. The loss L(yp, yt) expresses the difference between the model's output yp and the actual label value yt.
  • To reduce the loss, we use derivatives. Backpropagation computes the derivative of the loss with respect to each layer's weights. Using these per-layer gradients, an optimizer (Adam, SGD, AdaDelta, …) applies gradient descent to update the network's weights.
  • Backpropagation uses the chain rule to compute the gradient of each layer, working from the last layer back to the first.
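The points above can be sketched numerically. Below is a minimal, hypothetical example (not from the original article): a network with one weight per layer, yp = w2 · ReLU(w1 · x) and squared-error loss, where the backward pass applies the chain rule from the last layer to the first and is checked against a finite-difference gradient.

```python
# Tiny network: h = ReLU(w1 * x), yp = w2 * h, loss L = (yp - yt)**2
def forward(w1, w2, x):
    h = max(0.0, w1 * x)   # hidden activation (ReLU)
    yp = w2 * h            # model output
    return h, yp

def loss(yp, yt):
    return (yp - yt) ** 2

def backward(w1, w2, x, yt):
    h, yp = forward(w1, w2, x)            # forward pass, caching intermediates
    # Chain rule, from the last layer back to the first:
    dL_dyp = 2.0 * (yp - yt)              # dL/dyp
    dL_dw2 = dL_dyp * h                   # dL/dw2 = dL/dyp * dyp/dw2
    dL_dh  = dL_dyp * w2                  # propagate the gradient to the hidden layer
    dh_dz  = 1.0 if w1 * x > 0 else 0.0   # derivative of ReLU
    dL_dw1 = dL_dh * dh_dz * x            # dL/dw1
    return dL_dw1, dL_dw2

w1, w2, x, yt = 0.5, -1.0, 2.0, 1.0
g1, g2 = backward(w1, w2, x, yt)

# Sanity check: compare with a numerical (central finite-difference) gradient.
eps = 1e-6
num_g1 = (loss(forward(w1 + eps, w2, x)[1], yt)
          - loss(forward(w1 - eps, w2, x)[1], yt)) / (2 * eps)
print(g1, g2, num_g1)  # analytic and numerical dL/dw1 should agree closely
```

In a real framework the optimizer would then use g1 and g2 to update w1 and w2, exactly as described in the second bullet.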

What happens when the learning rate is too high or too low?

  • When the learning rate is set too low, training is very slow, because each weight update is tiny. Many updates are needed before the model reaches a local optimum.

  • When the learning rate is set too high, the model may fail to converge because each weight update is too large. A single update step can jump past the local optimum, making it hard for the model to settle there; instead it bounces around near the optimum.
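Both failure modes can be seen on a toy problem. The sketch below (an illustrative example, not from the original article) minimizes f(w) = (w − 3)² by gradient descent with three different learning rates:

```python
def gradient_descent(lr, steps=50, w0=0.0):
    # Minimize f(w) = (w - 3)**2; its gradient is 2 * (w - 3).
    w = w0
    for _ in range(steps):
        w -= lr * 2.0 * (w - 3.0)
    return w

low  = gradient_descent(lr=0.01)  # too low: crawls toward 3, still far after 50 steps
good = gradient_descent(lr=0.1)   # reasonable: converges very close to 3
high = gradient_descent(lr=1.1)   # too high: each step overshoots, w diverges
print(low, good, high)
```

With lr = 1.1 the distance to the optimum is multiplied by |1 − 2·lr| = 1.2 every step, so the iterate moves further away each update — the "jumping around and out" behavior described above.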

If the input image size is doubled, by what factor does the number of CNN parameters grow? Why?

The number of parameters in a CNN depends on the number and size of its filters, not on the input image. Therefore, doubling the image size does not change the number of parameters in the model.
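This is easy to verify by counting parameters by hand. The sketch below (illustrative helper functions, not from the original article) shows that a convolutional layer's parameter count has no image-size term, while a fully connected layer's count grows with the input resolution:

```python
def conv2d_params(in_ch, out_ch, k):
    # A conv layer stores out_ch filters of shape (in_ch, k, k) plus out_ch biases.
    # Note: no image height/width appears anywhere in this formula.
    return out_ch * in_ch * k * k + out_ch

# Same parameter count whether the input image is 32x32 or 64x64:
p = conv2d_params(in_ch=3, out_ch=16, k=3)
print(p)  # 16*3*3*3 + 16 = 448

def dense_params(in_features, out_features):
    # A fully connected layer, by contrast, does depend on input size.
    return in_features * out_features + out_features

fc_32 = dense_params(32 * 32 * 3, 10)  # for a flattened 32x32 RGB image
fc_64 = dense_params(64 * 64 * 3, 10)  # doubling H and W quadruples the weights
print(fc_32, fc_64)
```

This weight sharing across spatial positions is exactly why CNNs stay compact as image resolution grows.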

Explain the trade-off between bias and variance

What is bias? You can think of bias as the difference between the model's average prediction and the actual value we need to predict. High bias indicates that the model pays too little attention to the training data; the model is too simple and fails to achieve good accuracy on either the training set or the test set. This phenomenon is called underfitting.

Variance can be understood as the spread of the model's outputs at a data point. The larger the variance, the more closely the model fits the training data itself instead of generalizing to data it has never seen. Such a model performs very well on the training set but very poorly on the test set; this is the phenomenon of overfitting.
The relationship between these two concepts can be seen in the figure below :
[Figure: dartboard diagram of bias vs. variance]
In the diagram above, the center of the circle represents a model that predicts the exact value perfectly. In practice, you will never find such a good model; as we move further from the center, the predictions get worse.
We can change the model so that as many of its guesses as possible land near the center of the circle. To do that, we need to balance bias against variance. If our model is too simple, with few parameters, it may have high bias and low variance.
On the other hand, if our model has many parameters, it will have high variance and low bias. This trade-off is the basis for choosing the model's complexity when we design an algorithm.
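The trade-off can also be measured empirically. The simulation below (an illustrative sketch, not from the original article) repeatedly draws small training sets from a noisy linear function and compares two hypothetical models at one test point: a constant predictor (too simple) and a 1-nearest-neighbour predictor (very flexible), estimating each one's squared bias and variance:

```python
import random

random.seed(0)

def true_f(x):
    return 2.0 * x

def make_dataset(n=10):
    xs = [random.uniform(0.0, 1.0) for _ in range(n)]
    ys = [true_f(x) + random.gauss(0.0, 0.3) for x in xs]
    return xs, ys

x_test = 0.9
preds_simple, preds_flexible = [], []
for _ in range(2000):
    xs, ys = make_dataset()
    # Simple model (underfits): always predict the mean training label.
    preds_simple.append(sum(ys) / len(ys))
    # Flexible model (overfits): 1-nearest-neighbour prediction.
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x_test))
    preds_flexible.append(ys[i])

def bias_sq_and_var(preds, target):
    mean = sum(preds) / len(preds)
    bias_sq = (mean - target) ** 2                          # (avg prediction - truth)^2
    var = sum((p - mean) ** 2 for p in preds) / len(preds)  # spread of predictions
    return bias_sq, var

b_simple, v_simple = bias_sq_and_var(preds_simple, true_f(x_test))
b_flex, v_flex = bias_sq_and_var(preds_flexible, true_f(x_test))
print(b_simple, v_simple)  # large bias², small variance (underfitting)
print(b_flex, v_flex)      # small bias², larger variance (overfitting)
```

The simple model's average prediction sits far from the truth but varies little between training sets; the flexible model is right on average but its predictions scatter widely — the two corners of the trade-off described above.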

Original site

Copyright notice
This article was written by [The sun is falling]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/197/202207131736535046.html