2022 Tsinghua Summer School Notes L2_2: Basic Introduction to CNN and RNN
2022-07-24 21:33:00 【The duck neck is gone】
2022 Tsinghua University Large Model Interdisciplinary Seminar
L2 Neural Network basics
1 Recurrent Neural Networks (RNNs)
1.1 RNN Basics
Sequence data calls for sequential memory (which the human brain handles easily).
The input is usually variable-length data; h_i denotes the hidden state at time step i, and y denotes the output.
RNN structural unit
Sequential memory: the hidden state h_i at each time step incorporates the content of the previous hidden state; h_0 must be initialized by hand.
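To make the recurrence concrete, here is a minimal NumPy sketch of a single RNN step (the names `rnn_step`, `W_x`, `W_h` and the toy dimensions are illustrative assumptions, not from the lecture):

```python
import numpy as np

def rnn_step(x_i, h_prev, W_x, W_h, b):
    """One RNN time step: mix the current input with the previous
    hidden state; this is the 'sequential memory'."""
    return np.tanh(W_x @ x_i + W_h @ h_prev + b)

rng = np.random.default_rng(0)
W_x = rng.normal(size=(3, 4))        # hidden size 3, input size 4
W_h = rng.normal(size=(3, 3))        # shared across all time steps
b = np.zeros(3)

h = np.zeros(3)                      # h_0 must be initialized by hand
for x_i in rng.normal(size=(5, 4)):  # a variable-length input sequence
    h = rnn_step(x_i, h, W_x, W_h, b)
print(h.shape)                       # (3,)
```

Note that the same W_x and W_h are reused at every step, which is why the model size does not depend on the input length.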
- Applications:
  - Sequence labeling (tagging properties)
  - Sequence prediction (e.g., forecasting the weather)
  - Image captioning (generating a sentence for a given picture)
  - Text classification (e.g., distinguishing sentiment)
- Advantages:
  - Can handle input of any length
  - Model size does not grow with the input length
  - Parameters are shared across time steps
  - Later computations can use information from earlier steps
- Disadvantages:
  - Slow, since computation is sequential
  - In practice, later steps struggle to use information from much earlier steps
  - Vanishing gradients (common) or exploding gradients
1.2 RNN Variants
Idea: optimize the recurrent unit by making the hidden-layer computation more sophisticated.
1.2.1 GRU (Gated Recurrent Unit)
An RNN with a gating mechanism introduced to weigh past information against the current input. Observing the formulas, note that each W below is a gate-specific (exclusive) weight matrix.
- Reset gate
  $\tilde{h}_{i}=\tanh\left(W_{x} x_{i}+r_{i} * W_{h} h_{i-1}+b\right)$
  Combining the current input with the activated state of the previous step gives a temporary candidate $\tilde{h}_{i}$. If $r_{i}$ is close to 0, the candidate depends only weakly on the previous $h_{i-1}$. (The reset gate $r_{i}$ and the update gate $z_{i}$ below are each computed from $x_{i}$ and $h_{i-1}$ through a sigmoid, with their own weights.)
- Update gate
  $h_{i}=z_{i} * h_{i-1}+\left(1-z_{i}\right) * \tilde{h}_{i}$
  This weighs the influence of the new $\tilde{h}_{i}$ against $h_{i-1}$, yielding the $h_{i}$ passed to the next step. When $z_{i}$ is close to 1, $h_{i}$ is essentially just $h_{i-1}$; when $z_{i}$ is close to 0, the newly activated $\tilde{h}_{i}$ is adopted directly.

Benefits of the gating mechanism: it controls how strongly different positions relate (quickly establishing long-distance relationships), and it reduces the number of parameters.
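A minimal NumPy sketch of one GRU step under the formulas above (the parameter dictionary `p` and the sigmoid forms of $r_i$ and $z_i$ are standard-GRU assumptions, not spelled out in the notes):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_i, h_prev, p):
    """One GRU step; each gate has its own (exclusive) weights in p."""
    r = sigmoid(p["W_r"] @ x_i + p["U_r"] @ h_prev + p["b_r"])  # reset gate
    z = sigmoid(p["W_z"] @ x_i + p["U_z"] @ h_prev + p["b_z"])  # update gate
    # Candidate state: the reset gate scales the previous state's contribution.
    h_tilde = np.tanh(p["W_x"] @ x_i + r * (p["W_h"] @ h_prev) + p["b"])
    # Update gate interpolates between the old state and the candidate.
    return z * h_prev + (1.0 - z) * h_tilde

d_in, d_h = 4, 3
rng = np.random.default_rng(0)
shapes = {"W_r": (d_h, d_in), "U_r": (d_h, d_h), "b_r": (d_h,),
          "W_z": (d_h, d_in), "U_z": (d_h, d_h), "b_z": (d_h,),
          "W_x": (d_h, d_in), "W_h": (d_h, d_h), "b": (d_h,)}
p = {k: rng.normal(size=s) for k, s in shapes.items()}
h = gru_step(rng.normal(size=d_in), np.zeros(d_h), p)
```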
1.2.2 LSTM (Long Short-Term Memory)
- Cell state $C_t$
  - The LSTM adds a new cell state $C_t$ to learn long-term dependencies.
- Forget gate $f_t$

  $f_{t}=\sigma\left(W_{f} \cdot\left[h_{t-1}, x_{t}\right]+b_{f}\right)$
  Decides which information in the previous cell state should be removed. It is computed from the current input and the previous hidden state; the resulting $f_{t}$ lies in the interval [0, 1], and a value of 0 means the past information is discarded entirely.
- Input gate

  Decides what new information can be stored in the cell state.
  - $i_{t}$: input-gate activation
    $i_{t}=\sigma\left(W_{i} \cdot\left[h_{t-1}, x_{t}\right]+b_{i}\right)$
  - $\tilde{C}_{t}$: the candidate cell-state variable
    $\tilde{C}_{t}=\tanh \left(W_{C} \cdot\left[h_{t-1}, x_{t}\right]+b_{C}\right)$
- Updating the cell state
  $C_{t}=f_{t} * C_{t-1}+i_{t} * \tilde{C}_{t}$
  - First multiply the old cell state by the forget gate, to decide which information to discard.
  - Then multiply the candidate vector by the input gate, to decide which information to add to the new cell state.
- Output gate
  The output gate decides which information can be output:
  $o_{t}=\sigma\left(W_{o}\left[h_{t-1}, x_{t}\right]+b_{o}\right)$
  $h_{t}=o_{t} * \tanh \left(C_{t}\right)$
  (This can be understood as selecting which parts of the cell state are exposed in the output representation.)
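Putting the four formulas together, a minimal NumPy sketch of one LSTM step (the parameter dictionary `p` and all helper names are illustrative assumptions):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, C_prev, p):
    """One LSTM step; [h_{t-1}, x_t] is the concatenation fed to every gate."""
    hx = np.concatenate([h_prev, x_t])
    f = sigmoid(p["W_f"] @ hx + p["b_f"])        # forget gate
    i = sigmoid(p["W_i"] @ hx + p["b_i"])        # input gate
    C_tilde = np.tanh(p["W_C"] @ hx + p["b_C"])  # candidate cell state
    C = f * C_prev + i * C_tilde                 # update the cell state
    o = sigmoid(p["W_o"] @ hx + p["b_o"])        # output gate
    h = o * np.tanh(C)                           # expose part of the cell state
    return h, C

d_in, d_h = 4, 3
rng = np.random.default_rng(0)
p = {f"W_{g}": rng.normal(size=(d_h, d_h + d_in)) for g in "fiCo"}
p.update({f"b_{g}": np.zeros(d_h) for g in "fiCo"})
h, C = lstm_step(rng.normal(size=d_in), np.zeros(d_h), np.zeros(d_h), p)
```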
1.2.3 Bidirectional RNNs
- Preface
  - In traditional RNN tasks, each hidden state is determined only by the previous hidden state and the current input.
  - But many applications depend on the whole input sequence, including future context.
  - Example: handwriting recognition / speech recognition
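A sketch of the idea in NumPy (all names here are illustrative; `step_f` and `step_b` could be any recurrent cell, e.g. the `rnn_step` or `gru_step` sketches above):

```python
import numpy as np

def run_rnn(xs, h0, step):
    """Run a one-directional RNN and collect every hidden state."""
    hs, h = [], h0
    for x in xs:
        h = step(x, h)
        hs.append(h)
    return hs

def bidirectional(xs, h0_f, h0_b, step_f, step_b):
    """Each position sees the whole sequence: concatenate its forward
    state (past context) with its backward state (future context)."""
    fwd = run_rnn(xs, h0_f, step_f)
    bwd = run_rnn(xs[::-1], h0_b, step_b)[::-1]
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

step = lambda x, h: np.tanh(x + h)   # stand-in recurrent cell
xs = list(np.random.default_rng(0).normal(size=(5, 3)))
outs = bidirectional(xs, np.zeros(3), np.zeros(3), step, step)
print(outs[0].shape)                 # (6,): forward 3 + backward 3
```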
2 Convolutional Neural Networks (CNNs)
2.1 Preface
- Applications:
  - CV tasks
  - NLP tasks: sentiment classification, relation classification
- Good at extracting local and position-invariant patterns
  - CV: colors, corners, textures, etc.
  - NLP: phrases and syntactic structures
- To extract a pattern:
  - Compute representations of every possible n-gram in a sentence
  - Without relying on external linguistic tools (such as dependency parsers)
2.2 CNN Structure Example
- Structure: input layer, convolution layer (filters), max-pooling layer (extracts features), nonlinear layer
2.2.1 Input layer
Converts each word into a vector representation via word embeddings.
X denotes the resulting word-vector matrix: m is the sentence length (6 in the example) and d is the dimension chosen for the word vectors.
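A minimal sketch of this lookup in NumPy, assuming a toy six-word vocabulary and d = 8 (the vocabulary, names, and values are illustrative; a real embedding table would be learned):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"i": 0, "like": 1, "this": 2, "movie": 3, "very": 4, "much": 5}
d = 8                                  # chosen embedding dimension
E = rng.normal(size=(len(vocab), d))   # embedding table, one row per word

sentence = ["i", "like", "this", "movie", "very", "much"]
X = np.stack([E[vocab[w]] for w in sentence])  # word-vector matrix
print(X.shape)                         # (6, 8): m = 6 words, d = 8 dims
```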
2.2.2 Convolution layer
The second row denotes the selected n-grams.
W is the sliding convolution kernel.
f is the feature representation after convolution.
The convolution kernel's parameters are shared globally across all positions.
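A minimal NumPy sketch of one kernel sliding over the word-vector matrix, followed by a nonlinearity and max pooling (n = 3 and all names are illustrative; a real model uses many kernels of several sizes):

```python
import numpy as np

def conv_feature(X, W, b, n=3):
    """Slide an n-gram kernel W over the rows of X (one value per window),
    apply a nonlinearity, then max-pool over all window positions."""
    m, _ = X.shape
    f = np.array([np.sum(W * X[i:i + n]) + b for i in range(m - n + 1)])
    return np.tanh(f).max()

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))        # the word-vector matrix from 2.2.1
W = rng.normal(size=(3, 8))        # one shared 3-gram convolution kernel
print(conv_feature(X, W, b=0.0))   # one pooled feature value per kernel
```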
2.2.3 Comparison of CNN and RNN
In short: CNNs excel at extracting local, position-invariant patterns and parallelize well across positions, while RNNs handle variable-length, order-dependent input but compute sequentially and more slowly.