Basic concepts of NLP 1

2022-07-25 12:15:00 【bolite】

Note: these notes cover the material in p1-p4 of Li Hongyi's (Hung-yi Lee's) "Machine Learning" course, Spring 2021/2022.

Reinforcement learning, supervised learning, unsupervised learning

Supervised learning

Supervised learning first trains on a correctly labelled training set. The "experience" gained from training is called a model. When unseen data is then passed to the model, the machine uses that experience to infer the result.

Unsupervised learning

Unsupervised learning is essentially a statistical technique (it can also be understood as a way of grouping data). Training has no predefined target and you cannot know in advance what the result will be, so no labels are needed. In principle it resembles the fitting done in supervised learning, except that the results carry no labels.

Reinforcement learning

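Reinforcement learning trains the computer through trial and error: it takes actions, receives rewards or penalties for them, and gradually learns to make good decisions in situations it has not seen before. It is sometimes loosely described as sitting between supervised and unsupervised learning, because there are no labelled answers, only a reward signal. Like supervised learning, it still needs human involvement, here in defining what counts as a reward.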

Two major tasks of machine learning

1. Regression: find a function that takes features x as input and outputs a scalar value.

2. Classification: have the machine choose one of the options defined by humans as its output.

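The difference between the two tasks can be sketched in a few lines of Python. This is only an illustration: the function names, the parameter values and the option list are made up for the example and do not come from the course.

```python
def regress(x, w=2.0, b=1.0):
    """Regression: map a feature x to a single scalar value."""
    return b + w * x


def classify(scores, options):
    """Classification: pick one of the human-defined options.

    `scores` holds one number per option; the option with the
    highest score is returned.
    """
    best_index = max(range(len(options)), key=lambda i: scores[i])
    return options[best_index]


value = regress(3.0)                                          # a scalar
label = classify([0.1, 0.7, 0.2], ["cat", "dog", "bird"])     # one option
```

The regression output can be any number, while the classification output is always one of the options supplied in advance.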

Finding the function

1. First assume a form for the function, for example y = b + w·x1 (x1 is an input from the training data, y is the corresponding output, and b and w are unknown parameters; the formula is a guess and not necessarily right, and it can be revised after training).

2. Define the loss function L(b, w). Its arguments are the b and w from step 1, and its output indicates how good or bad the fit is when b and w are set to those values.

How the loss is computed: feed each x into the model with a specific b and w to get a predicted y, take the absolute value of the difference between the predicted y and the actual y to get an error e, then add up all the e and average them: L = (1/N) · Σ e_n, where e_n = |y_n - predicted y_n|.

3. Optimize: find the values of w and b that make the loss L(b, w) as small as possible.
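
The first two steps can be sketched in Python as follows. This is a minimal illustration, assuming the model y = b + w·x1 and the mean-absolute-error loss described above; the function names and the tiny data set are invented for the example.

```python
def predict(x1, w, b):
    """Step 1: the guessed model y = b + w * x1."""
    return b + w * x1


def mean_absolute_error(xs, ys, w, b):
    """Step 2: average of |actual y - predicted y| over the training data."""
    errors = [abs(y - predict(x, w, b)) for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


# Hypothetical training data generated by y = 1 + 2 * x.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]

loss_good = mean_absolute_error(xs, ys, w=2.0, b=1.0)  # parameters that fit the data
loss_bad = mean_absolute_error(xs, ys, w=1.0, b=0.0)   # parameters that do not
```

A smaller loss means that particular pair (b, w) fits the training data better, which is what step 3 searches for.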

An optimization method: gradient descent

My understanding: take the slope (derivative) of the loss function with respect to w. When the slope is less than 0, move w forward (increase it); when the slope is greater than 0, move w back (decrease it). Keep updating until the derivative is 0 or the preset number of updates is reached.
How far w moves forward or back depends on the derivative and on the learning rate (shown in red in the lecture slide): w is updated to w - η · ∂L/∂w, where η is the learning rate. The learning rate is a hyperparameter, that is, a value you set yourself in the experiment.
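
The update loop can be sketched as below. This is an illustration under stated assumptions rather than the course's code: it uses the mean squared error instead of the absolute error so that the derivative is smooth, it updates b in the same way as w, and the learning rate, step count and data are arbitrary choices.

```python
def gradient_descent(xs, ys, learning_rate=0.05, max_steps=2000, tol=1e-9):
    """Fit y = b + w * x by repeatedly stepping against the slope of the loss."""
    w, b = 0.0, 0.0                      # arbitrary starting point
    n = len(xs)
    for _ in range(max_steps):           # stop after a preset number of updates...
        # Derivatives of the mean squared error with respect to w and b.
        grad_w = sum(2 * (b + w * x - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (b + w * x - y) for x, y in zip(xs, ys)) / n
        if abs(grad_w) < tol and abs(grad_b) < tol:
            break                        # ...or once the slope is (almost) 0
        # Negative slope -> parameter increases; positive slope -> it decreases.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical data generated by y = 1 + 2 * x.
w_fit, b_fit = gradient_descent([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
```

A larger learning rate takes bigger steps and may overshoot; a smaller one is more stable but needs more updates.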

More complex models

Most models do not have the simple single-variable linear form used above, so some of the three steps for finding the function need to change.

We can split a complex function into several simple functions plus a constant. In the lecture's figure the red curve is the target function; it can be built from a constant (labelled 0) and three blue functions with different labels. (The black b is the constant; the green b_i are parameters of the sigmoid functions.)
For a smooth curve, we can sample points along the curve to approximate it piecewise, and then apply the same method.

A single blue function

We can keep adjusting the parameters of the sigmoid function (an activation function; other activation functions could be used instead) to change the shape of each individual blue function.
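
A single blue function can be sketched like this, assuming it has the form c · sigmoid(b + w·x); the name `blue_function` and the sample values are only illustrative.

```python
import math


def sigmoid(z):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def blue_function(x, c, b, w):
    """One 'blue' curve: c sets the height, b shifts it, w sets the slope."""
    return c * sigmoid(b + w * x)


mid_point = blue_function(0.0, c=2.0, b=0.0, w=1.0)     # half of the height c
steeper = blue_function(1.0, c=2.0, b=0.0, w=5.0)       # larger w: sharper rise
shifted = blue_function(0.0, c=2.0, b=-3.0, w=1.0)      # negative b: curve moves right
```

Changing c, b and w one at a time is a quick way to see how each parameter reshapes the curve.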

Substituting these sigmoid functions into the simple model above gives the following formula (x is the same variable as before, the training input):

y = b + Σ_i c_i · sigmoid(b_i + Σ_j w_ij · x_j)

Suppose i and j each take only the values 1, 2, 3. Then the computation inside the brackets (marked in blue in the slide) can be written as a matrix operation, r = b + W·x, where r_i = b_i + Σ_j w_ij · x_j.
The values of r are then passed through the sigmoid function.
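
The matrix view can be sketched in plain Python. This assumes the model has the form y = b + Σ_i c_i · sigmoid(b_i + Σ_j w_ij · x_j); the function and argument names are chosen for the example, and the parameter values are placeholders.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W, b_vec, c, b):
    """Compute y for one input vector x.

    W      -- matrix of weights w_ij (one row per sigmoid)
    b_vec  -- the per-sigmoid biases b_i
    c      -- the per-sigmoid heights c_i
    b      -- the constant term
    """
    # r = b_vec + W x, written out row by row.
    r = [b_i + sum(w_ij * x_j for w_ij, x_j in zip(row, x))
         for row, b_i in zip(W, b_vec)]
    # Pass each r_i through the sigmoid.
    a = [sigmoid(r_i) for r_i in r]
    # Weighted sum of the sigmoid outputs plus the constant.
    return b + sum(c_i * a_i for c_i, a_i in zip(c, a))


# Placeholder parameters: with all weights and biases at 0, every sigmoid gives 0.5.
W = [[0.0, 0.0, 0.0] for _ in range(3)]
y = forward([1.0, 2.0, 3.0], W, b_vec=[0.0, 0.0, 0.0], c=[1.0, 1.0, 1.0], b=0.5)
```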

The parameters of the loss function

The input to the loss function also needs to change: instead of b and w, it now takes θ.
θ is a single column vector formed by stacking all the unknown parameters: W, the green b, c, and the black b.
The rest of the calculation is the same as before.
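
Building θ can be sketched as a simple flattening step. The ordering used here (rows of W, then the green b, then c, then the black b) is an assumption made for the example; any fixed order works as long as it is used consistently.

```python
def flatten_parameters(W, b_vec, c, b):
    """Stack every unknown parameter into one flat list (the vector theta)."""
    theta = [w_ij for row in W for w_ij in row]   # rows of W, one after another
    theta += list(b_vec)                          # the per-sigmoid biases
    theta += list(c)                              # the per-sigmoid heights
    theta.append(b)                               # the constant term
    return theta


# With three inputs and three sigmoids there are 9 + 3 + 3 + 1 parameters.
theta = flatten_parameters(
    W=[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]],
    b_vec=[1.0, 2.0, 3.0],
    c=[4.0, 5.0, 6.0],
    b=7.0,
)
```

The loss is then a function of this one vector, and gradient descent updates all of its entries together.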

Neural network

When the result of the first computation is passed into a new set of sigmoid functions, forming one or more levels of nesting, the model becomes a neural network. A deeper network does not necessarily give more accurate results, however.
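
The nesting can be sketched as a loop over layers. This is an illustrative sketch, assuming each layer computes sigmoid(b + W·a) on the previous layer's output and the final y is a weighted sum plus a constant; the names `layer` and `network` and the placeholder parameters are not from the course.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, W, b_vec):
    """One layer: a sigmoid applied to b_vec + W * inputs."""
    return [sigmoid(b_i + sum(w * v for w, v in zip(row, inputs)))
            for row, b_i in zip(W, b_vec)]


def network(x, hidden_layers, c, b):
    """Feed x through each (W, b_vec) layer in turn, then combine the outputs."""
    a = x
    for W, b_vec in hidden_layers:
        a = layer(a, W, b_vec)      # the output of one layer feeds the next
    return b + sum(c_i * a_i for c_i, a_i in zip(c, a))


# Two nested layers with placeholder (all-zero) weights and biases.
zero_layer = ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
y = network([1.0, 2.0], [zero_layer, zero_layer], c=[1.0, 1.0], b=0.0)
```

Adding more entries to `hidden_layers` makes the network deeper, which increases the number of parameters but, as noted above, does not guarantee a more accurate result.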