Structured machine learning project (II) - machine learning strategy (2)
2022-06-27 22:30:00 【997and】
These study notes record what I learned while studying deep learning, drawing on Andrew Ng's video lectures and the "Flower Book" (the Deep Learning textbook). My ability is limited, so if you find any errors, please contact me so I can correct them. Thank you very much!
First edition, 2022-06-01: first draft
I. Carrying out error analysis
Suppose you are debugging a cat classifier that has reached 90% accuracy, i.e. 10% error.
In the figure, two dog pictures are misclassified as cats. One idea is to target dogs specifically: collect more dog pictures, or design parts of the algorithm to handle dogs.
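Recommended procedure:
First, collect about 100 misclassified examples and check them by hand, counting how many are dogs. If only 5 of the 100 are dogs, fixing the dog problem can lower the error from 10% to 9.5% at best; if 50 are dogs, it could lower it to 5%. Manual work of this kind is often looked down on in machine learning, but this simple count tells you whether an idea is worth pursuing.
You can also evaluate several ideas in parallel during error analysis: tag each misclassified example with the error categories that apply. Partway through, you may notice a new category, for example image filters interfering with the classifier, and add it to the list.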
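The tallying step can be sketched in a few lines of Python. This is only an illustration: the category names, the counts, and the helper `error_analysis` are assumptions made for the example, not part of the course material.

```python
from collections import Counter

def error_analysis(tagged_errors, overall_error):
    """Tally error categories over manually reviewed misclassified examples.

    tagged_errors: list of sets, one per misclassified example, holding the
                   category tags a reviewer assigned (an example may have several).
    overall_error: current development-set error rate, e.g. 0.10 for 10%.
    Returns {category: (share_of_errors, best_case_error_after_fix)}.
    """
    counts = Counter(tag for tags in tagged_errors for tag in tags)
    n = len(tagged_errors)
    report = {}
    for category, count in counts.most_common():
        share = count / n
        # Best case: every error in this category is fixed, nothing else changes.
        report[category] = (share, overall_error * (1 - share))
    return report

# Hypothetical review of 100 misclassified development-set images.
reviewed = [{"dog"}] * 5 + [{"blurry"}] * 40 + [{"blurry", "filter"}] * 10 + [set()] * 45
report = error_analysis(reviewed, overall_error=0.10)
for category, (share, best_case) in report.items():
    print(f"{category}: {share:.0%} of errors, best-case error {best_case:.2%}")
```

The second number in each pair is a ceiling on what fixing that category alone could achieve, which is what makes the categories comparable.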
II. Cleaning up incorrectly labeled data
In the figure, the second-to-last example is labeled incorrectly.
Deep learning algorithms are quite robust to random errors in the training set, but much less robust to systematic errors.
For the development set, the question is whether it is worth correcting the labels when, say, 6% of the misclassified examples are due to incorrect labels.
First, whatever correction process you apply, apply it to the development set and the test set at the same time, so that the two still come from the same distribution.
Second, consider examining the examples the algorithm got right as well as the ones it got wrong.
Finally, you may decide to correct only the development and test sets, which are much smaller than the training set.
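Whether relabeling is worth the effort depends on how much of the overall error the bad labels explain. A minimal sketch of that arithmetic follows; the 10% and 2% overall error rates are assumed for illustration, and `label_error_breakdown` is a hypothetical helper.

```python
def label_error_breakdown(overall_error, share_due_to_labels):
    """Split development-set error into the part caused by wrong labels and the rest."""
    from_labels = overall_error * share_due_to_labels
    return from_labels, overall_error - from_labels

# Assumed case 1: 10% overall error, 6% of the reviewed errors traced to wrong labels.
from_labels, other = label_error_breakdown(0.10, 0.06)
print(f"label errors: {from_labels:.1%}, other causes: {other:.1%}")

# Assumed case 2: the same absolute amount of label error, but only 2% overall error.
share_in_case_2 = from_labels / 0.02
print(f"share of errors due to labels in case 2: {share_in_case_2:.0%}")
```

In the first case the labels explain only a small slice of the error, so other fixes come first; in the second the same label noise explains a much larger share, and cleaning the labels becomes more attractive.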
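III. Build your first system quickly, then iterate
Example: a speech recognition system can be improved in many directions, such as robustness to noisy backgrounds. Rather than guessing which one matters most, build a first system quickly, then use bias/variance analysis and error analysis to decide what to work on next.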
IV. Training and testing on different distributions
One data source is photos uploaded from a mobile app, which are often blurry; the other is images collected from the web by a crawler.
The purpose of the development set is to tell the team what target to aim at.
Option 1: pool both sources and shuffle them into training, development, and test sets. The development set then consists mostly of web images, so the team spends most of its effort optimizing for images downloaded from the web. Not recommended.
Option 2: the training set is the 200,000 web images plus 5,000 photos uploaded from the mobile app, and both the development set and the test set contain only mobile photos. The training distribution now differs from the development and test distribution, but this gives better performance in the long run.
The same applies to speech recognition: all the speech data you have from other sources can go into the training set.
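Option 2 can be sketched as follows. The total of 10,000 mobile photos is an assumption made so that some are left over for the development and test sets; `split_option_two` and the tuple identifiers are placeholders, not a real data loader.

```python
import random

def split_option_two(web_images, mobile_images, n_mobile_train, seed=0):
    """Train on all web data plus part of the mobile data; dev/test are mobile only."""
    rng = random.Random(seed)
    mobile = list(mobile_images)
    rng.shuffle(mobile)
    train = list(web_images) + mobile[:n_mobile_train]
    rest = mobile[n_mobile_train:]
    half = len(rest) // 2
    dev, test = rest[:half], rest[half:]
    rng.shuffle(train)
    return train, dev, test

# Placeholder identifiers stand in for image files.
web = [("web", i) for i in range(200_000)]
mobile = [("mobile", i) for i in range(10_000)]  # assumed total
train, dev, test = split_option_two(web, mobile, n_mobile_train=5_000)
print(len(train), len(dev), len(test))
```

The point of the design is that the development and test sets contain only the data the product will actually see, even though most of the training data comes from elsewhere.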
V. Bias and Variance with mismatched data distributions
When the training set and the development set come from different distributions, a gap between training error and development error has two possible causes: first, the algorithm has seen the training data but never the development data; second, the development data comes from a different distribution.
Keep the development set and the test set on the same distribution, even though the training set comes from a different one. To separate the two causes, randomly shuffle the training set and set aside a portion of it as a training-development set: it has the same distribution as the training set but is not used for training.
The lower right of the figure shows three examples; the second has high bias (underfitting).
The successive gaps measure avoidable bias (human level vs. training error), variance (training vs. training-development error), data mismatch (training-development vs. development error), and overfitting to the development set (development vs. test error; a large gap here means you have overfit the development set).
In the example on the right, the algorithm actually performs better on the development and test sets than on the training data.
Horizontal axis of the table: the data source, e.g. general speech recognition data vs. speech data collected for the rear-view mirror application.
Vertical axis: how the data is evaluated, i.e. human-level error, the error on data the neural network was trained on, and the error on data it was not trained on.
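The four gaps can be read off mechanically once the five error rates are known. The sketch below uses made-up error rates, and `diagnose` is a hypothetical helper name.

```python
def diagnose(human, train, train_dev, dev, test):
    """Return the four gaps between successive error rates (as fractions)."""
    return {
        "avoidable bias": train - human,
        "variance": train_dev - train,
        "data mismatch": dev - train_dev,
        "overfitting to dev set": test - dev,
    }

# Illustrative error rates, not measurements from a real system.
gaps = diagnose(human=0.0, train=0.01, train_dev=0.015, dev=0.10, test=0.10)
largest = max(gaps, key=gaps.get)
for name, gap in gaps.items():
    print(f"{name}: {gap:.1%}")
print("largest gap:", largest)
```

With these numbers the jump happens between the training-development set and the development set, so the dominant problem is data mismatch rather than variance.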
How do you deal with data mismatch?
Training on data from a different distribution at least lets you use much more training data. Lecture 2.6 discusses how to address the mismatch itself.
VI. Addressing data mismatch
Do the error analysis on the development set, not the test set, to avoid overfitting the test set.
1. Find out how the development set differs from the training set
2. Make the training set more similar to the development set
One way to bring the training set closer to the development set is artificial data synthesis.
If you think you have a data mismatch problem, carry out error analysis to understand how the two distributions differ.
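As one illustration of data synthesis for speech, clean audio can be mixed with background noise at a chosen signal-to-noise ratio. The sketch below uses NumPy with synthetic stand-in signals; `mix_at_snr`, the 16 kHz sample rate, and the 10 dB target are assumptions for the example.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db, rng):
    """Add a random segment of `noise` to `clean` at the requested signal-to-noise ratio."""
    start = rng.integers(0, len(noise) - len(clean) + 1)
    segment = noise[start:start + len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(segment ** 2)
    # Scale the noise so that clean_power / scaled_noise_power == 10 ** (snr_db / 10).
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * segment

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16_000) / 16_000)  # 1 s tone as stand-in speech
noise = rng.normal(size=160_000)                               # 10 s of stand-in car noise
noisy = mix_at_snr(clean, noise, snr_db=10, rng=rng)
```

A caveat worth keeping in mind: if only a short noise recording is reused for every clip, the network may overfit to that particular noise, so drawing segments from as much varied noise as possible is the safer choice.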
VII. Transfer learning
Sometimes a neural network can take knowledge learned on one task and apply it to a separate task.
(Blue annotations) Concretely, in the first stage you train on the image recognition task, learning all the usual parameters of the network, all the weights in all the layers, and you obtain a network that can make image recognition predictions. To apply transfer learning, you then swap in a new dataset of (x, y) pairs, where x is now a radiology image and y is the diagnosis you want to predict, and you randomly initialize the weights of the last layer, w[L] and b[L].
(Purple annotations) Training on the new radiology dataset:
If the dataset is small, retrain only the weights of the last layer and keep the other parameters unchanged.
If you have enough data, retrain all the layers of the network. In that case the initial training on image recognition is called pre-training, and updating all the weights afterwards on the radiology data is called fine-tuning.
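The small-data case (retrain only w[L] and b[L]) can be sketched with plain NumPy. Everything here is a toy stand-in: the "pre-trained" weights are random, the target data is synthetic, and the function names are assumptions made for the example.

```python
import copy
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    return {"W": rng.normal(0.0, 0.1, (n_out, n_in)), "b": np.zeros((n_out, 1))}

def forward(layers, X):
    """ReLU hidden layers and a sigmoid output; returns every layer's activations."""
    acts = [X]
    for i, layer in enumerate(layers):
        Z = layer["W"] @ acts[-1] + layer["b"]
        is_output = i == len(layers) - 1
        acts.append(1.0 / (1.0 + np.exp(-Z)) if is_output else np.maximum(Z, 0.0))
    return acts

def cross_entropy(layers, X, Y):
    A = forward(layers, X)[-1]
    return float(-np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A)))

def fine_tune_last_layer(layers, X, Y, lr=0.5, steps=200):
    """Gradient descent on w[L], b[L] only; all earlier layers stay frozen."""
    m = X.shape[1]
    for _ in range(steps):
        acts = forward(layers, X)
        dZ = acts[-1] - Y  # per-example gradient w.r.t. the output pre-activation
        layers[-1]["W"] -= lr * (dZ @ acts[-2].T) / m
        layers[-1]["b"] -= lr * dZ.mean(axis=1, keepdims=True)

# Stand-in for a network pre-trained on a large image dataset (random here).
pretrained = [init_layer(8, 16), init_layer(16, 16), init_layer(16, 1)]

# Transfer: keep the early layers, randomly re-initialize the last one.
model = copy.deepcopy(pretrained)
model[-1] = init_layer(16, 1)

# Small hypothetical target dataset (think: 100 radiology examples).
X_new = rng.normal(size=(8, 100))
Y_new = (X_new[:1] > 0).astype(float)

loss_before = cross_entropy(model, X_new, Y_new)
fine_tune_last_layer(model, X_new, Y_new)
loss_after = cross_entropy(model, X_new, Y_new)
```

With more target data, the same loop would be extended to update the earlier layers as well, which corresponds to fine-tuning the whole network.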
(Bottom of the figure) Here is another example. Suppose you have trained a speech recognition system, where x is an audio clip and y is the transcript, so the system outputs the transcribed text. Now suppose you want to build a "wake word" or "trigger word" detection system. A wake word or trigger word is a phrase that wakes up a voice-controlled device at home: "Alexa" wakes up an Amazon Echo, "OK Google" wakes up a Google device, "Hey Siri" wakes up an Apple device, and "Hello Baidu" wakes up a Baidu device. To do that, you might remove the last layer of the neural network and add a new output node. Sometimes you add more than one new node, or even several new layers, and then train on the labels y of the wake word detection problem. Again, depending on how much data you have, you may retrain only the new layers of the network, or you may retrain more of its layers.
(Top of the figure, in red) The image recognition task has 1,000,000 training examples, enough to learn low-level features. The radiology task has only 100 examples, so much of the knowledge learned from image recognition can be transferred and helps even though radiology data is scarce.
When does transfer learning make sense?
1. You want to transfer knowledge from task A to task B, and A and B have the same input x;
2. You have much more data for task A than for task B;
3. Low-level features learned from A could help with task B.
VIII. Multi-task learning
In transfer learning the steps are sequential: learn task A first, then transfer to task B. In multi-task learning, one network learns several tasks at the same time.
Example: a self-driving car must detect pedestrians, vehicles, stop signs, and traffic lights at the same time.
When defining the loss function of the network, note the difference from softmax regression: softmax assigns a single label to each example, whereas here one image can carry several labels because multiple objects may appear in the same picture. The loss therefore sums over the different object types for each example.
You could also train four separate neural networks instead. Multi-task learning still works even if some images carry only a subset of the labels: the sum in the loss simply skips the labels that are missing.
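A minimal NumPy sketch of such a loss follows: for each image it sums a logistic loss over the object types that are actually labeled, then averages over images. Encoding a missing label as NaN is an assumption made for this example, as are the function name and the toy data.

```python
import numpy as np

def multitask_loss(Y_hat, Y):
    """Average over examples of the summed logistic losses of the labeled tasks.

    Y_hat: predicted probabilities, shape (n_tasks, m).
    Y:     labels in {0, 1}, same shape, with np.nan where a label is missing.
    """
    labeled = ~np.isnan(Y)
    Y_filled = np.where(labeled, Y, 0.0)  # placeholder value, masked out below
    losses = -(Y_filled * np.log(Y_hat) + (1 - Y_filled) * np.log(1 - Y_hat))
    return float(np.sum(losses * labeled) / Y.shape[1])

# Rows: pedestrian, car, stop sign, traffic light. Columns: 3 hypothetical images.
Y = np.array([[1, 0, np.nan],
              [1, 1, 0],
              [0, np.nan, 1],
              [0, 0, np.nan]], dtype=float)
Y_hat = np.full(Y.shape, 0.5)
loss = multitask_loss(Y_hat, Y)
```

Unlike a softmax loss, each row is an independent yes/no question, so an image can be positive for several object types at once.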
When does multi-task learning make sense?
1. The tasks you train on can benefit from shared low-level features.
2. (Not a hard rule) The amount of data available for each task is similar.
3. You can train a neural network large enough to do well on all the tasks at the same time. The alternative to multi-task learning is to train a separate neural network for each task; multi-task learning hurts performance only when the network is not large enough.
In practice, transfer learning is used more often than multi-task learning: when your dataset is relatively small, transfer learning can help.
IX. What is end-to-end deep learning
Earlier data processing or learning systems required multiple stages of processing. End-to-end deep learning replaces all these stages with a single neural network.
Take speech recognition. First you extract some hand-designed audio features; you may have heard of MFCC, an algorithm that extracts a specific set of hand-designed features from audio. After extracting these low-level features, you apply a machine learning algorithm to find the phonemes in the audio clip. A phoneme is a basic unit of sound: for instance, the word "cat" is made up of three phonemes, "Cu-", "Ah-" and "Tu-". The algorithm extracts these three phonemes, then you string phonemes together to form individual words, and then you string the words together to form the transcript of the audio clip.
In contrast to this pipeline, end-to-end deep learning (the bottom line of the figure) maps the audio directly to the transcript. One of its biggest challenges is that it needs a lot of data to perform well.
Example: a face recognition access control system.
The best approach so far seems to be a multi-step one. First, you run software that detects the face, so the first detector finds where the face is. Once the face is detected, you zoom in on that part of the image and crop it so that the face is centered (the red-framed picture in the figure), then feed the crop into a neural network that learns, or estimates, the identity of that person.
Why the two-step approach works better:
1. Each of the two problems is much simpler to solve
2. There is a lot of training data for both subtasks
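The two-step decomposition can be sketched as a small pipeline. The detector and identifier below are hypothetical stand-ins (in a real system each would be a separately trained model); only the cropping step in between is concrete.

```python
import numpy as np

def crop_face(image, box):
    """Stage 1 output -> stage 2 input: cut the detected face region out of the frame."""
    top, left, height, width = box
    return image[top:top + height, left:left + width]

def recognize(image, detect_face, identify):
    """Two-step pipeline: locate the face first, then identify the cropped face."""
    box = detect_face(image)
    if box is None:
        return None
    return identify(crop_face(image, box))

# Hypothetical stand-ins for the two learned components.
def fake_detector(image):
    return (10, 20, 64, 64)  # (top, left, height, width)

def fake_identifier(face):
    return "employee_42" if face.shape[:2] == (64, 64) else None

frame = np.zeros((240, 320, 3), dtype=np.uint8)
person = recognize(frame, fake_detector, fake_identifier)
```

Because the two stages only communicate through the cropped image, each can be trained on its own, typically larger, dataset.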
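Machine translation: here end-to-end learning works well, because large numbers of (source sentence, translated sentence) pairs are available.
Estimating a child's age from an X-ray of the hand: there is little data for the direct mapping from image to age, so a multi-step approach tends to work better.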
X. Whether to use end-to-end deep learning
Pros:
1. Lets the data speak
2. Requires fewer hand-designed components
Cons:
1. May need a large amount of data
2. Excludes hand-designed components that might be useful
When applying end-to-end deep learning, the key question is whether you have enough data to learn a function of the complexity needed to map x to y.
Self-driving example: detect what is around the car, plan the route, then produce the precise steering and throttle commands.