[fundamentals of machine learning 01] blending, bagging and AdaBoost
2022-06-22 06:54:00 【chad_lee】
These are last year's study notes, recently reviewed and reorganized.
Aggregation Model
Suppose we have $T$ models $g_1, \cdots, g_T$. Common aggregation methods are:
- Select the best model on a validation set:
$$G(\mathbf{x})=g_{t_{*}}(\mathbf{x}) \text{ with } t_{*}=\operatorname{argmin}_{t \in\{1,2, \ldots, T\}} E_{\text{val}}\left(g_{t}^{-}\right)$$
- Uniformly average all models:
$$G(\mathbf{x})=\operatorname{sign}\left(\sum_{t=1}^{T} 1 \cdot g_{t}(\mathbf{x})\right)$$
- Take a weighted average of all models:
$$G(\mathbf{x})=\operatorname{sign}\left(\sum_{t=1}^{T} \alpha_{t} \cdot g_{t}(\mathbf{x})\right) \text{ with } \alpha_{t} \geq 0$$
- Combine the models conditionally on the input:
$$G(\mathbf{x})=\operatorname{sign}\left(\sum_{t=1}^{T} q_{t}(\mathbf{x}) \cdot g_{t}(\mathbf{x})\right) \text{ with } q_{t}(\mathbf{x}) \geq 0$$
These are all intuitive ideas; now let's look at them more closely.
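To make the four schemes concrete, here is a minimal sketch. The list `models` of trained classifiers (with `predict` returning labels in {-1, +1}) and all function names are assumptions for illustration, not part of the original notes.

```python
import numpy as np

# Assumed setup: `models` is a list of T trained binary classifiers whose
# .predict(X) returns labels in {-1, +1}.

def select_best(models, X_val, y_val, X):
    """Scheme 1: keep only the model with the lowest validation error."""
    errors = [np.mean(m.predict(X_val) != y_val) for m in models]
    return models[int(np.argmin(errors))].predict(X)

def uniform_vote(models, X):
    """Scheme 2: unweighted majority vote."""
    return np.sign(np.sum([m.predict(X) for m in models], axis=0))

def weighted_vote(models, alphas, X):
    """Scheme 3: vote weighted by fixed non-negative alpha_t."""
    return np.sign(np.sum([a * m.predict(X) for a, m in zip(alphas, models)], axis=0))

def conditional_vote(models, q_funcs, X):
    """Scheme 4: vote weighted by input-dependent, non-negative q_t(x)."""
    return np.sign(np.sum([q(X) * m.predict(X) for q, m in zip(q_funcs, models)], axis=0))
```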
Blending
Mean fusion (Uniform Blending)
In the classification problem:
$$G(\mathbf{x})=\operatorname{sign}\left(\sum_{t=1}^{T} g_{t}(\mathbf{x})\right)$$
In multi-class tasks, the majority vote decides:

In regression, the mean of all model outputs is taken.
When the $g_t$ give similar predictions, averaging leaves performance unchanged. When the $g_t$ are diverse (some predictions satisfy $g_{t}(\mathbf{x})>f(\mathbf{x})$, others $g_{t}(\mathbf{x})<f(\mathbf{x})$), the errors can cancel, and in the ideal case averaging recovers the target. Combining these two observations, diverse hypotheses make it easier for the fused model to outperform the individual ones.
It can be shown theoretically that uniform blending reduces the error:
$$\operatorname{avg}\left(E_{\text{out}}\left(g_{t}\right)\right)=\operatorname{avg}\left(\mathcal{E}\left(g_{t}-G\right)^{2}\right)+E_{\text{out}}(G)$$
From this equation we can see that unless all $g_t$ are identical, the error of the fused model $G$ is strictly smaller than the average error of the individual models (the gap is the variance term).
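The identity can be checked numerically. Below is a minimal sketch under assumed synthetic data (a noisy sine target fit by degree-3 polynomial regressors on random subsamples); the two printed numbers should coincide up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed synthetic target: f(x) = sin(x) plus noise; each g_t is a
# degree-3 polynomial fit on its own random subsample.
x_train = rng.uniform(0, 6, size=200)
y_train = np.sin(x_train) + rng.normal(0, 0.3, size=200)
x_test = np.linspace(0, 6, 500)
y_test = np.sin(x_test)

preds = []
for _ in range(10):
    idx = rng.choice(len(x_train), size=60, replace=False)
    coeffs = np.polyfit(x_train[idx], y_train[idx], deg=3)
    preds.append(np.polyval(coeffs, x_test))
preds = np.array(preds)            # shape (T, n_test)
G = preds.mean(axis=0)             # uniform blend

avg_err_gt = np.mean((preds - y_test) ** 2)   # avg_t E_out(g_t)
variance   = np.mean((preds - G) ** 2)        # avg_t E[(g_t - G)^2]
err_G      = np.mean((G - y_test) ** 2)       # E_out(G)

print(avg_err_gt, variance + err_G)           # the two values agree
```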
Linear fusion (Linear Blending)
Classification problem:
$$G(\mathbf{x})=\operatorname{sign}\left(\sum_{t=1}^{T} \alpha_{t} \cdot g_{t}(\mathbf{x})\right) \text{ with } \alpha_{t} \geq 0$$
Regression problem:
$$\min_{\alpha_{t} \geq 0} \frac{1}{N} \sum_{n=1}^{N}\left(y_{n}-\sum_{t=1}^{T} \alpha_{t} g_{t}\left(\mathbf{x}_{n}\right)\right)^{2}$$
The optimization problem can then be written generally as:
$$\min_{\alpha_{t} \geq 0} \frac{1}{N} \sum_{n=1}^{N} \operatorname{err}\left(y_{n}, \sum_{t=1}^{T} \alpha_{t} g_{t}\left(\mathbf{x}_{n}\right)\right)$$
Use the training set to obtain the $g_t$, but it is best to use a separate validation set to learn the $\alpha_t$.
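A minimal linear-blending sketch, assuming the models were already trained and that SciPy is available: `nnls` solves the non-negativity-constrained least-squares problem above on the validation predictions.

```python
import numpy as np
from scipy.optimize import nnls

# Assumed setup: `models` were already fit on the training set; we now learn
# the non-negative blending weights alpha_t on a separate validation set.
def fit_linear_blend(models, X_val, y_val):
    # Column t holds g_t's predictions on the validation set.
    Z = np.column_stack([m.predict(X_val) for m in models])
    alphas, _ = nnls(Z, y_val)        # min ||Z a - y||^2 subject to a >= 0
    return alphas

def predict_blend(models, alphas, X):
    Z = np.column_stack([m.predict(X) for m in models])
    return Z @ alphas                  # take sign(...) for classification
```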
Stacking

As shown in the figure above, the first stage runs $N$ base models (e.g., xgb1, lgb1) through $n$-fold cross-validation, and the resulting out-of-fold predictions ($N$ models $\times$ $n$ folds) are assembled into a feature matrix that serves as the input to the second-layer model. The original test data must likewise be mapped by the $N$ base models into an $N$-dimensional space.
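A minimal stacking sketch along the lines of the figure, assuming scikit-learn-style base models and a logistic regression as the (assumed) second-level model; `cross_val_predict` supplies the out-of-fold predictions used as meta-features.

```python
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.linear_model import LogisticRegression

# Assumed setup: `base_models` is a list of N scikit-learn classifiers
# (e.g. xgb1, lgb1, ...); (X_train, y_train) and X_test are given.
def stack(base_models, X_train, y_train, X_test):
    # Out-of-fold predictions of each base model become one training feature.
    train_meta = np.column_stack([
        cross_val_predict(m, X_train, y_train, cv=5)   # n-fold CV per model
        for m in base_models
    ])
    # Each base model, refit on the full training set, maps the test set
    # into the same N-dimensional space.
    test_meta = np.column_stack([
        m.fit(X_train, y_train).predict(X_test)
        for m in base_models
    ])
    meta_model = LogisticRegression().fit(train_meta, y_train)
    return meta_model.predict(test_meta)
```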

Bagging
Blending first learns different $g_t$ and then combines them. Can we instead aggregate while we learn the $g_t$?
Ways of obtaining diverse $g_t$ include:
- diversity by different models
- diversity by different parameters: for example, different step sizes for gradient descent (GD)
- diversity by algorithmic randomness: for example, the learning algorithm itself is stochastic
- diversity by data randomness
Bootstrap Aggregation
Starting from the data side: if we could obtain many diverse data sets $\mathcal{D}_t$ with which to train the $g_t$, that would be enough. Resampling the data gives us simulated $\mathcal{D}_t$:
- From the original data set $\mathcal{D}$ of size $N$, sample with replacement $N'$ times to obtain the simulated data set $\tilde{\mathcal{D}}_{t}$; this step is the bootstrap operation.
- Obtain $g_{t}=\mathcal{A}\left(\tilde{\mathcal{D}}_{t}\right)$, then combine by uniform blending: $G=\operatorname{Uniform}\left(\left\{g_{t}\right\}\right)$.
The premises for bootstrap aggregation to work well are that the resampled data sets are diverse and that the base algorithm $\mathcal{A}$ is sensitive to randomness in the data.
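A minimal bootstrap-aggregation sketch under these assumptions, using a decision tree as the (assumed) base algorithm $\mathcal{A}$ and $N' = N$ resamples:

```python
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

def bagging(X, y, base=None, T=25, seed=0):
    """Train T models, each on a bootstrap resample of (X, y)."""
    base = base if base is not None else DecisionTreeClassifier()
    rng = np.random.default_rng(seed)
    N = len(X)
    models = []
    for _ in range(T):
        idx = rng.integers(0, N, size=N)        # N' = N draws with replacement
        models.append(clone(base).fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Uniform (majority) vote; assumes labels in {-1, +1}."""
    return np.sign(np.sum([m.predict(X) for m in models], axis=0))
```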
Adaptive Boosting (AdaBoost) is in fact an aggregation algorithm that grows out of the bootstrap idea at the heart of bagging; it is introduced in detail below.
AdaBoost
The core idea of AdaBoost is to increase the weight of the samples that the current model misclassifies and to train the next weak classifier on this re-weighted data. Finally, all weak classifiers vote. A classic diagram:
A simple diagram of the algorithm flow:


Algorithm flow
For the binary classification problem, the algorithm can be summarized as follows:
Input: the sample set $D=\left\{\left(x_{1}, y_{1}\right),\left(x_{2}, y_{2}\right), \ldots,\left(x_{N}, y_{N}\right)\right\}$, a weak learner algorithm, and the number of iterations $T$.
Output: the final strong learner $G(\mathbf{x})$.
- Initialize the sample weights:
$$\mathbf{u}^{(1)}=\left[\frac{1}{N}, \frac{1}{N}, \cdots, \frac{1}{N}\right]$$
- For $t = 1, 2, \ldots, T$:
(a) Train on the data with weights $\mathbf{u}^{(t)}$ to obtain the weak learner $g_t(\mathbf{x})$.
(b) Compute the weighted classification error rate of $g_t(\mathbf{x})$:
$$\epsilon_{t}=\frac{\sum_{n=1}^{N} u_{n}^{(t)}\left[y_{n} \neq g_{t}\left(\mathbf{x}_{n}\right)\right]}{\sum_{n=1}^{N} u_{n}^{(t)}}$$
(c) Update the sample weights: for misclassified samples, $u_{n}^{(t+1)} \leftarrow u_{n}^{(t)} \cdot \sqrt{\frac{1-\epsilon_{t}}{\epsilon_{t}}}$; for correctly classified samples, $u_{n}^{(t+1)} \leftarrow u_{n}^{(t)} / \sqrt{\frac{1-\epsilon_{t}}{\epsilon_{t}}}$. Since $\epsilon_{t} \leq 1/2$, the scaling factor is at least 1, so misclassified samples are up-weighted.
(d) Compute the coefficient of the weak classifier:
$$\alpha_{t}=\ln \sqrt{\frac{1-\epsilon_{t}}{\epsilon_{t}}}$$
- The final classifier is:
$$G(\mathbf{x})=\operatorname{sign}\left(\sum_{t=1}^{T} \alpha_{t} g_{t}(\mathbf{x})\right)$$
The weak classifier here can be a decision stump (a single-level decision tree), which has only three parameters: the chosen feature $i$, a threshold $\theta$, and a direction $s$ indicating which side of the threshold is classified as positive.
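Putting the steps together, here is a minimal AdaBoost sketch that follows the update rule above, using a depth-1 scikit-learn tree as the decision stump; labels are assumed to be in {-1, +1}.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, T=20):
    """Train T weighted decision stumps with the AdaBoost re-weighting rule."""
    N = len(X)
    u = np.full(N, 1.0 / N)                  # initial sample weights u^(1)
    stumps, alphas = [], []
    for _ in range(T):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=u)
        pred = stump.predict(X)
        wrong = (pred != y)
        eps = u[wrong].sum() / u.sum()       # weighted error rate epsilon_t
        if eps == 0 or eps >= 0.5:
            break                            # stump is perfect or no better than chance
        scale = np.sqrt((1 - eps) / eps)
        u[wrong] *= scale                    # up-weight misclassified samples
        u[~wrong] /= scale                   # down-weight correct samples
        stumps.append(stump)
        alphas.append(np.log(scale))         # alpha_t = ln sqrt((1-eps)/eps)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    """Weighted vote of all weak classifiers."""
    score = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
    return np.sign(score)
```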