Is binary cross entropy really suitable for multi label classification?
2022-07-25 09:59:00 【Tobi_ Obito】
When I was working on a multi-label text classification project, I was initially confused about the choice of loss and the handling of the final layer in the forward pass for multi-label classification. After researching the topic, I settled on the following scheme:
1. Use a final layer with as many output nodes as there are categories, activated with sigmoid. This effectively treats each category as an independent binary sub-task: each position of the final layer's output corresponds to one category. That is also why sigmoid is used and softmax cannot be (softmax does not treat the nodes as independent; on the contrary, it makes them influence each other).
2. Train with Binary Cross Entropy (BCE) loss. If each node of the final layer corresponds to a 1/0 classification for one category, then BCE loss seems like the natural choice. But is it really a good one? My answer: not necessarily, and the sparser the positive labels, the worse it gets.
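The independence claim in point 1 can be checked directly. This is a minimal sketch (plain Python, hypothetical logit values) showing that a sigmoid output at one position is unaffected when another position's logit changes, while a softmax output is not:

```python
import math

def sigmoid(z):
    # Elementwise: each output depends only on its own logit.
    return [1.0 / (1.0 + math.exp(-x)) for x in z]

def softmax(z):
    # Normalized: every output depends on all logits.
    m = max(z)
    exps = [math.exp(x - m) for x in z]
    s = sum(exps)
    return [e / s for e in exps]

logits_a = [2.0, 0.0, -1.0]
logits_b = [2.0, 5.0, -1.0]  # only the second logit changed

# Under sigmoid, position 0 is unaffected by the change:
print(sigmoid(logits_a)[0] == sigmoid(logits_b)[0])  # True
# Under softmax, position 0 shifts even though its own logit did not:
print(softmax(logits_a)[0] == softmax(logits_b)[0])  # False
```

This is exactly why softmax is ruled out for multi-label heads: raising the score of one category would necessarily suppress all the others.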
The key issue
Let's compare how BCE behaves against the CrossEntropy (CE) loss used for multi-class tasks. To keep things concrete, I won't reproduce the formulas here (they can be found everywhere and don't help with this particular point); instead, the key difference is highlighted with worked examples.
Multi-class task: CE
label: [0, 1, 0, 0]
pred:  [0.3, 0.67, 0.4, 0.25]
CE loss = -(0 × log(0.3) + 1 × log(0.67) + 0 × log(0.4) + 0 × log(0.25))
Key observation: with CE as the loss, the only thing that affects model learning is the pred value at the label=1 position. The pred values at the label=0 positions, whether large or small, have no impact on the loss. Keep that in mind, and now look at BCE.
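We can verify this property numerically. The sketch below reuses the article's numbers (note that a real softmax output would sum to 1; the raw values are kept only to show which terms enter the sum) and checks that changing the label=0 predictions leaves the CE loss untouched:

```python
import math

def ce_loss(label, pred):
    # Multi-class CE: only the label=1 position contributes to the sum.
    return -sum(y * math.log(p) for y, p in zip(label, pred))

label = [0, 1, 0, 0]
pred_a = [0.3, 0.67, 0.4, 0.25]
pred_b = [0.9, 0.67, 0.01, 0.99]  # same value only at the label=1 position

# The label=0 positions changed drastically, yet the loss is identical:
print(ce_loss(label, pred_a) == ce_loss(label, pred_b))  # True
```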
Multi-label task: BCE
label: [0, 1, 0, 1]
pred:  [0.3, 0.67, 0.4, 0.25]
BCE loss = -( 0 × log(0.3)  + (1-0) × log(0.7)
            + 1 × log(0.67) + (1-1) × log(0.33)
            + 0 × log(0.4)  + (1-0) × log(0.6)
            + 1 × log(0.25) + (1-1) × log(0.75) )
Key observation: with BCE as the loss, both the label=0 positions and the label=1 positions affect model learning. In training this produces an effect: a 0 at some label position pushes the model's output at that position toward 0.
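The contrast with CE can be checked the same way. A minimal sketch, again using the article's numbers, showing that with BCE a change at a label=0 position does move the loss:

```python
import math

def bce_loss(label, pred):
    # Every position contributes: label=1 via log(p), label=0 via log(1-p).
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(label, pred))

label = [0, 1, 0, 1]
pred_a = [0.3, 0.67, 0.4, 0.25]
pred_b = [0.1, 0.67, 0.4, 0.25]  # only a label=0 position changed

# Unlike CE, the label=0 position matters, and the lower prediction
# (closer to the 0 target) yields a smaller loss:
print(bce_loss(label, pred_b) < bce_loss(label, pred_a))  # True
```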
That sounds like a good thing, and it is, but only for the binary classification of the single category at that position, which is why BCE is exactly right for a plain binary classification task. For multi-label classification, however, we must consider the global effect this loss induces across all labels. Consider an example:
label:[0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
Only 2 positions are 1; that is, when the total number of labels is large, this sample carries only 2 positive labels. As a result, model learning is dominated by the large number of 0s rather than the 1s, even though the discriminative information in the text relates only to the 1s (whether a piece of text belongs to a category is determined by the features it has, not by its lack of other categories' features). The most informative, distinctive features in the text therefore contribute little to what the model learns about classification. If this is still unclear, think of it this way: it is like teaching a child to recognize objects but only ever telling it what each object is not ("not a, not b, not c, ..."); the child can only identify things by elimination, and the results will obviously be much worse.
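The imbalance is easy to quantify. A sketch under assumed conditions (27 labels, 2 positives as above, and a hypothetical untrained model predicting 0.5 everywhere) showing what fraction of the BCE loss comes from the label=0 positions:

```python
import math

# Hypothetical sample: 27 labels, only positions 6 and 7 positive,
# and an untrained model outputting 0.5 at every position.
label = [0] * 27
label[6] = label[7] = 1
pred = [0.5] * 27

pos_loss = -sum(math.log(p) for y, p in zip(label, pred) if y == 1)
neg_loss = -sum(math.log(1 - p) for y, p in zip(label, pred) if y == 0)

# The 25 label=0 terms carry 25/27 of the total loss:
print(round(neg_loss / (pos_loss + neg_loss), 3))  # 0.926
```

So at the start of training, roughly 93% of the gradient signal for this sample says "push toward 0", and only 7% says "push toward 1", which is the domination effect described above.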
Summary
The crux of the problem is that irrelevant information dominates model learning at the expense of predicting the 1s. If, in your task, most samples have a large number of true labels, this is not a problem. But in most real-world multi-label text classification scenarios, the number of true labels per sample is negligible compared with the total number of labels, and this issue significantly hurts training. BCE is therefore not a universally "safe to use" method for multi-label classification.