[Object Detection] TPH-YOLOv5: Transformer-Improved YOLOv5 for UAV Object Detection
2022-07-25 16:47:00 【zstar-_】
Introduction
Recently I have been using the VisDrone dataset for an object detection task and came across TPH-YOLOv5, which ranked fifth on the VisDrone2021 test-challenge set with an mAP of 39.18%.
So I started reading the paper and running its code.
Paper: https://arxiv.org/pdf/2108.11539.pdf
Project: https://github.com/cv516Buaa/tph-yolov5
VisDrone dataset download: https://pan.baidu.com/s/1JzRTeSi_LgdUVhwtbWhA_w?pwd=8888
Problems addressed
TPH-YOLOv5 aims to solve two problems in UAV imagery:
- Because drones fly at different altitudes, the scale of objects changes drastically.
- High-speed, low-altitude flight introduces motion blur on densely packed objects.
Main improvements
TPH-YOLOv5 makes the following improvements on top of YOLOv5:
- 1. An extra prediction head is added to detect smaller objects.
- 2. Transformer Prediction Heads (TPH) replace the original prediction heads.
- 3. CBAM is integrated into YOLOv5 to help the network find regions of interest in images that cover large areas.
- 4. A set of additional tricks, such as data augmentation, multi-scale testing, multi-model ensembling and an extra classifier.
New detection head

The new detection head is easy to understand; I also discussed this idea in my earlier post 【Object Detection】YOLOv5: improved models for small-object detection / adding frame-rate detection.
The overall structure of the improved network is shown below:
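Why an extra head helps with small objects becomes clearer by counting grid cells. The sketch below assumes a 640×640 input, three anchors per grid cell (the YOLOv5 default) and that the new head sits on the stride-4 (P2) feature map; these are illustrative assumptions, not values read from the TPH-YOLOv5 config files.

```python
def head_grids(img_size, strides, anchors_per_cell=3):
    """Grid size and number of raw predictions for each detection head."""
    heads = []
    for s in strides:
        g = img_size // s  # feature-map side length at this stride
        heads.append({"stride": s, "grid": (g, g),
                      "predictions": g * g * anchors_per_cell})
    return heads


YOLOV5_STRIDES = (8, 16, 32)   # P3, P4, P5 heads of standard YOLOv5
TPH_STRIDES = (4, 8, 16, 32)   # assumed: extra P2 head at stride 4

if __name__ == "__main__":
    for name, strides in (("YOLOv5", YOLOV5_STRIDES), ("4-head", TPH_STRIDES)):
        heads = head_grids(640, strides)
        total = sum(h["predictions"] for h in heads)
        print(name, [h["grid"] for h in heads], total)
    # YOLOv5 [(80, 80), (40, 40), (20, 20)] 25200
    # 4-head [(160, 160), (80, 80), (40, 40), (20, 20)] 102000
```

At stride 4 each cell covers a 4×4 pixel patch, so a 10-pixel-wide object spans about 2.5 cells instead of about 1.25 at stride 8. The cost is roughly four times as many raw predictions, which is one reason this head is expensive in memory.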
TPH
The authors replace some of the convolution blocks and CSP bottleneck blocks with Transformer encoder blocks. Applying Transformers to vision is a current mainstream trend: through self-attention, an encoder block can capture global and contextual information that the original convolutional blocks see only locally.
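To make "replacing a block with a Transformer encoder" concrete, here is a toy pure-Python sketch: the C×H×W feature map is flattened into H·W tokens of dimension C, passed through single-head scaled dot-product self-attention with a residual connection, then through a two-layer MLP with a second residual. It is an illustration only, assuming random weights and tiny sizes; real implementations (for example the `TransformerBlock` / `C3TR` modules in YOLOv5's `models/common.py`) are written in PyTorch, use multi-head attention and differ in details such as normalization.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over tokens x (n x d)."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d = len(wq[0])
    kt = [list(col) for col in zip(*k)]  # transpose K
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, kt)]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, v), weights


def encoder_layer(x, params):
    """x + Attn(x), then x + MLP(x); LayerNorm is omitted for brevity."""
    attn_out, weights = self_attention(x, params["wq"], params["wk"], params["wv"])
    x = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(x, attn_out)]
    hidden = [[max(0.0, h) for h in row] for row in matmul(x, params["w1"])]  # ReLU
    mlp_out = matmul(hidden, params["w2"])
    x = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(x, mlp_out)]
    return x, weights


def feature_map_to_tokens(fmap):
    """Flatten a C x H x W feature map into H*W tokens of dimension C."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[fmap[ch][i][j] for ch in range(c)] for i in range(h) for j in range(w)]


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    C, H, W = 4, 2, 3  # toy sizes, far smaller than a real neck feature map
    fmap = [[[rng.uniform(-1, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
    tokens = feature_map_to_tokens(fmap)  # 6 tokens of dimension 4
    params = {name: rand_matrix(rng, C, C) for name in ("wq", "wk", "wv")}
    params["w1"] = rand_matrix(rng, C, 2 * C)
    params["w2"] = rand_matrix(rng, 2 * C, C)
    out, weights = encoder_layer(tokens, params)
    print(len(out), len(out[0]))  # 6 4
```

Every output token is a weighted mix of all H·W positions, which is the "global context" a 3×3 convolution cannot provide in a single layer; the price is an attention matrix that grows with (H·W)², so such blocks are usually placed on low-resolution feature maps.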

CBAM

CBAM (Convolutional Block Attention Module) is a lightweight attention module proposed by Woo et al. (ECCV 2018). As shown in the figure, when a feature map is passed to the next processing unit, CBAM first computes a channel attention map and then a spatial attention map, multiplying each with the feature map in turn. This refinement makes subsequent processing units pay more attention to (focus on) valuable target regions.
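The following pure-Python sketch follows the CBAM formulation: channel attention Mc = σ(MLP(AvgPool(F)) + MLP(MaxPool(F))) is applied first, then spatial attention Ms = σ(conv([AvgPool(F'); MaxPool(F')])) computed along the channel axis. The weights are random placeholders and a 3×3 kernel keeps the toy example small (the CBAM paper uses 7×7); it is not the PyTorch module used in TPH-YOLOv5.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def channel_attention(fmap, w1, w2):
    """Mc = sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F))): one weight per channel."""
    avg = [sum(sum(r) for r in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    mx = [max(max(r) for r in ch) for ch in fmap]

    def mlp(vec):  # shared MLP: C -> C/r -> C with ReLU in between
        hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    return [sigmoid(a + m) for a, m in zip(mlp(avg), mlp(mx))]


def spatial_attention(fmap, kernel):
    """Ms = sigmoid(conv([mean_c(F); max_c(F)])): one weight per pixel."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    avg = [[sum(fmap[k][i][j] for k in range(c)) / c for j in range(w)] for i in range(h)]
    mx = [[max(fmap[k][i][j] for k in range(c)) for j in range(w)] for i in range(h)]
    pooled = (avg, mx)
    ks = len(kernel[0])
    pad = ks // 2  # "same" padding with zeros
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            acc = 0.0
            for p in range(2):  # the two pooled maps are the conv's input channels
                for di in range(ks):
                    for dj in range(ks):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < h and 0 <= x < w:
                            acc += kernel[p][di][dj] * pooled[p][y][x]
            row.append(sigmoid(acc))
        out.append(row)
    return out


def cbam(fmap, w1, w2, kernel):
    """Refine a C x H x W map: channel attention first, then spatial attention."""
    mc = channel_attention(fmap, w1, w2)
    refined = [[[v * mc[k] for v in r] for r in fmap[k]] for k in range(len(fmap))]
    ms = spatial_attention(refined, kernel)
    return [[[refined[k][i][j] * ms[i][j] for j in range(len(ms[0]))]
             for i in range(len(ms))] for k in range(len(refined))]


if __name__ == "__main__":
    rng = random.Random(0)
    C, H, W, r, k = 8, 5, 5, 4, 3  # toy sizes; r is the MLP reduction ratio
    fmap = [[[rng.uniform(-1, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(C)] for _ in range(C // r)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(C // r)] for _ in range(C)]
    kernel = [[[rng.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(k)] for _ in range(2)]
    out = cbam(fmap, w1, w2, kernel)
    print(len(out), len(out[0]), len(out[0][0]))  # 8 5 5
```

Because both attention maps lie in (0, 1), CBAM never changes the tensor shape and can only down-weight activations, which is why it can be dropped between existing layers without altering the rest of the network.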
Overall, the paper was written by Chinese authors, and its structure and line of reasoning match the reading habits of Chinese readers, so it reads very smoothly.
Hands-on
Next I will train TPH-YOLOv5 on the VisDrone dataset. Because the code is a modification of YOLOv5, readers familiar with YOLOv5 will find it easy to follow.
Note that the authors provide two model configurations. The first, yolov5l-xs-tph.yaml, does not use CBAM; it only adds the extra detection head to YOLOv5 v6.0, and I suspect it was used for the ablation experiments. To get the best results, use yolov5l-xs-tr-cbam-spp-bifpn.yaml.
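Because the repository follows the YOLOv5 layout, training goes through its `train.py` with the chosen model YAML passed as `--cfg`. The helper below only assembles such a command. `--cfg`, `--data`, `--weights`, `--epochs`, `--img` and `--batch-size` are standard YOLOv5 `train.py` options, while the YAML locations, the `yolov5l.pt` starting weights, the image size and the batch size are assumptions to adapt to your checkout and GPU; check the project README for the settings the authors actually used.

```python
import shlex
import subprocess
import sys


def build_train_cmd(cfg, data="data/VisDrone.yaml", weights="yolov5l.pt",
                    epochs=100, img_size=640, batch_size=8):
    """Assemble a YOLOv5-style train.py command (all default values are assumptions)."""
    return [
        sys.executable, "train.py",
        "--cfg", cfg,                  # model structure YAML
        "--data", data,                # dataset YAML (hypothetical path)
        "--weights", weights,          # initial weights (placeholder name)
        "--epochs", str(epochs),
        "--img", str(img_size),
        "--batch-size", str(batch_size),
    ]


def run_training(cmd, repo_dir="."):
    """Launch training; repo_dir should be the root of the cloned repository."""
    subprocess.run(cmd, cwd=repo_dir, check=True)


if __name__ == "__main__":
    cmd = build_train_cmd("models/yolov5l-xs-tr-cbam-spp-bifpn.yaml")
    print(shlex.join(cmd))
    # run_training(cmd, repo_dir="tph-yolov5")  # uncomment to start a real run
```

Swapping the `cfg` argument for the yolov5l-xs-tph.yaml file is all it takes to train the lighter variant for comparison.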
The authors also provide two pretrained models; I have included them at the end of the article for readers to download.
After training for 100 epochs on VisDrone, I ran detection on a video from the internet and compared the results with YOLOv5 v5.0 and v6.1, as shown in the video below.
YOLOv5 / TPH-YOLOv5 detection comparison
Bilibili link: https://www.bilibili.com/video/BV17a411u7JD (a like, coin and favorite on Bilibili is appreciated)
The difference is clearly visible: TPH-YOLOv5 recognizes dense crowds noticeably better.
I have also shared the test video: https://pan.baidu.com/s/1jgTonbDYmONkqvLjhLPpRQ?pwd=8888
If you would like to see results from other models, feel free to @ me and I will test them.
Test results:
| Algorithm | mAP@0.5 | mAP@0.5:0.95 |
|---|---|---|
| yolov5-5.0 | 34.9% | 20.6% |
| yolov5-6.1 | 33.1% | 18.7% |
| tph-yolov5 | 37.4% | 21.7% |
Note: these are test results of the best.pt obtained after only 100 epochs; they do not represent the best achievable performance.
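The two columns use different matching rules, which is why the second one is much lower. mAP@0.5 counts a detection as correct when its IoU (intersection over union) with a ground-truth box is at least 0.5, while mAP@0.5:0.95 averages AP over ten IoU thresholds from 0.5 to 0.95 in steps of 0.05. The sketch below shows only the IoU and threshold part; the precision-recall curve and per-class averaging that produce the final AP are omitted.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# mAP@0.5 uses a single threshold; mAP@0.5:0.95 averages AP over these ten.
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def matched_at(iou, thresholds=IOU_THRESHOLDS):
    """Fraction of thresholds at which a detection with this IoU is a true positive."""
    return sum(iou >= t for t in thresholds) / len(thresholds)


if __name__ == "__main__":
    pred, gt = (0, 0, 10, 10), (2, 0, 12, 10)  # box shifted 2 px sideways
    iou = box_iou(pred, gt)
    print(round(iou, 3), matched_at(iou))  # 0.667 0.4
```

A box shifted by only 2 pixels on a 10-pixel object still passes at IoU 0.5 but fails six of the ten stricter thresholds, so for the tiny objects typical of VisDrone small localization errors weigh heavily on mAP@0.5:0.95.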
Code backup
A local backup of the TPH-YOLOv5 code (including the two pretrained weights provided by the authors) is available here: https://pan.baidu.com/s/15mVle5Exghu3jJMFyl9Lyg?pwd=8888