Introduction to anchor-free detection
2022-06-26 00:09:00 【Invincible Zhang Dadao】
1. Background
Object detection has moved from the two-stage era to the one-stage era, and from anchor-based to anchor-free methods, becoming steadily more refined. Starting with CornerNet in 2018, anchor-free papers appeared in a sudden burst, marking the beginning of the anchor-free era.
2. Network
2.1 DenseBox
Contributions of the paper:
- It shows that a simple FCN, if the network is designed sensibly, can detect objects across different scales and under heavy occlusion.
- It proposes a new FCN model, DenseBox, which needs no region proposals and can be trained end to end.
- It adds landmark localization through multi-task learning, which further improves DenseBox's accuracy.

Network architecture: a network with VGGNet as the backbone. Input image (m × n × 3) -> convolutions -> bilinear upsampling -> output map (m/4 × n/4 × 5) -> thresholding and NMS.
1) First, it designs an end-to-end, multi-task, fully convolutional model that directly regresses the confidence that an object is present and the object's relative position.
2) To better handle heavily occluded objects and to improve the recall of small objects, it introduces an upsampling layer into the detection network and fuses in features from shallow layers, producing a larger output layer.
3) To select training samples, keep positive and negative samples balanced, and reduce false detections, it was also among the first to use Online Hard Negative Mining, together with hard-example analysis.
4) Each output pixel is converted into a confidence score and four distances to the target bounding box (bbox), after which NMS is applied.
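The post-processing in step 4) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' code: it assumes an output map of shape (H, W, 5) whose channel 0 is the confidence and channels 1 to 4 are non-negative distances (left, top, right, bottom) from the pixel to the box edges in input pixels, and that output pixel (x, y) corresponds to input point (4x, 4y) because of the m/4 output size. The function names `decode_densebox` and `nms` and both thresholds are hypothetical.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression on (N, 4) boxes in (x1, y1, x2, y2) form.
    Returns the indices of the kept boxes, highest score first."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection rectangle between box i and every remaining box
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter + 1e-9)
        order = rest[iou <= iou_thresh]  # drop boxes that overlap box i too much
    return keep

def decode_densebox(out, stride=4, score_thresh=0.5, iou_thresh=0.5):
    """out: (H, W, 5) map; channel 0 = confidence, channels 1..4 = assumed
    distances (left, top, right, bottom) from the pixel to the box edges."""
    ys, xs = np.nonzero(out[..., 0] > score_thresh)   # threshold the score map
    px, py = xs * stride, ys * stride                  # output pixel -> input point
    d = out[ys, xs, 1:]
    boxes = np.stack([px - d[:, 0], py - d[:, 1],
                      px + d[:, 2], py + d[:, 3]], axis=1).astype(float)
    scores = out[ys, xs, 0]
    keep = nms(boxes, scores, iou_thresh)              # remove duplicates with NMS
    return boxes[keep], scores[keep]

# Toy 8x8 output map, i.e. a 32x32 input
out = np.zeros((8, 8, 5))
out[2, 2] = [0.9, 6, 6, 6, 6]    # point (8, 8)   -> box (2, 2, 14, 14)
out[2, 3] = [0.8, 10, 6, 2, 6]   # point (12, 8)  -> the same box, a duplicate
out[6, 6] = [0.7, 4, 4, 4, 4]    # point (24, 24) -> box (20, 20, 28, 28)
boxes, scores = decode_densebox(out)
print(boxes, scores)
```

Because every pixel near an object votes for a box, neighbouring pixels produce near-identical boxes; NMS is what collapses them into one detection.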
Adding a few layers to the FCN structure enables landmark localization, and fusing the landmark heatmaps with the score map further improves the detection results.
2.2 YOLOv1
YOLOv1 is a landmark anchor-free detector (YOLOv2 and YOLOv3 both use anchor-box architectures).
- The input is a 448×448×3 color image; the first 20 convolutional layers of its GoogLeNet-inspired backbone produce a 14×14×1024 feature map.
- This then passes through four more convolutional layers and two fully connected layers, and is finally reshaped into a 7×7×30 tensor.

As shown in the figure above, after the stacked convolutions the 7×7×30 output is equivalent to dividing the original image into 7×7 = 49 grid cells, each described by a 30-dimensional vector. The first 10 values predict two bboxes (bounding-box regressions), 5 values per box; during training the two predictors tend to specialize, for example one toward wider boxes and one toward taller boxes. The remaining 20 values are the class probabilities for the 20 classes, and they are shared by both boxes in the cell. The 5 values of each box are: (1) the x coordinate of the bbox center relative to its grid cell (the small red box in the figure); (2) the y coordinate of the bbox center relative to the grid cell; (3) the bbox width; (4) the bbox height; (5) the confidence that an object is present, whose target is 1 × IoU (the IoU with the ground truth) when an object is present and 0 × IoU = 0 otherwise.
The predicted x, y, w, h are all normalized into the interval (0, 1): x and y are offsets within the grid cell, and w and h are divided by the image width and height. The conversion is shown in the following figure.
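A small sketch of that conversion, assuming pixel coordinates for the absolute box and the 7×7 grid from the text; `encode_box` and `decode_box` are hypothetical helper names, and the clamp for a center lying exactly on the right or bottom border is an added assumption.

```python
S = 7  # YOLOv1 grid size

def encode_box(cx, cy, w, h, img_w, img_h, S=S):
    """Absolute box (center cx, cy and size w, h in pixels) -> YOLOv1-style target:
    responsible cell (row, col), center offset inside that cell, and w, h
    divided by the image size."""
    gx, gy = cx / img_w * S, cy / img_h * S   # center in grid units
    col = min(int(gx), S - 1)                  # clamp a center on the right border
    row = min(int(gy), S - 1)                  # clamp a center on the bottom border
    return row, col, gx - col, gy - row, w / img_w, h / img_h

def decode_box(row, col, x, y, w, h, img_w, img_h, S=S):
    """Inverse of encode_box: cell-relative values -> absolute center and size."""
    cx = (col + x) / S * img_w
    cy = (row + y) / S * img_h
    return cx, cy, w * img_w, h * img_h

target = encode_box(224, 224, 100, 50, 448, 448)
print(target)                         # cell (3, 3), center offsets (0.5, 0.5)
print(decode_box(*target, 448, 448))  # back to roughly (224, 224, 100, 50)
```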
Every one of the 7×7 = 49 cells always outputs a 30-dimensional vector.
Each cell is responsible for a single object: if the centers of two or more objects fall in the same cell, the cell keeps only the class with the highest probability.
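The layout above can be turned into detections with a short NumPy sketch. Assumptions: each box is stored as (x, y, w, h, confidence) in that order, the score is the box confidence times the class probability, and each cell keeps only its most likely class as the text describes. The YOLOv1 paper scores every class per box and then applies NMS; NMS is omitted here. `decode_yolov1` and `score_thresh` are hypothetical names.

```python
import numpy as np

S, B, C = 7, 2, 20   # grid size, boxes per cell, classes -> 7x7x(2*5+20) = 7x7x30

def decode_yolov1(pred, score_thresh=0.2):
    """pred: (S, S, B*5 + C). Assumed layout per cell: B boxes of
    (x, y, w, h, confidence) followed by C class probabilities shared by the boxes.
    Returns (score, class, cx, cy, w, h) with coordinates normalized to the image."""
    dets = []
    for row in range(S):
        for col in range(S):
            cell = pred[row, col]
            class_probs = cell[B * 5:]
            cls = int(np.argmax(class_probs))    # the cell keeps only its most likely class
            for b in range(B):
                x, y, w, h, conf = cell[b * 5:(b + 1) * 5]
                score = conf * class_probs[cls]  # class-specific confidence
                if score < score_thresh:
                    continue
                # x, y are offsets inside cell (row, col); convert to image-relative centers
                dets.append((float(score), cls, (col + x) / S, (row + y) / S,
                             float(w), float(h)))
    return sorted(dets, key=lambda d: d[0], reverse=True)

pred = np.zeros((S, S, B * 5 + C))
pred[3, 4, 0:5] = [0.5, 0.5, 0.2, 0.3, 0.9]    # box 0: confident
pred[3, 4, 5:10] = [0.4, 0.6, 0.1, 0.1, 0.1]   # box 1: low confidence, filtered out
pred[3, 4, B * 5 + 7] = 0.8                     # class 7 is the most likely
pred[3, 4, B * 5 + 2] = 0.1
print(decode_yolov1(pred))
```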
- Loss calculation
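For reference, the YOLOv1 paper defines the loss as a sum of squared errors, with S = 7, B = 2, λ_coord = 5 and λ_noobj = 0.5. Here 𝟙_ij^obj is 1 when box predictor j of cell i is responsible for an object, 𝟙_ij^noobj is 1 when it is not, 𝟙_i^obj marks a cell containing an object center, C_i is the box confidence (whose target is the IoU described above) and p_i(c) is the class probability. Taking square roots of w and h makes the same absolute error cost more on small boxes than on large ones.

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}}
  \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}}
  \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left(C_i - \hat{C}_i\right)^2
  + \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} \left(C_i - \hat{C}_i\right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c \in \text{classes}} \left(p_i(c) - \hat{p}_i(c)\right)^2
\end{aligned}
```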

2.3 CornerNet
- Overall architecture: the backbone is an hourglass network, followed by two prediction modules (one for top-left corners and one for bottom-right corners).

Breaking it down:
1.1 Hourglass network
The principle resembles ResNet's skip connections: features produced by convolutional downsampling in the early stages are fused with the later upsampled features, yielding feature maps at multiple scales for the subsequent corner pooling.
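To make the down/up/fuse pattern concrete, here is a weight-free NumPy sketch of a recursive hourglass. It is not CornerNet's hourglass: the real modules use residual convolution blocks at every step and CornerNet downsamples with stride-2 convolutions, whereas here 2×2 max pooling and nearest-neighbour upsampling stand in and the skip branch is the identity. It assumes a single-channel map whose sides are divisible by 2**depth; all names are hypothetical.

```python
import numpy as np

def downsample(x):
    """2x2 max pooling with stride 2 (stand-in for a strided convolution)."""
    H, W = x.shape
    return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

def upsample(x):
    """Nearest-neighbour 2x upsampling."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def hourglass(x, depth):
    """Recursive hourglass: the skip branch keeps the current resolution, the main
    branch goes down one scale, recurses, comes back up, and the two are added."""
    if depth == 0:
        return x
    skip = x                                # early, high-resolution features
    low = hourglass(downsample(x), depth - 1)
    return skip + upsample(low)             # fuse with the upsampled coarse features

x = np.arange(16.0).reshape(4, 4)
y = hourglass(x, depth=2)
print(y)
```

The output has the same resolution as the input but each position now mixes information from three scales, which is what the corner pooling layers consume.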
1.2 Corner pooling
Take two feature maps and the same position in each. On the first feature map, max-pool over all pixels from this position downward in the same column; on the second feature map, max-pool over all pixels from this position rightward in the same row. Adding the two maxima gives the output at this position. Doing this for every position produces the complete output of top-left corner pooling. Likewise, bottom-right corner pooling takes the maximum looking upward and the maximum looking leftward, and then adds them.
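The per-position scans above reduce to running maxima, so a single-channel version can be written with `np.maximum.accumulate`. This is a sketch: in CornerNet the two input maps come from convolution layers and the pooled result feeds further convolutions, which are omitted here, and the function names are hypothetical.

```python
import numpy as np

def top_left_pool(f_t, f_l):
    """Top-left corner pooling on two (H, W) maps.
    f_t: at (i, j) take the max over column j from row i downward.
    f_l: at (i, j) take the max over row i from column j rightward.
    The two maxima are added."""
    down = np.flip(np.maximum.accumulate(np.flip(f_t, axis=0), axis=0), axis=0)
    right = np.flip(np.maximum.accumulate(np.flip(f_l, axis=1), axis=1), axis=1)
    return down + right

def bottom_right_pool(f_b, f_r):
    """Bottom-right corner pooling: max looking upward plus max looking leftward."""
    up = np.maximum.accumulate(f_b, axis=0)    # rows 0..i of column j
    left = np.maximum.accumulate(f_r, axis=1)  # columns 0..j of row i
    return up + left

f1 = np.array([[1.0, 0.0], [0.0, 3.0]])
f2 = np.array([[2.0, 5.0], [0.0, 1.0]])
print(top_left_pool(f1, f2))      # values [[6, 8], [1, 4]]
print(bottom_right_pool(f1, f2))  # values [[3, 5], [1, 4]]
```

Flipping before and after the cumulative max turns a "from here to the end" scan into a standard prefix scan, so each pooling is O(H·W) instead of scanning a full row and column per position.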
1.3 Prediction module
Each prediction module outputs three parts: heatmaps, embeddings, and offsets. Their roles are, respectively, locating corners, pairing corners, and correcting position errors.
Heatmaps: C channels, where C is the number of object categories; there is no background channel. Each channel is a binary mask marking the corner positions of its class, and finding these points is the ultimate goal.
Embeddings: used to pair corners. Given a pile of top-left corners and another pile of bottom-right corners, how do we know which should be paired with which? CornerNet borrows the idea used in human pose estimation to group joints: each corner is assigned an embedding, which you can think of as an ID card. Each object's ID cards have their own color, and corners holding cards of the same color belong to the same object. The top-left and bottom-right corners whose embedding values are closest are paired to form a box.
Offsets: why compute them? In the paper's experiments the input is 511×511, but the heatmap is 128×128. A point (x, y) on the input maps to (⌊x·128/511⌋, ⌊y·128/511⌋) on the heatmap. Whatever the exact numbers, the floor operation loses precision, so a location found on the heatmap is off when it is mapped back to the image. Hence the offset output: a 128×128×2 map holding the offsets in the x and y directions.
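The precision loss and its correction can be checked numerically. This sketch uses the 511 and 128 resolutions from the text and defines the offset as the fractional part discarded by the floor; `to_heatmap` and `to_image` are hypothetical names.

```python
import math

IN_SIZE, OUT_SIZE = 511, 128       # input and heatmap resolution from the text
SCALE = OUT_SIZE / IN_SIZE

def to_heatmap(x, y, scale=SCALE):
    """Map an input point to its integer heatmap cell plus the offset lost by flooring."""
    hx, hy = x * scale, y * scale
    ix, iy = math.floor(hx), math.floor(hy)
    return (ix, iy), (hx - ix, hy - iy)

def to_image(ix, iy, ox=0.0, oy=0.0, scale=SCALE):
    """Map a heatmap cell, optionally corrected by an offset, back to the input image."""
    return (ix + ox) / scale, (iy + oy) / scale

cell, off = to_heatmap(300, 450)
print(cell, off)                 # integer cell and the lost fractions
print(to_image(*cell))           # without offset: off by up to about 4 input pixels
print(to_image(*cell, *off))     # with offset: recovers (300, 450) up to rounding
```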
The procedure is as follows. First apply non-maximum suppression to the heatmaps (a 3×3 max pooling), then take the top 100 top-left corners and the top 100 bottom-right corners and correct their positions with the offsets. Next compute the L1 distance between the embeddings of each top-left and bottom-right corner; a pair whose distance is greater than 0.5, or whose corners belong to different classes, is not allowed to "marry". Corners that pass both checks are paired, and each pair defines a box.
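A minimal NumPy sketch of these decoding steps, under stated assumptions: one scalar embedding per corner (CornerNet uses 1-D embeddings), boxes reported in heatmap coordinates (mapping back to the input image is omitted), the detection score taken as the average of the two corner scores, and an added sanity check that the top-left corner lies above and to the left of the bottom-right corner. All function and variable names are hypothetical.

```python
import numpy as np

def heatmap_nms(heat, k=3):
    """Keep a value only if it is the maximum of its k x k neighbourhood
    (equivalent to comparing the map with a k x k, stride-1 max-pooled copy)."""
    H, W = heat.shape
    pad = k // 2
    padded = np.pad(heat, pad, mode="constant", constant_values=-np.inf)
    local_max = np.full(heat.shape, -np.inf)
    for dy in range(k):
        for dx in range(k):
            local_max = np.maximum(local_max, padded[dy:dy + H, dx:dx + W])
    return np.where(heat == local_max, heat, 0.0)

def top_k_corners(heat, off, k=100):
    """heat: (C, H, W) corner heatmap; off: (2, H, W) predicted (x, y) offsets.
    Returns scores, classes, integer rows/cols, and offset-corrected x, y."""
    peaks = np.stack([heatmap_nms(h) for h in heat])
    flat = peaks.ravel()
    idx = np.argsort(flat)[::-1][:k]
    idx = idx[flat[idx] > 0]                     # drop suppressed / empty positions
    cls, ys, xs = np.unravel_index(idx, peaks.shape)
    return flat[idx], cls, ys, xs, xs + off[0, ys, xs], ys + off[1, ys, xs]

def pair_corners(tl, br, emb_tl, emb_br, max_dist=0.5):
    """Pair top-left with bottom-right corners; returns (score, cls, x1, y1, x2, y2)
    in heatmap coordinates, best first."""
    s1, c1, r1, q1, x1, y1 = tl
    s2, c2, r2, q2, x2, y2 = br
    e1, e2 = emb_tl[r1, q1], emb_br[r2, q2]      # one scalar embedding per corner
    dets = []
    for i in range(len(s1)):
        for j in range(len(s2)):
            if c1[i] != c2[j]:                   # different classes: reject
                continue
            if abs(e1[i] - e2[j]) > max_dist:    # embeddings too far apart: reject
                continue
            if x1[i] >= x2[j] or y1[i] >= y2[j]: # assumed geometric sanity check
                continue
            dets.append((float((s1[i] + s2[j]) / 2), int(c1[i]),
                         float(x1[i]), float(y1[i]), float(x2[j]), float(y2[j])))
    return sorted(dets, key=lambda d: d[0], reverse=True)

C, H, W = 2, 16, 16
heat_tl, heat_br = np.zeros((C, H, W)), np.zeros((C, H, W))
emb_tl, emb_br = np.zeros((H, W)), np.zeros((H, W))
off_tl, off_br = np.zeros((2, H, W)), np.zeros((2, H, W))
heat_tl[1, 3, 2], emb_tl[3, 2] = 0.9, 1.0      # top-left corner of a class-1 object
off_tl[:, 3, 2] = [0.25, 0.5]                   # sub-pixel correction (x, y)
heat_br[1, 12, 10], emb_br[12, 10] = 0.8, 1.2   # matching bottom-right corner
heat_br[1, 14, 14], emb_br[14, 14] = 0.7, 3.0   # same class, distant embedding
dets = pair_corners(top_k_corners(heat_tl, off_tl), top_k_corners(heat_br, off_br),
                    emb_tl, emb_br)
print(dets)
```

Here the second bottom-right corner has the right class but an embedding 2.0 away from the top-left corner's, so only one box survives.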
1.4 Loss function
To be continued.
2.4 FSAF
2.5 FCOS
2.6 FoveaBox