3D Semantic Segmentation - Scribble-Supervised LiDAR Semantic Segmentation
2022-07-25 16:29:00 【Lemon_ Yam】
Main contributions of the paper (CVPR 2022 Oral):

- Proposes ScribbleKITTI, the first scribble-annotated LiDAR semantic segmentation dataset.
- Proposes class-range-balanced self-training to counter the bias of pseudo-labels toward the dominant classes and toward nearby dense regions.
- Introduces a pyramid local semantic-context descriptor to enrich the input point cloud and thereby improve pseudo-label quality.
- By combining points 2 and 3 with the mean teacher framework, the proposed pipeline reaches 95.7% of fully-supervised performance using only 8% of the labeled points.
Densely annotating LiDAR point clouds remains prohibitively expensive and cannot keep pace with the ever-growing volume of data. Current research on 3D semantic segmentation focuses mainly on fully-supervised methods, while effective 3D semantic segmentation under weak supervision remains largely unexplored. The paper therefore proposes annotating LiDAR point clouds with scribbles and releases ScribbleKITTI, the first scribble-annotated dataset for 3D semantic segmentation. However, scribble annotation also means that the unlabeled points, which carry the boundary information, go unused, and the scarcity of labeled points (only 8% of the points are annotated) weakens the supervision of long-tail classes and lowers their confidence, ultimately degrading model performance.
To close the performance gap that arises when using such weak annotations, the paper proposes a pipeline consisting of three standalone components that can be combined with any LiDAR semantic segmentation model. The released code uses the Cylinder3D model; if you are interested in Cylinder3D, see my earlier blog post. Using only 8% of the labels, the pipeline reaches 95.7% of fully-supervised performance.
ScribbleKITTI Dataset

Scribble annotation is a popular and effective approach in 2D semantic segmentation, but unlike 2D images, 3D point clouds preserve metric space and therefore retain strong geometric structure. To exploit this, the paper proposes annotating the LiDAR point cloud with the more geometric line scribbles. Compared with free-form scribbles, line scribbles annotate geometric classes that span large distances (such as road and sidewalk) much faster, since a line scribble only needs the start and end positions of the stroke. As shown for the car in the figure above (blue line), two points are enough to complete the annotation. This reduces the annotation time from 1.5-4.5 hours to 10-25 minutes.
ScribbleKITTI is annotated on the train-split of SemanticKITTI, which contains 10 sequences, 19,130 scans, and 2,349 million points; ScribbleKITTI contains only 189 million labeled points.
As shown in Figure 3 above, the line scribbles are drawn in 2D and projected onto 3D surfaces, so they become indistinguishable once the viewing angle changes.
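To give a feel for how few clicks a line scribble needs, the sketch below labels every point that lies close to the 2D segment between two user-chosen endpoints. It is a minimal illustration of the idea, not the authors' annotation tool; the function name and the distance threshold `max_dist` are assumptions.

```python
import numpy as np

def label_line_scribble(points_xy, start, end, class_id, labels, max_dist=0.2):
    """Assign class_id to points within max_dist of the 2D segment start-end.

    points_xy: (N, 2) points projected onto the ground plane.
    start, end: the two clicked endpoints of the scribble, each of shape (2,).
    labels: (N,) label array updated in place (0 = unlabeled).
    """
    seg = end - start
    seg_len_sq = np.dot(seg, seg) + 1e-12
    # Project every point onto the segment and clamp to its extent.
    t = np.clip((points_xy - start) @ seg / seg_len_sq, 0.0, 1.0)
    closest = start + t[:, None] * seg
    dist = np.linalg.norm(points_xy - closest, axis=1)
    labels[dist < max_dist] = class_id
    return labels
```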
Network structure

- The proposed pipeline is divided into three stages, training, pseudo-labeling, and distillation. The stages build on one another to improve the quality of the generated pseudo-labels and thereby the accuracy of the model (see the sketch after this list).
- In the training stage, the input is first enriched with PLS as data augmentation and a mean teacher is then trained, which helps generate higher-quality pseudo-labels later.
- In the pseudo-labeling stage, CRB is used to generate target labels, reducing the loss of pseudo-label quality caused by the point cloud's own properties.
- In the distillation stage, the mean teacher is trained again on the pseudo-labels generated above.
- In the mean teacher framework, $L_S$ and $L_U$ denote the losses on labeled and unlabeled points, respectively.
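A minimal sketch of how the three stages could be chained. The helpers `add_pls`, `train_mt`, and `crb` are stand-ins (passed in as callables) for the PLS augmentation, mean-teacher training, and CRB pseudo-labeling steps described later; their names and signatures are assumptions for illustration, not the authors' API.

```python
from typing import Callable, Sequence

def run_pipeline(scans: Sequence, scribbles: Sequence,
                 add_pls: Callable, train_mt: Callable, crb: Callable):
    """Three-stage pipeline: training -> pseudo-labeling -> distillation."""
    # 1) Training: enrich the scribble-labeled input with PLS, train a mean teacher.
    augmented = [add_pls(s, l) for s, l in zip(scans, scribbles)]
    student, teacher = train_mt(augmented, scribbles)

    # 2) Pseudo-labeling: class-range-balanced selection from the teacher output
    #    (assumed to keep the original scribble labels where they exist).
    pseudo = [crb(teacher, s) for s in augmented]

    # 3) Distillation: retrain the mean teacher on scribbles + pseudo-labels.
    student, teacher = train_mt(augmented, pseudo)
    return teacher
```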
Partial Consistency Loss with Mean Teacher
The mean teacher framework consists of two parts: a student network with weights $\theta$ and a teacher network with weights $\theta^{EMA}$. As usual, the student weights are updated by gradient descent, while the teacher weights are obtained as the exponential moving average of the student weights, computed as follows:
$$\theta_t^{EMA} = \alpha\, \theta_{t-1}^{EMA} + (1-\alpha)\, \theta_t$$
where $\theta_t$ is the student weight at step $t$, $\theta_t^{EMA}$ is the teacher weight at step $t$, and $\alpha$ is the smoothing factor. The exponential moving average avoids the limitations of the Temporal Ensembling method and yields a more accurate model than directly using the trained student weights.
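A minimal PyTorch-style sketch of this EMA update, assuming `student` and `teacher` are two copies of the same network; the function name and the default `alpha = 0.99` are assumptions for illustration.

```python
import torch

@torch.no_grad()
def update_teacher(student: torch.nn.Module, teacher: torch.nn.Module, alpha: float = 0.99):
    """theta_t^EMA = alpha * theta_{t-1}^EMA + (1 - alpha) * theta_t."""
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(alpha).add_(p_s, alpha=1.0 - alpha)
```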
The partial consistency loss applies the consistency loss only to unlabeled points. This reduces the uncertainty injected by the teacher network on labeled points, allowing stricter supervision there, while the more accurate teacher output is used to supervise the unlabeled points. The loss function is as follows:
$$\min_{\theta} \sum_{f=1}^{F} \sum_{i=1}^{|P_f|} G_{i,f}, \qquad G_{i,f} = \begin{cases} H(\hat{y}_{f,i}|_{\theta},\, y_{f,i}), & p_{f,i} \in S \\ \log(\hat{y}_{f,i}|_{\theta})\, \hat{y}_{f,i}|_{\theta^{EMA}}, & p_{f,i} \in U \end{cases}$$
where $S$ is the set of labeled points, $U$ the set of unlabeled points, $H$ the supervised loss (usually cross-entropy), $F$ the number of point cloud frames, $|P_f|$ the number of points in frame $f$, $\hat{y}_{f,i}|_{\theta}$ the student prediction, $y_{f,i}$ the ground truth, $\hat{y}_{f,i}|_{\theta^{EMA}}$ the teacher prediction, and $p_{f,i}$ the $i$-th point of frame $f$.
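A minimal PyTorch sketch of the partial consistency loss, assuming per-point logits from student and teacher and a label convention where `ignore_index = 0` marks unlabeled points; the masking convention and function name are assumptions, not the paper's code.

```python
import torch
import torch.nn.functional as F

def partial_consistency_loss(student_logits, teacher_logits, labels, ignore_index=0):
    """Cross-entropy on labeled points, soft consistency on unlabeled points."""
    labeled = labels != ignore_index
    # Supervised term H(y_hat|theta, y) on scribble-labeled points.
    loss_s = F.cross_entropy(student_logits[labeled], labels[labeled])
    # Consistency term: cross-entropy between teacher soft labels and student log-probs.
    log_p_student = F.log_softmax(student_logits[~labeled], dim=-1)
    p_teacher = F.softmax(teacher_logits[~labeled], dim=-1).detach()
    loss_u = -(p_teacher * log_p_student).sum(dim=-1).mean()
    return loss_s + loss_u
```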
Although the mean teacher supervises the unlabeled points, the information it provides is limited by the teacher's own performance. Even when the teacher predicts the correct label for a point, the soft pseudo-label still assigns confidence to the other classes, and this residual confidence affects the output of the student network.
Class-range-balanced Self-training (CRB-ST)
To address the injected uncertainty described above and to make more direct use of the prediction confidence on unlabeled points, the paper expands the annotated dataset through self-training. Self-training is combined with the mean teacher so that the mean teacher's soft pseudo-labels still guide uncertain predictions, while confident predictions are strengthened into hard pseudo-labels. Using the class with the maximum confidence predicted by the teacher network, a set of target labels $L$ can be generated for the unlabeled points.
Due to the nature of the LiDAR sensor, the local point density varies with the beam radius and sparsity grows with distance. As a result, pseudo-labels are mostly sampled from dense, nearby regions, where the estimated confidence tends to be high. To reduce this bias during pseudo-label generation, the paper proposes a modified self-training scheme combined with class-range balancing (CRB). The horizontal plane is first coarsely partitioned into $R$ annuli of width $B$ centered on the ego-vehicle, so that each annulus contains the points within a certain range of distances. Within each annulus, the globally most confident predictions of every class are pseudo-labeled. This ensures that reliable labels are obtained while distributing them proportionally across ranges and across all classes. The loss function is as follows:
$$\begin{aligned} &\min_{\theta, \hat{y}} \sum_{f=1}^{F} \sum_{i=1}^{|P_f|} \Big[ G_{i,f} - \sum_{c=1}^{C} \sum_{r=1}^{R} F_{i,f,c,r} \Big] \\ &F_{i,f,c,r} = \begin{cases} \big(\log(\hat{y}_{f,i}^{(c)}|_{\theta^{EMA}}) + k^{(c,r)}\big)\, \hat{y}_{f,i}^{(c)}, & r = \lfloor \|(p_{x,y})_{f,i}\| / B \rfloor \\ 0, & \text{otherwise} \end{cases} \end{aligned}$$
where $k^{(c,r)}$ is the class-annulus pairwise negative log-threshold and $R$ is the number of annuli.
To solve this nonlinear integer optimization problem, the paper adopts the following solver:
$$\hat{y}_{f,i}^{(c)*} = \begin{cases} 1, & \text{if } c = \arg\max \hat{y}_{f,i}|_{\theta^{EMA}} \;\text{and}\; \hat{y}_{f,i}^{(c)}|_{\theta^{EMA}} > \exp(-k^{(c,r)}) \;\text{with}\; r = \lfloor \|(p_{x,y})_{f,i}\| / B \rfloor \\ 0, & \text{otherwise} \end{cases}$$
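A rough NumPy sketch of CRB pseudo-label generation, assuming teacher class probabilities `probs` of shape (N, C), 2D coordinates `points_xy`, a ring width `B`, and a labeling fraction `frac` per class and ring that determines the thresholds $\exp(-k^{(c,r)})$ as confidence quantiles. The interface and the quantile-based threshold selection are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def crb_pseudo_labels(probs, points_xy, num_rings, B, frac=0.5, ignore=-1):
    """Class-range-balanced pseudo-labels from teacher probabilities.

    probs: (N, C) teacher softmax outputs; points_xy: (N, 2) coordinates.
    Returns (N,) pseudo-labels, with `ignore` for points below the threshold.
    """
    conf = probs.max(axis=1)            # max confidence per point
    cls = probs.argmax(axis=1)          # teacher-predicted class per point
    rings = np.minimum((np.linalg.norm(points_xy, axis=1) // B).astype(int),
                       num_rings - 1)   # r = floor(||p_xy|| / B), clamped
    pseudo = np.full(len(cls), ignore, dtype=np.int64)
    num_classes = probs.shape[1]
    for r in range(num_rings):
        for c in range(num_classes):
            mask = (rings == r) & (cls == c)
            if not mask.any():
                continue
            # Keep only the top `frac` most confident predictions of class c in ring r,
            # i.e. conf > exp(-k^{(c,r)}) with the threshold set as a confidence quantile.
            thresh = np.quantile(conf[mask], 1.0 - frac)
            keep = mask & (conf > thresh)
            pseudo[keep] = c
    return pseudo
```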
Pyramid Local Semantic-context (PLS)
To further guarantee higher-quality pseudo-labels, the paper introduces a new descriptor that uses the available scribbles to enrich the features of the input points.
The paper observes that class labels in 3D space obey both a spatial-smoothness constraint and a semantic-pattern constraint. Spatial smoothness means that a point is likely to share its class label with at least one of its neighbors; the semantic-pattern constraint refers to the complex set of high-level rules governing the spatial relationships between classes. The paper therefore treats the local semantic prior as a rich point descriptor that encapsulates both constraints, and proposes using local semantic context at several scaling resolutions to reduce the ambiguity when propagating information between labeled and unlabeled points, thereby improving pseudo-label quality.
The space is first discretized into coarse voxels. This avoids over-descriptive features that would make the network overfit to the scribble labels, giving it better generalization and a better grasp of meaningful geometric relationships. To accommodate the inherent point distribution of LiDAR sensors at different resolutions, bins of various sizes are used in a cylindrical coordinate system. For each bin $b_i$ a coarse histogram is computed, and the normalized histograms are concatenated (as shown in Figure 6 above). The computation is as follows:
$$\begin{aligned} \mathbf{h}_i &= [h_i^{(1)}, \cdots, h_i^{(C)}] \in \mathbb{R}^C \\ h_i^{(c)} &= \#\{y_j = c\ \forall j \mid p_j \in b_i\} \\ PLS &= [\mathbf{h}_i^{1}/\max(\mathbf{h}_i^{1}), \cdots, \mathbf{h}_i^{s}/\max(\mathbf{h}_i^{s})] \in \mathbb{R}^{sC} \end{aligned}$$
where $\mathbf{h}_i$ is the class histogram of bin $b_i$ at one resolution, $h_i^{(c)}$ is the count of class $c$ in that bin, $PLS$ is the concatenation of the normalized histograms over all resolutions, and $s$ is the number of resolutions (pyramid scales).
The $PLS$ descriptor is appended to the input features, and the input LiDAR point cloud is redefined as the augmented set $P_{aug} = \{p \mid p = (x, y, z, I, PLS) \in \mathbb{R}^{4+sC}\}$, where $x, y, z$ are the 3D coordinates of the point and $I$ is the reflection intensity. Training on $P_{aug}$ instead of the original data yields higher-quality pseudo-labels in the pseudo-labeling stage.
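A simplified NumPy sketch of computing the PLS descriptor at several cylindrical resolutions, assuming scribble labels where 0 means unlabeled. The bin counts per resolution, the grid construction, and the handling of empty bins are assumptions made for illustration and will differ from the authors' exact implementation.

```python
import numpy as np

def pls_descriptor(points_xyz, scribble_labels, num_classes,
                   resolutions=((10, 12, 4), (20, 24, 8), (40, 48, 16))):
    """Concatenate normalized per-bin class histograms over several resolutions.

    points_xyz: (N, 3); scribble_labels: (N,) with 0 = unlabeled.
    Returns an (N, s*C) feature where s = len(resolutions), C = num_classes.
    """
    rho = np.linalg.norm(points_xyz[:, :2], axis=1)
    phi = np.arctan2(points_xyz[:, 1], points_xyz[:, 0])
    z = points_xyz[:, 2]
    feats = []
    for (n_rho, n_phi, n_z) in resolutions:
        # Cylindrical bin index per point at this resolution.
        i_rho = np.clip((rho / (rho.max() + 1e-6) * n_rho).astype(int), 0, n_rho - 1)
        i_phi = np.clip(((phi + np.pi) / (2 * np.pi) * n_phi).astype(int), 0, n_phi - 1)
        i_z = np.clip(((z - z.min()) / (z.max() - z.min() + 1e-6) * n_z).astype(int), 0, n_z - 1)
        bin_idx = (i_rho * n_phi + i_phi) * n_z + i_z
        n_bins = n_rho * n_phi * n_z
        # Histogram of scribble-labeled classes per bin (class 0 = unlabeled is skipped).
        hist = np.zeros((n_bins, num_classes))
        labeled = scribble_labels > 0
        np.add.at(hist, (bin_idx[labeled], scribble_labels[labeled] - 1), 1)
        # Normalize each bin's histogram by its maximum (empty bins stay zero).
        hist /= np.maximum(hist.max(axis=1, keepdims=True), 1)
        feats.append(hist[bin_idx])
    return np.concatenate(feats, axis=1)
```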
Paper: https://arxiv.org/pdf/2203.08537.pdf
Code: https://github.com/ouenal/scribblekitti
Further reading: Deep Learning (74): semi-supervised Mean Teacher and semi-supervised self-training.