3D semantic segmentation - PVD
2022-07-25 16:29:00 【Lemon_ Yam】
PVD (CVPR 2022). Main contributions:

- Studies how knowledge distillation can be applied to 3D point cloud semantic segmentation in order to compress the model
- Proposes point-to-voxel knowledge distillation to cope with the inherent sparsity, randomness, and varying density of point cloud data
- Proposes a supervoxel partition method that makes the affinity distillation process easy to perform
- Proposes a difficulty-aware sampling strategy that makes supervoxels containing minority classes and distant objects more likely to be sampled, thereby improving the distillation efficacy on these hard cases

PVD conducts extensive experiments on the two popular LiDAR segmentation benchmarks nuScenes and SemanticKITTI, with three representative backbones (Cylinder3D, SPVNAS, and MinkowskiNet), and consistently outperforms previous distillation methods by a large margin. Notably, on the challenging nuScenes and SemanticKITTI datasets, it achieves roughly a 75% reduction in MACs and a 2× speedup on the competitive Cylinder3D model, ranks 1st in the Waymo and SemanticKITTI (single-scan) challenges, and 3rd in the SemanticKITTI (multi-scan) challenge.
Network structure
The figure below illustrates the PVD architecture with Cylinder3D as an example. It contains two networks: a teacher and a student. The student network has half as many channels per layer as the teacher. The teacher network consists of 5 parts: a point feature extraction module, a point-to-voxel transformation module, an encoder-decoder module, a DDCM module, and a point refinement module.

- The input point cloud is divided into a fixed number of supervoxels, and K supervoxels are sampled according to the difficulty-aware sampling strategy (K=1 in the figure, marked by a red box)
- The sampled supervoxels are fed into the point feature extraction module (MLPs) to obtain pointwise outputs
- The pointwise outputs are voxelized by the point-to-voxel transformation module
- The voxelized data is fed into the encoder-decoder module (an asymmetrical 3D convolution network) to obtain voxelwise outputs
- The voxels pass through the DDCM module to capture high-rank contextual features, giving the network enough capacity to model context information
- The contextual features are fed into the point refinement module (MLPs) to obtain pointwise outputs, from which the semantic labels are predicted
In the knowledge distillation framework, the student network learns two levels of knowledge from the teacher network: the first level is the pointwise and voxelwise outputs; the second is the inter-point and inter-voxel affinity matrices. For a brief introduction to Cylinder3D, please refer to my earlier post [3D Semantic Segmentation — Cylinder3D].
Point-to-Voxel Output Distillation
Compared with image data, point clouds are inherently sparse, which makes it hard to train an effective student network from such sparse supervision signals. Moreover, although point cloud data contains fine-grained environment-perception information, the sheer number of points makes learning this knowledge inefficient. To improve learning efficiency, besides the pointwise outputs, the paper proposes to also distil the voxelwise outputs, since voxels are fewer and easier to learn from. The pointwise and voxelwise distillation losses are:
$$\begin{aligned} L_{out}^p(O_S^p, O_T^p) &= \frac{1}{NC}\sum_{n=1}^N \sum_{c=1}^C KL(O_S^p(n, c)\,\|\,O_T^p(n, c)) \\ L_{out}^v(O_S^v, O_T^v) &= \frac{1}{RAHC}\sum_{r=1}^R \sum_{a=1}^A \sum_{h=1}^H \sum_{c=1}^C KL(O_S^v(r, a, h, c)\,\|\,O_T^v(r, a, h, c)) \end{aligned}$$
where $L_{out}^p$ is the pointwise distillation loss and $L_{out}^v$ the voxelwise distillation loss; $N$ is the number of points, $C$ the number of classes, $R$, $A$, and $H$ the sizes of the cylindrical voxel grid along the radius, angle, and height dimensions, and $KL(\cdot)$ the Kullback-Leibler divergence loss.
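The two losses above share the same form: a KL divergence between student and teacher class distributions, averaged over every position and class. A minimal NumPy sketch (not the paper's implementation, which trains with PyTorch and may use temperature scaling) could look like this:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the class axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def kl_output_distillation(student_logits, teacher_logits, eps=1e-8):
    """Mean elementwise KL(student || teacher) over all positions and classes.

    Works for pointwise logits of shape (N, C) as well as voxelwise logits
    of shape (R, A, H, C); taking the mean over every index matches the
    1/(NC) and 1/(RAHC) normalisation in the formulas above.
    """
    p_s = softmax(student_logits)  # student class probabilities
    p_t = softmax(teacher_logits)  # teacher class probabilities
    kl = p_s * (np.log(p_s + eps) - np.log(p_t + eps))
    return kl.mean()
```

When the student and teacher outputs coincide, the loss is zero; any divergence between the two distributions makes it positive.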
The same voxel may contain points from different classes, so how to assign proper labels to voxels is also crucial to performance. The paper uses the majority encoding strategy from Cylinder3D: the class with the most points inside a voxel becomes the voxel's label.
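The majority encoding strategy can be sketched as a per-voxel histogram followed by an argmax (a NumPy illustration, not the paper's code; note that empty voxels would default to class 0 here):

```python
import numpy as np

def majority_voxel_labels(voxel_ids, point_labels, num_voxels, num_classes):
    """Assign each voxel the label of the class with the most points inside it.

    voxel_ids:    (N,) index of the voxel each point falls into
    point_labels: (N,) semantic class of each point
    """
    # Build a per-voxel histogram of point classes.
    counts = np.zeros((num_voxels, num_classes), dtype=np.int64)
    np.add.at(counts, (voxel_ids, point_labels), 1)
    # The most frequent class per voxel becomes the voxel label.
    return counts.argmax(axis=1)
```

For example, a voxel holding points of classes [2, 2, 1] is labeled 2, while a voxel holding a single class-0 point is labeled 0.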
Point-to-Voxel Affinity Distillation
Distilling only the pointwise and voxelwise outputs is not enough, because it considers only elementwise knowledge and cannot capture the structural information of the surrounding environment. Since the input points are unordered, such structural knowledge is very important for LiDAR-based semantic segmentation models. A natural remedy is relational knowledge distillation, which computes the similarity between all pairs of point features, but this scheme is computationally expensive, hard to learn, and ignores the differences between points (of different classes and at different distances). Therefore, the paper reduces the computational cost and improves learning efficiency via supervoxel partition, and properly handles the differences between points via difficulty-aware sampling.
Supervoxel partition: To learn relational knowledge more efficiently, the paper divides the whole point cloud into supervoxels of size $R_s \times A_s \times H_s$. Each supervoxel consists of a fixed number of voxels, and the total number of supervoxels is $N_s = \lceil \frac{R}{R_s} \rceil \times \lceil \frac{A}{A_s} \rceil \times \lceil \frac{H}{H_s} \rceil$. In each distillation step, only K supervoxels are sampled for affinity distillation.

Difficulty-aware sampling: This sampling strategy makes supervoxels containing less frequent classes and distant objects more likely to be sampled. It first determines a weight for each supervoxel, then normalizes the weights, and finally samples supervoxels according to the resulting probabilities. The relevant formulas are:
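The supervoxel count $N_s$ is just a ceiling division along each axis. As a quick sketch (the grid and supervoxel sizes below are made-up illustration values, not the paper's settings):

```python
import math

def num_supervoxels(R, A, H, Rs, As, Hs):
    """Total number of supervoxels N_s for an R x A x H voxel grid
    partitioned into supervoxels of size Rs x As x Hs (ceil per axis)."""
    return math.ceil(R / Rs) * math.ceil(A / As) * math.ceil(H / Hs)
```

For instance, a 480 × 360 × 32 grid with 120 × 120 × 16 supervoxels yields 4 × 3 × 2 = 24 supervoxels.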
$$\begin{aligned} W_i &= \frac{1}{f_{class}} \times \frac{d_i}{R} \times \frac{1}{N_s} \\ f_{class} &= 4 \exp(-2N_{minor}) + 1 \\ P_i &= \frac{W_i}{\sum_{i=1}^{N_s} W_i} \end{aligned}$$
where $W_i$ is the weight of the $i$-th supervoxel, $f_{class}$ is the class-frequency term, $P_i$ is the probability that the $i$-th supervoxel is sampled, $d_i$ is the distance from the outer arc of the $i$-th supervoxel to the origin of the XOY plane, and $N_{minor}$ is the number of minority voxels in the supervoxel.
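The three formulas above translate directly into code. A NumPy sketch (an illustration under the definitions above, not the paper's implementation):

```python
import numpy as np

def supervoxel_sampling_probs(d, n_minor, R, N_s):
    """Difficulty-aware sampling probabilities P_i.

    d:       (N_s,) distance of each supervoxel's outer arc to the XOY origin
    n_minor: (N_s,) number of minority voxels in each supervoxel
    """
    # f_class is 5 when a supervoxel has no minority voxels and decays
    # quickly toward 1 as minority voxels appear.
    f_class = 4.0 * np.exp(-2.0 * n_minor) + 1.0
    # Smaller f_class (more minority voxels) and larger d (more distant
    # objects) both increase the weight W_i.
    w = (1.0 / f_class) * (d / R) * (1.0 / N_s)
    return w / w.sum()  # normalize W_i into probabilities P_i
```

At equal distance, a supervoxel containing minority voxels receives a higher sampling probability than one without, which is exactly the intended bias toward hard cases.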
The paper treats classes accounting for more than 1% of the points in the entire dataset as majority classes, and the rest as minority classes. A minority voxel is a voxel whose class label (determined by the majority encoding strategy) is a minority class. When a supervoxel contains no minority voxels, $f_{class} = 5$; as the number of minority voxels increases, $f_{class}$ quickly decreases toward its minimum of 1.

Feature handling: For point clouds, the number and density of input points are variable, so the numbers of point features and voxel features per supervoxel also vary. When computing the loss, the number of features is usually kept fixed. To handle this, if a supervoxel has more than $N_p$ point features, redundant point features (whose labels are majority classes) are randomly removed; if it has fewer than $N_p$, all-zero point features are padded. Voxel features (with target count $N_v$) are handled similarly. $N_p$ and $N_v$ are set manually.
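The pad-or-truncate step described above can be sketched as follows (a NumPy illustration, not the paper's code; it assumes the surplus never exceeds the number of majority-class features, which the strategy relies on in practice):

```python
import numpy as np

def fix_feature_count(feats, is_majority, n_fixed, rng=None):
    """Pad or truncate a supervoxel's features to exactly n_fixed rows.

    feats:       (n, C_f) feature matrix of one supervoxel
    is_majority: (n,) boolean mask, True where the feature's label is a
                 majority class (only those may be dropped)
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n, c = feats.shape
    if n > n_fixed:
        # Randomly drop surplus features drawn from the majority classes.
        surplus = n - n_fixed
        majority_idx = np.flatnonzero(is_majority)
        drop = rng.choice(majority_idx, size=surplus, replace=False)
        keep = np.setdiff1d(np.arange(n), drop)
        return feats[keep]
    if n < n_fixed:
        # Pad with all-zero feature rows up to n_fixed.
        pad = np.zeros((n_fixed - n, c), dtype=feats.dtype)
        return np.vstack([feats, pad])
    return feats
```

The same routine applies to voxel features with target count $N_v$.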

After the above processing, the $r$-th sampled supervoxel has $N_p$ point features $\hat{F}_r^p \in \mathbb{R}^{N_p \times C_f}$ and $N_v$ voxel features $\hat{F}_r^v \in \mathbb{R}^{N_v \times C_f}$. For each supervoxel, the paper then computes its inter-point affinity matrix:
$$C^p(i, j, r) = \frac{\hat{F}_r^p(i)^T \hat{F}_r^p(j)}{\| \hat{F}_r^p(i) \|_2 \, \| \hat{F}_r^p(j) \|_2}, \quad r \in \{1, \cdots, K\}$$

The affinity score measures the similarity of each pair of point features and can be regarded as high-level structural knowledge for the student network to learn. The inter-point affinity distillation loss is computed as:
$$L_{aff}^p (C_S^p, C_T^p) = \frac{1}{KN_p^2}\sum_{r=1}^K \sum_{i=1}^{N_p} \sum_{j=1}^{N_p} \| C_S^p(i, j, r) - C_T^p(i, j, r) \|_2^2$$

The inter-voxel affinity matrix $C^v$ is computed analogously to the inter-point one, with distillation loss:
$$L_{aff}^v (C_S^v, C_T^v) = \frac{1}{KN_v^2}\sum_{r=1}^K \sum_{i=1}^{N_v} \sum_{j=1}^{N_v} \| C_S^v(i, j, r) - C_T^v(i, j, r) \|_2^2$$
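For a single supervoxel, the affinity matrix is simply pairwise cosine similarity, and the distillation loss is the mean squared difference between the student's and teacher's affinity matrices. A NumPy sketch (an illustration, not the official code):

```python
import numpy as np

def affinity_matrix(F):
    """Pairwise cosine similarity between the rows of F (N, C_f)."""
    norms = np.linalg.norm(F, axis=1, keepdims=True) + 1e-8
    Fn = F / norms           # L2-normalize each feature row
    return Fn @ Fn.T         # (N, N) affinity matrix

def affinity_distillation_loss(F_student, F_teacher):
    """Mean squared difference between the two affinity matrices;
    averaging this over the K sampled supervoxels matches the
    1/(K N^2) normalisation in the formulas above."""
    C_s = affinity_matrix(F_student)
    C_t = affinity_matrix(F_teacher)
    return np.mean((C_s - C_t) ** 2)
```

Note that the student's features may have fewer channels than the teacher's (half as many per layer), but both affinity matrices are $N \times N$, so the comparison is well defined.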
The total loss function
The total loss of the network consists of 7 parts: the pointwise and voxelwise weighted cross-entropy losses (terms 1 and 2), the lovasz-softmax loss (term 3), and the point-to-voxel output and affinity distillation losses (the last 4 terms):
$$\begin{aligned} L = &\, L_{wce}^p + L_{wce}^v + L_{lovasz} \\ &+ \alpha_1 L_{out}^p(O_S^p, O_T^p) + \alpha_2 L_{out}^v(O_S^v, O_T^v) \\ &+ \beta_1 L_{aff}^p(C_S^p, C_T^p) + \beta_2 L_{aff}^v (C_S^v, C_T^v) \end{aligned}$$
where $\alpha_1$, $\alpha_2$, $\beta_1$, and $\beta_2$ balance the impact of the distillation losses against the main task losses.
The paper :https://arxiv.org/pdf/2206.02099.pdf
Code :https://github.com/cardwing/Codes-for-PVKD