3D semantic segmentation - PVD
2022-07-25 16:29:00 【Lemon_ Yam】
PVD (CVPR 2022) makes the following main contributions:

- Studies how knowledge distillation can be applied to 3D point cloud semantic segmentation for model compression
- Proposes point-to-voxel knowledge distillation to cope with the inherent sparsity, randomness, and varying density of point cloud data
- Proposes a supervoxel partition method that makes the affinity distillation process tractable
- Proposes a difficulty-aware sampling strategy that makes supervoxels containing minority classes and distant objects more likely to be sampled, thereby improving the distillation efficacy on these difficult cases
PVD was evaluated extensively on the two popular LiDAR segmentation benchmarks, nuScenes and SemanticKITTI, with three representative backbones: Cylinder3D, SPVNAS, and MinkowskiNet. It consistently outperforms previous distillation methods. Notably, on the challenging nuScenes and SemanticKITTI datasets, it achieves roughly a 75% reduction in MACs and a 2x speedup on the competitive Cylinder3D model, ranks 1st on the Waymo and SemanticKITTI (single-scan) challenges, and 3rd on the SemanticKITTI (multi-scan) challenge.
Network structure
The figure below takes Cylinder3D as an example of the PVD network structure. It contains two networks: a teacher and a student. The student network has half as many channels per layer as the teacher network. The teacher network consists of five parts: a point feature extraction module, a point-to-voxel transformation module, an encoder-decoder module, a DDCM module, and a point refinement module.

- The input point cloud is divided into a fixed number of supervoxels, and K supervoxels are sampled according to the difficulty-aware sampling strategy (K=1 in the figure, marked by a red box)
- The sampled supervoxels are fed into the point feature extraction module (MLPs) to obtain pointwise output
- The pointwise output is voxelized by the point-to-voxel transformation module
- The voxelized data is fed into the encoder-decoder module (an asymmetric 3D convolution network) to obtain voxelwise output
- The voxels pass through the DDCM module, which captures high-rank contextual features, giving the network enough capacity to model context information
- The contextual features are sent into the point refinement module (MLPs) to obtain the final pointwise output, from which the semantic predictions are made
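The point-to-voxel transformation in this pipeline maps each point to a (radius, angle, height) cell of a cylindrical grid, following Cylinder3D's cylindrical partition. Below is a minimal numpy sketch of that mapping; it assumes a uniform grid with out-of-range points clipped to the boundary cells, and the grid sizes and ranges are purely illustrative, not the paper's settings:

```python
import numpy as np

def cylindrical_voxel_indices(points, r_max, num_r, num_a, num_h, z_min, z_max):
    """Map Cartesian points (N, 3) to (radius, angle, height) voxel indices,
    i.e. a uniform grid in cylindrical coordinates as used by Cylinder3D."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rho = np.sqrt(x ** 2 + y ** 2)        # radial distance from the z axis
    phi = np.arctan2(y, x)                # azimuth angle in [-pi, pi]
    r_idx = np.clip((rho / r_max * num_r).astype(int), 0, num_r - 1)
    a_idx = np.clip(((phi + np.pi) / (2 * np.pi) * num_a).astype(int), 0, num_a - 1)
    h_idx = np.clip(((z - z_min) / (z_max - z_min) * num_h).astype(int), 0, num_h - 1)
    return np.stack([r_idx, a_idx, h_idx], axis=1)
```

Points sharing the same index triple fall into the same voxel, which is what the voxelization module aggregates over.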
In the knowledge distillation framework, the student network learns two levels of knowledge from the teacher network: the first level is the pointwise and voxelwise outputs; the second level is the inter-point and inter-voxel affinity matrices. For a brief introduction to Cylinder3D, please refer to my earlier blog post 【3D Semantic segmentation ——Cylinder3D】.
Point-to-Voxel Output Distillation
Compared with image data, point clouds are inherently sparse, which makes it difficult to train an effective student network from such sparse supervision signals. Moreover, although point cloud data contains fine-grained environmental perception information, a scan contains tens of thousands of points, so learning this knowledge pointwise is inefficient. To improve learning efficiency, the paper proposes to distill the voxelwise output in addition to the pointwise output, since there are far fewer voxels and they are easier to learn from. The pointwise and voxelwise distillation losses are:
$$\begin{aligned} L_{out}^p(O_S^p, O_T^p) &= \frac{1}{NC}\sum_{n=1}^N \sum_{c=1}^C KL(O_S^p(n, c)\,||\,O_T^p(n, c)) \\ L_{out}^v(O_S^v, O_T^v) &= \frac{1}{RAHC}\sum_{r=1}^R \sum_{a=1}^A \sum_{h=1}^H \sum_{c=1}^C KL(O_S^v(r, a, h, c)\,||\,O_T^v(r, a, h, c)) \end{aligned}$$
Here, $L_{out}^p$ is the pointwise distillation loss and $L_{out}^v$ the voxelwise distillation loss. $N$ is the number of points, $C$ the number of classes, $R$, $A$, and $H$ the radial, angular, and height dimensions of the voxel grid, and $KL(\cdot)$ the Kullback-Leibler divergence loss.
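As a concrete illustration of the pointwise loss above, here is a numpy sketch: it softmaxes the two networks' logits and computes $KL(\text{student}\,||\,\text{teacher})$ with the $1/(NC)$ normalisation (in practice this would be a framework loss such as a KL-divergence criterion on log-probabilities; the voxelwise version is analogous with the four-dimensional sum):

```python
import numpy as np

def pointwise_kl_loss(student_logits, teacher_logits):
    """KL(student || teacher) over softmaxed outputs, normalised by 1/(NC).
    Shapes: (N, C) logits for both networks."""
    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)
    p_s, p_t = softmax(student_logits), softmax(teacher_logits)
    n, c = p_s.shape
    return np.sum(p_s * (np.log(p_s) - np.log(p_t))) / (n * c)
```

The loss is zero exactly when the student reproduces the teacher's distribution at every point.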
The same voxel may contain points from different classes, so how to assign a proper label to each voxel is also crucial to performance. The paper adopts the majority encoding strategy from Cylinder3D: each voxel takes the class label that occurs most often among its points.
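The majority encoding strategy can be sketched in a few lines of numpy: build a per-voxel histogram of point labels and take the argmax (empty voxels default to class 0 in this sketch, which is an implementation choice, not something the blog specifies):

```python
import numpy as np

def majority_voxel_labels(point_labels, voxel_ids, num_voxels, num_classes):
    """Majority encoding: each voxel takes the class label that occurs most
    often among the points falling inside it (empty voxels default to 0)."""
    counts = np.zeros((num_voxels, num_classes), dtype=np.int64)
    np.add.at(counts, (voxel_ids, point_labels), 1)  # per-voxel label histogram
    return counts.argmax(axis=1)
```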
Point-to-Voxel Affinity Distillation
Distilling only the pointwise and voxelwise outputs is not enough, because it considers each element in isolation and cannot capture the structural information of the surrounding environment. Since the input points are unordered, this structural knowledge is very important for LiDAR-based semantic segmentation models. A natural remedy is relational knowledge distillation, which computes the similarity between all pairs of point features, but that scheme is computationally expensive, hard to learn, and ignores the differences between points (of different classes and at different distances). The paper therefore reduces the computational cost and improves learning efficiency via supervoxel partition, and handles the differences between points via difficulty-aware sampling.
Supervoxel partition: to learn relational knowledge more efficiently, the paper divides the whole point cloud into supervoxels of size $R_s \times A_s \times H_s$. Each supervoxel consists of a fixed number of voxels, and the total number of supervoxels is $N_s = \lceil \frac{R}{R_s} \rceil \times \lceil \frac{A}{A_s} \rceil \times \lceil \frac{H}{H_s} \rceil$. In each distillation step, only $K$ supervoxels are sampled for affinity distillation.

Difficulty-aware sampling: this sampling strategy makes supervoxels containing less frequent classes and distant objects more likely to be sampled. It first computes a weight for each supervoxel, then normalizes the weights, and finally samples supervoxels according to the resulting probabilities. The relevant formulas are:
$$\begin{aligned} W_i &= \frac{1}{f_{class}} \times \frac{d_i}{R} \times \frac{1}{N_s} \\ f_{class} &= 4 \exp(-2N_{minor}) + 1 \\ P_i &= \frac{W_i}{\sum_{i=1}^{N_s}W_i} \end{aligned}$$
Here, $W_i$ is the weight of the $i$-th supervoxel, $f_{class}$ is the class frequency term, $P_i$ is the probability that the $i$-th supervoxel is sampled, $d_i$ is the distance from the outer arc of the $i$-th supervoxel to the origin of the XOY plane, and $N_{minor}$ is the number of minority voxels in the supervoxel.
The paper treats classes that account for more than 1% of the points in the whole dataset as majority classes, and the rest as minority classes. A minority voxel is one whose class label (determined by the majority encoding strategy) is a minority class. When a supervoxel contains no minority voxels, $f_{class}=5$; as the number of minority voxels grows, $f_{class}$ decreases rapidly towards its minimum of 1, so such supervoxels receive larger weights.

Feature handling: for point clouds, the number and density of input points vary, so the numbers of point features and voxel features per supervoxel also vary. When computing the loss, however, the number of features is usually kept fixed. To solve this, if a supervoxel has more than $N_p$ point features, redundant point features (whose labels belong to majority classes) are randomly removed; if it has fewer than $N_p$, all-zero point features are padded. A similar approach is used for the voxel features, whose number is fixed to $N_v$. Both $N_p$ and $N_v$ are set manually.
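A numpy sketch of the two mechanisms just described, under stated simplifications: the sampling function implements the $W_i$, $f_{class}$, $P_i$ formulas directly, while the feature-count helper drops rows uniformly at random rather than restricting removal to majority-class features as the paper does:

```python
import numpy as np

def supervoxel_sampling_probs(n_minor, d, r_max):
    """Difficulty-aware sampling probabilities P_i from the formulas above.
    n_minor: (Ns,) minority-voxel counts; d: (Ns,) outer-arc distances."""
    ns = len(n_minor)
    f_class = 4.0 * np.exp(-2.0 * n_minor) + 1.0    # 5 with no minority voxels, -> 1
    w = (1.0 / f_class) * (d / r_max) * (1.0 / ns)  # unnormalised weights W_i
    return w / w.sum()                              # probabilities P_i

def fix_feature_count(features, n_fixed, rng=None):
    """Keep a fixed number of feature rows: randomly drop extras, or zero-pad.
    (Dropping uniformly at random is a simplification; the paper removes
    only majority-class features.)"""
    rng = rng if rng is not None else np.random.default_rng(0)
    n, c = features.shape
    if n > n_fixed:
        return features[rng.choice(n, size=n_fixed, replace=False)]
    return np.vstack([features, np.zeros((n_fixed - n, c))])
```

Note how both more minority voxels and a larger outer-arc distance raise a supervoxel's sampling probability, which is exactly the "difficult objects" bias the strategy is after.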

After the above processing, the $r$-th sampled supervoxel has $N_p$ point features $\hat{F}_r^p \in R^{N_p \times C_f}$ and $N_v$ voxel features $\hat{F}_r^v \in R^{N_v \times C_f}$. For each supervoxel, the paper then computes the inter-point affinity matrix:
$$C^p(i, j, r) = \frac{\hat{F}_r^p(i)^T \hat{F}_r^p(j)}{\parallel \hat{F}_r^p(i) \parallel_2 \parallel \hat{F}_r^p(j) \parallel_2}, \quad r \in \{1, \cdots, K\}$$

The affinity score captures the similarity of each pair of point features, and can be regarded as high-level structural knowledge for the student network to learn. The inter-point affinity distillation loss is computed as:
$$L_{aff}^p (C_S^p, C_T^p) = \frac{1}{KN_p^2}\sum_{r=1}^K \sum_{i=1}^{N_p} \sum_{j=1}^{N_p}\parallel C_S^p(i, j, r) - C_T^p(i, j, r) \parallel_2^2$$

The inter-voxel affinity matrix is computed analogously to the inter-point one, and its distillation loss is:
$$L_{aff}^v (C_S^v, C_T^v) = \frac{1}{KN_v^2}\sum_{r=1}^K \sum_{i=1}^{N_v} \sum_{j=1}^{N_v}\parallel C_S^v(i, j, r) - C_T^v(i, j, r) \parallel_2^2$$
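The affinity matrix is just the pairwise cosine similarity of the feature rows, and the distillation loss the mean squared difference of the two matrices. A numpy sketch for a single supervoxel (averaging over the $K$ sampled supervoxels would give the $1/(KN^2)$ normalisation above):

```python
import numpy as np

def affinity_matrix(f):
    """Pairwise cosine similarity of feature rows: C(i, j, r) above
    for a single supervoxel r, with features f of shape (N, Cf)."""
    f_hat = f / np.maximum(np.linalg.norm(f, axis=1, keepdims=True), 1e-12)
    return f_hat @ f_hat.T

def affinity_distillation_loss(feats_s, feats_t):
    """Squared student-teacher affinity difference, averaged over the
    N^2 matrix entries of one supervoxel."""
    return np.mean((affinity_matrix(feats_s) - affinity_matrix(feats_t)) ** 2)
```

Because the features are unit-normalised first, the diagonal of the affinity matrix is always 1 and the loss is invariant to the scale of individual features, which is what makes it a purely structural signal.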
The total loss function
The total loss of the network consists of 7 parts: the pointwise and voxelwise weighted cross-entropy losses (terms 1 and 2), the lovasz-softmax loss (term 3), the point-to-voxel output distillation losses (terms 4 and 5), and the point-to-voxel affinity distillation losses (terms 6 and 7):
$$\begin{aligned} L = &L_{wce}^p + L_{wce}^v + L_{lovasz}\\ &+\alpha_1 L_{out}^p(O_S^p, O_T^p) + \alpha_2 L_{out}^v(O_S^v, O_T^v) \\ &+\beta_1 L_{aff}^p(C_S^p, C_T^p) + \beta_2 L_{aff}^v (C_S^v, C_T^v) \end{aligned}$$
Here, $\alpha_1$, $\alpha_2$, $\beta_1$, and $\beta_2$ balance the impact of the distillation losses against the main-task losses.
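Assembling the objective is then a straightforward weighted sum of the seven scalar terms; a trivial sketch (the values of the $\alpha$/$\beta$ weights are tuned per backbone and not given here):

```python
def pvd_total_loss(l_wce_p, l_wce_v, l_lovasz,
                   l_out_p, l_out_v, l_aff_p, l_aff_v,
                   alpha1, alpha2, beta1, beta2):
    """Seven-term PVD objective: two weighted cross-entropy terms, the
    lovasz-softmax term, and the four weighted distillation terms."""
    return (l_wce_p + l_wce_v + l_lovasz
            + alpha1 * l_out_p + alpha2 * l_out_v
            + beta1 * l_aff_p + beta2 * l_aff_v)
```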
Paper: https://arxiv.org/pdf/2206.02099.pdf
Code: https://github.com/cardwing/Codes-for-PVKD