1+1<2?! An interpretation of the HESIC paper

2022-06-26 23:56:00 Shengsi MindSpore

01 

Research background

HESIC targets the joint compression of binocular (stereo) images. It exploits the content correlation between the two views: the primary view is encoded and decoded first, and it then guides the coding of the other view so that redundant information is not encoded twice. The goal is a "1+1<2" effect, in which a stereo pair costs fewer bits than two independently compressed images.

The research team is Professor Xu Mai's group at Beihang University (Beijing University of Aeronautics and Astronautics), which works on computer vision and on low-level vision topics such as image and video compression coding.

 02 

Overview of the paper

Joint compression of stereo images requires two things: a well-optimized image compression network, and a way to extract and exploit the mutual information between the two views. Only when the two are combined well can the "1+1<2" effect be achieved. HESIC is an end-to-end, deep-learning-based stereo image compression network that makes effective use of the mutual information between the two views to reduce the storage cost of each image pair. To suit the characteristics of stereo images, HESIC borrows the homography transformation from traditional image processing to improve coding efficiency and save bits, and it adopts a basic network architecture based on an autoencoder. For entropy coding, it supports two entropy models with different advantages and disadvantages: one based on a Gaussian mixture distribution and one based on autoregression. The method reports better results on the InStereo2K and KITTI datasets.

 03 

Code and paper links

Code:

https://github.com/ywz978020607/HESIC

https://gitee.com/ywzsunny/HESIC-Mindspore-Migration

Paper:

https://openaccess.thecvf.com/content/CVPR2021/papers/Deng_Deep_Homography_for_Efficient_Stereo_Image_Compression_CVPR_2021_paper.pdf

 04 

Key points of the algorithm framework

As shown in the framework diagram, each view has its own encoder and decoder network that provides the basic coding functions. The left view serves as the primary view and is encoded and decoded independently. The left view is then warped toward the right view by a homography transformation, so that the right-view branch only has to encode and decode what remains once the shared, redundant content is accounted for. In addition, once the homography matrix has been decoded, the left and right images can be warped in both directions. Each warped image is merged with the other view along the channel dimension and passed through simple convolutions for cross quality enhancement, which further improves the results.
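The warping step can be illustrated with a toy example. The sketch below is not taken from the HESIC code; it is a minimal pure-Python illustration that assumes a known 3x3 homography, nearest-neighbour sampling, and a residual computed in pixel space. The function names (`apply_homography`, `warp_image`, `residual`) are hypothetical and chosen only for this example.

```python
def apply_homography(H, x, y):
    """Map pixel (x, y) through the 3x3 homography H (a list of 3 rows)."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return xh / w, yh / w


def warp_image(src, H_inv, fill=0):
    """Backward-warp a 2D image (list of rows) with nearest-neighbour sampling.

    For every target pixel, H_inv gives the source location to sample.
    Target pixels that map outside the source are set to `fill`.
    """
    height, width = len(src), len(src[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            sx, sy = apply_homography(H_inv, x, y)
            sx, sy = int(round(sx)), int(round(sy))
            if 0 <= sx < width and 0 <= sy < height:
                out[y][x] = src[sy][sx]
    return out


def residual(target, prediction):
    """Per-pixel difference left over after the homography prediction."""
    return [[t - p for t, p in zip(tr, pr)] for tr, pr in zip(target, prediction)]


# Toy stereo pair: the right view is the left view shifted by one pixel.
left = [[10, 20, 30, 40],
        [50, 60, 70, 80]]
right = [[20, 30, 40, 0],
         [60, 70, 80, 0]]

# Assumed homography (pure translation): right pixel (x, y) samples left (x + 1, y).
H_inv = [[1, 0, 1],
         [0, 1, 0],
         [0, 0, 1]]

predicted_right = warp_image(left, H_inv)
print(residual(right, predicted_right))  # all zeros for this toy pair
```

When the homography explains the second view well, the residual is close to zero and is therefore cheap to encode, which is the intuition behind the "1+1<2" effect.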

For the entropy model, HESIC uses a model based on a Gaussian mixture distribution, which improves prediction accuracy while keeping the speed benefits of parallel computation. In addition, we also provide a joint stereo entropy coding structure based on autoregression, which further improves the results and is denoted HESIC+. Compared with HESIC, its disadvantage is that it is hard to parallelize; its advantage is that it makes better use of already encoded/decoded information and so improves coding efficiency.
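To make the role of a Gaussian mixture entropy model concrete, the following sketch estimates the bit cost of quantized symbols. It is an illustrative assumption rather than the HESIC implementation: the number of mixture components, the parameter values, and the function names are all hypothetical, and in a real codec the mixture parameters would be predicted by a network.

```python
import math


def gaussian_cdf(x, mean, scale):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))


def gmm_likelihood(symbol, weights, means, scales):
    """Probability mass of an integer symbol under a Gaussian mixture.

    Each component is integrated over the quantization bin
    [symbol - 0.5, symbol + 0.5].
    """
    return sum(
        w * (gaussian_cdf(symbol + 0.5, m, s) - gaussian_cdf(symbol - 0.5, m, s))
        for w, m, s in zip(weights, means, scales)
    )


def estimated_bits(symbols, params, eps=1e-9):
    """Sum of -log2(p) over all symbols; no real bitstream is written."""
    total = 0.0
    for symbol, (weights, means, scales) in zip(symbols, params):
        p = max(gmm_likelihood(symbol, weights, means, scales), eps)
        total += -math.log2(p)
    return total


# Hypothetical two-component mixture shared by both symbols below.
mixture = ([0.6, 0.4], [0.0, 3.0], [1.0, 0.5])

well_predicted = estimated_bits([0], [mixture])
poorly_predicted = estimated_bits([10], [mixture])
print(well_predicted < poorly_predicted)  # True
```

A symbol that falls where the mixture places most of its mass costs few bits, so a more accurate entropy model translates directly into a lower bit rate.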

 05 

Experimental results

The paper reports experimental and comparative results on the InStereo2K and KITTI datasets, including comparisons of PSNR and SSIM at different compression ratios.
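For readers unfamiliar with PSNR, the sketch below shows the standard definition for 8-bit images. It is a generic illustration, not the evaluation script used in the paper; the images are represented as plain lists of rows and the peak value of 255 is an assumption.

```python
import math


def mse(a, b):
    """Mean squared error between two equally sized 2D images."""
    diffs = [(x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(a, b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)


original = [[100, 110], [120, 130]]
decoded = [[101, 109], [121, 129]]  # every pixel is off by one
print(round(psnr(original, decoded), 2))  # 48.13
```

Higher PSNR at the same bit rate, or a lower bit rate at the same PSNR, indicates better compression performance.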

Figure: Averaged objective results of HESIC on InStereo2K and KITTI

Figure: BD-BR comparison

Figure: Subjective (visual) results

 06 

MindSpore code implementation

https://gitee.com/ywzsunny/HESIC-Mindspore-Migration

The code is divided into four main parts: stereo image homography (this part can be replaced by traditional feature matching, with little effect on the results), feature transformation, quantization plus entropy-model prediction of bpp (bits per pixel), and feature reconstruction. The core of the codec is still feature extraction and its inverse transformation. Because the entropy model predicts the number of codeword bits directly during the forward pass of the network, no actual serialization is needed, which speeds up training. The training loss combines the estimated bit rate with an image distortion term (related to a quality measure such as PSNR). The two are weighted by lambda, which adjusts the compression ratio, so that models can be trained and tested at different compression rates.
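The lambda-weighted loss can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's loss function: it uses mean squared error as the distortion term (an assumption, since the text only names PSNR as an example), and the function names and numbers are hypothetical.

```python
def bits_per_pixel(total_bits, height, width):
    """Estimated rate: predicted bits divided by the number of pixels."""
    return total_bits / (height * width)


def rd_loss(bpp, distortion, lam):
    """Rate-distortion loss: estimated rate plus lambda-weighted distortion."""
    return bpp + lam * distortion


# Hypothetical values for one training sample.
estimated_total_bits = 1200.0  # e.g. from the entropy model, not a real bitstream
height, width = 20, 30
distortion = 4.0               # assumed MSE between input and reconstruction

bpp = bits_per_pixel(estimated_total_bits, height, width)

# A larger lambda penalizes distortion more, pushing training toward
# higher quality at the cost of a higher bit rate.
low_quality_loss = rd_loss(bpp, distortion, lam=0.01)
high_quality_loss = rd_loss(bpp, distortion, lam=0.1)
print(bpp, low_quality_loss, high_quality_loss)
```

Training one model per lambda value yields one operating point each on the rate-distortion curve.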

 07 

Summary and outlook

For stereo image compression, compression efficiency can be further improved by making better use of mutual information and by integrating it more deeply with the compression network. Looking ahead, the homography between stereo views and the relationship between consecutive video frames each have their own characteristics. A homography transformation can roughly register image content at low cost, and this idea could be integrated into other tasks.
