Super-Resolution: RLSP
2022-06-21 17:49:00 【Ton10】

This paper is from ICCVW 2019. It prioritizes speed for real-time video super-resolution, at some cost in expressiveness. The author proposes an efficient VSR model, Recurrent Latent Space Propagation (RLSP). It is a typical alignment-free method, so it is comparatively efficient next to classic VSR models based on optical flow or DCN. RLSP models VSR as an RNN; its two core components are Shuffling and the hidden state.
References:
① Source code
② Video super-resolution: RLSP (Efficient Video Super-Resolution through Recurrent Latent Space Propagation)
Efficient Video Super-Resolution through Recurrent Latent Space Propagation
Abstract
RLSP is an alignment-free VSR method. The biggest benefit of dropping alignment is efficiency and speed; the drawback is that its PSNR is worse than that of aligned VSR models.
The author gives three reasons why RLSP is needed:
- Real-time performance. Motion compensation, whether explicit or implicit, consumes compute and GPU memory, so removing the alignment module speeds up VSR reconstruction and saves GPU memory.
- Flow-based alignment depends heavily on the accuracy of motion estimation: inaccurate motion estimates introduce artifacts. Moreover, both flow-based and flow-free alignment rely on interpolation, which inevitably loses high-frequency detail.
- When the motion in a dataset is small, adjacent frames are already very close, so skipping alignment has little effect.
The author therefore proposes an RNN-based VSR model, RLSP, which treats video super-resolution as a sequence-to-sequence problem. Its main characteristics:
- No alignment, fast, real-time: nearly 70x faster than DUF.
- The core of RLSP is a high-dimensional ($C=128$) hidden state $h$ that propagates feature information from the past, combined with the PixelShuffle operation proposed in ESPCN, used as shuffle_up for upsampling and shuffle_down for the feedback path.
1. Introduction
Complex motion compensation requires expensive computation, so such models are unsuitable for real-time scenarios such as game super-resolution.
RLSP is a VSR model designed for real-time use. Unlike VESPCN, TDAN, Robust-LTD, and EDVR, which pass features through sliding windows, RLSP is built on a recurrent network structure and belongs to the unidirectional feature-propagation family of VSR methods; propagation is carried out through a latent (hidden) state.
The figure below shows the PSNR-runtime results for RLSP, FRVSR, and DUF:
- RLSP is roughly 10x faster than FRVSR and roughly 70x faster than DUF.
- "7-128" means 7 CNN layers after fusion with 128 filters per layer, so RLSP's performance can be scaled up by increasing the network's capacity.
2. Related Work
(Omitted.)
3. Method
At each iteration, the goal of RLSP is to super-resolve the current frame $x_t \in \mathbb{R}^{H\times W\times C}$ to $y_t \in \mathbb{R}^{rH\times rW\times C}$. The RLSP pipeline is shown below; since RLSP is based on an RNN structure, the most important part is the purple box, the RLSP cell:
Let us walk through the pipeline (assume each batch contains 10 frames, the input is a 64x64 RGB image, the scale factor is $r=4$, and the number of filters is $f=128$):
- RLSP borrows from sliding-window methods: it concatenates the current frame with its immediate predecessor and successor along the channel dimension, but drops the alignment step entirely, relying on the assumption that adjacent frames are highly similar. The first cell input is therefore $\mathbb{R}^{b\times 3\times 3\times 64\times 64}$ (3 frames, 3 RGB channels each).
- The second cell input is the previous super-resolved output $y_{t-1} \in \mathbb{R}^{b\times 3\times 256\times 256}$ after shuffle_down, giving $\mathbb{R}^{b\times (3\cdot 4\cdot 4)\times 64\times 64}$.
- The third cell input is the previous hidden state $h_{t-1} \in \mathbb{R}^{b\times 128\times 64\times 64}$. As in a standard RNN, the hidden state is predicted by learned layers (here convolutions). Since the process is recurrent, we analyze directly how $h_t$ is produced. As shown in the purple box, the cell has $n=7$ layers. The first layer is a $3\times 3$ convolution that fuses the channel-concatenated inputs, mapping $3\cdot 3 + f + 3r^2$ channels to 128. The next 5 layers are $128 \to 128$ convolutions with $3\times 3$ kernels. The final layer maps $128 \to 3r^2 + f$ channels, and its output is split along the channel dimension into two parts: one part passes through a ReLU to give $h_t \in \mathbb{R}^{b\times 128\times 64\times 64}$; the other is added to $x_t^*$ as a residual connection, giving $\mathbb{R}^{b\times (3r^2)\times 64\times 64}$, where $x_t^*$ is $x_t$ replicated $r^2$ times along the channel dimension, also of shape $\mathbb{R}^{b\times (3r^2)\times 64\times 64}$.
- shuffle_up is equivalent in spirit to PixelShuffle, and shuffle_down is the inverse of shuffle_up, similar to the pixel-unshuffle step seen in DCN-based alignment. The feedback path uses shuffle_down for downsampling, while shuffle_up performs the upsampling from $x_t$ to $y_t$.
Notes (a minimal PyTorch sketch of the cell follows):
- The residual connection lets the network learn only the residual, which makes training more stable; adding $x_t$ directly also compensates for the information the CNN inevitably loses.
- RLSP super-resolves only one frame per step.
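Below is a minimal PyTorch sketch of the RLSP cell under the assumptions above ($r=4$, $f=128$, $n=7$); the class and argument names are mine, not from the official code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RLSPCell(nn.Module):
    """Sketch of the RLSP cell: 7 conv layers, residual output + new hidden state."""
    def __init__(self, factor=4, filters=128, layers=7):
        super().__init__()
        self.factor = factor
        in_ch = 3 * 3 + filters + 3 * factor**2   # 3 RGB frames + h_{t-1} + shuffled-down y_{t-1}
        out_ch = 3 * factor**2 + filters          # residual part + hidden-state part
        body = [nn.Conv2d(in_ch, filters, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(layers - 2):
            body += [nn.Conv2d(filters, filters, 3, padding=1), nn.ReLU(inplace=True)]
        body.append(nn.Conv2d(filters, out_ch, 3, padding=1))
        self.body = nn.Sequential(*body)

    def forward(self, frames, h_prev, y_prev_down):
        # frames: (B, 9, H, W) = [x_{t-1}, x_t, x_{t+1}] concatenated on channels
        # h_prev: (B, f, H, W); y_prev_down: (B, 3*r^2, H, W)
        r2 = self.factor ** 2
        out = self.body(torch.cat([frames, h_prev, y_prev_down], dim=1))
        res, h_t = out[:, :3 * r2], F.relu(out[:, 3 * r2:])
        x_t_star = frames[:, 3:6].repeat(1, r2, 1, 1)  # x_t replicated r^2 times
        y_t_down = res + x_t_star                      # residual connection
        return y_t_down, h_t                           # shuffle_up(y_t_down) gives y_t
```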
3.1 Shuffling
Shuffling consists of shuffle_up for upsampling and shuffle_down for downsampling.
shuffle_up follows the sub-pixel convolution layer of ESPCN: it does not interpolate pixel values; it rearranges the pixels stored across channels into the spatial dimensions:
Shuffle-up:
$$t^{LR} \in \mathbb{R}^{H\times W\times Z} \;\xrightarrow{\times r}\; t^{HR} \in \mathbb{R}^{rH\times rW\times Z/r^2}. \tag{1}$$
Source code:
```python
def shuffle_up(x, factor):
    # x format: (B, C, H, W); moves factor^2 channel groups into space
    b, c, h, w = x.shape
    assert c % factor**2 == 0, "C must be a multiple of " + str(factor**2) + "!"
    # split channels into (factor, factor, C / factor^2)
    n = x.reshape(b, factor, factor, c // factor**2, h, w)
    # interleave the two factor axes with H and W
    n = n.permute(0, 3, 4, 1, 5, 2)
    # merge into (B, C / factor^2, factor*H, factor*W)
    n = n.reshape(b, c // factor**2, factor * h, factor * w)
    return n
```
Shuffle-down:
$$t^{HR} \in \mathbb{R}^{H\times W\times Z} \;\xrightarrow{\times r}\; t^{LR} \in \mathbb{R}^{H/r\times W/r\times r^2 Z}. \tag{2}$$
Source code:
```python
def shuffle_down(x, factor):
    # x format: (B, C, H, W); inverse of shuffle_up: folds space into channels
    b, c, h, w = x.shape
    assert h % factor == 0 and w % factor == 0, "H and W must be a multiple of " + str(factor) + "!"
    # split H and W into (H / factor, factor) and (W / factor, factor)
    n = x.reshape(b, c, h // factor, factor, w // factor, factor)
    # move the two factor axes in front of the channel axis
    n = n.permute(0, 3, 5, 1, 2, 4)
    # merge into (B, factor^2 * C, H / factor, W / factor)
    n = n.reshape(b, c * factor**2, h // factor, w // factor)
    return n
```
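A quick sanity check (my own test, not from the paper): shuffle_down exactly inverts shuffle_up, and for a single channel group ($C = r^2$) shuffle_up coincides with nn.PixelShuffle (with multiple groups the channel ordering differs, although the operation is equivalent in spirit):

```python
import torch
import torch.nn as nn

x = torch.randn(2, 48, 64, 64)              # (B, 3*r^2, H, W) with r = 4
up = shuffle_up(x, 4)                       # -> (2, 3, 256, 256)
assert up.shape == (2, 3, 256, 256)
assert torch.equal(shuffle_down(up, 4), x)  # exact round trip

g = torch.randn(1, 16, 8, 8)                # single channel group: C = r^2
assert torch.equal(shuffle_up(g, 4), nn.PixelShuffle(4)(g))
```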
3.2 Residual Learning
In the cell, $x_t^*$ is added to the CNN output so that the network learns only the residual. Besides mitigating the vanishing-gradient problem, the residual connection adds stability: the residual has a narrower range to learn, which reduces variance. In addition, since the CNN inevitably attenuates the input information, adding the input directly also helps preserve the original input.
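A side observation (my own illustration, assuming the channel-tiled replication used in the cell sketch above): after shuffle_up, the replicated copies of $x_t$ fill every position of each $r\times r$ output block, so the skip path is exactly nearest-neighbor upsampling of $x_t$, and the network learns the residual on top of that:

```python
import torch
import torch.nn.functional as F

r = 4
x_t = torch.randn(1, 3, 64, 64)
skip = shuffle_up(x_t.repeat(1, r**2, 1, 1), r)             # (1, 3, 256, 256)
nn_up = F.interpolate(x_t, scale_factor=r, mode="nearest")  # nearest-neighbor upsampling
assert torch.equal(skip, nn_up)
```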
3.3 Feedback
Feedback applies shuffle_down to the previous output $y_{t-1}$. Because adjacent frames are highly correlated, fusing this information also helps the super-resolution of the current frame $x_t$.
3.4 Hidden State

As in an RNN, the hidden state $h_{t-1}$ remembers feature information from the past and is combined with the current frame so that past features assist the current frame's super-resolution. In RLSP, the author uses the 7-layer cell to learn the hidden state; its final shape is $\mathbb{R}^{b\times f\times 64\times 64}$ with $f=128$.
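A minimal recurrent rollout over a clip, under my own padding and initialization assumptions (zero-initialized hidden state and feedback, edge frames replicated at the clip boundaries):

```python
import torch

def rlsp_forward(cell, lr_frames, factor=4, filters=128):
    # lr_frames: (B, T, 3, H, W); returns (B, T, 3, factor*H, factor*W)
    b, t, _, h, w = lr_frames.shape
    h_state = lr_frames.new_zeros(b, filters, h, w)       # h_0 = 0
    y_down = lr_frames.new_zeros(b, 3 * factor**2, h, w)  # feedback for t = 0
    outputs = []
    for i in range(t):
        prev_f = lr_frames[:, max(i - 1, 0)]              # replicate at boundaries
        next_f = lr_frames[:, min(i + 1, t - 1)]
        frames = torch.cat([prev_f, lr_frames[:, i], next_f], dim=1)
        # y_down is already shuffle_down(y_{t-1}), so it is fed back directly
        y_down, h_state = cell(frames, h_state, y_down)
        outputs.append(shuffle_up(y_down, factor))        # (B, 3, rH, rW)
    return torch.stack(outputs, dim=1)
```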
3.5 Loss
RLSP uses an MSE loss, where $y^*$ is the ground truth and $k$ the number of elements:
$$\mathcal{L} = \frac{1}{k}\,\lVert y^* - y \rVert_2^2. \tag{3}$$
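In PyTorch this is simply mean-reduced MSE (a sketch; the tensors here are stand-in values):

```python
import torch
import torch.nn.functional as F

sr = torch.randn(2, 10, 3, 256, 256)  # predicted HR sequence
hr = torch.randn(2, 10, 3, 256, 256)  # ground-truth HR sequence
loss = F.mse_loss(sr, hr)             # mean reduction divides by the element count k
```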
4. Experimental Setup
For my reproduction, the experimental configuration is as follows:
```python
import torch

params = {
    "lr": 10 ** -4,
    "bs": 2,
    "crop size h": 64,
    "crop size w": 64,
    "sequence length": 5,
    "validation sequence length": 20,
    "number of workers": 8,
    "layers": 7,
    "kernel size": 3,
    "filters": 128,
    "state dimension": 128,
    "factor": 4,
    "save interval": 50000,
    "validation interval": 1000,
    "dataset root": "./dataset/",
    "device": torch.device("cuda" if torch.cuda.is_available() else "cpu"),
}
```
Because the source code does not state which dataset to use, and its data-reading code has a problem, I made two changes:
- Use the REDS dataset, placed under the configured "dataset root" (./dataset/).
- Use PIL.Image.open() to read the images.
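A sketch of the frame loader I used (the function name is mine, and normalization to [0, 1] is my own choice):

```python
from PIL import Image
import numpy as np
import torch

def load_frame(path):
    # read an RGB frame and convert to a float tensor in [0, 1], shape (3, H, W)
    img = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32) / 255.0
    return torch.from_numpy(img).permute(2, 0, 1)
```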
5. Results and Discussion
5.1 Ablation
In addition to residual connections, RLSP relies on three other tricks:
- Adding adjacent frames.
- Feedback.
- Hidden-state.
To study the effect of these three components on RLSP, the ablation is set up as follows: the first configuration processes every frame independently; the second adds adjacent frames; the third adds feedback ($y_{t-1}$); the fourth adds both feedback and the hidden state $h_{t-1}$. The results:
- All three components improve RLSP's performance, but each also increases the computational cost.
5.2 Temporal Consistency
(Omitted.)
5.3 Information Flow over Time
(Omitted.)
5.4 Initialization
(Omitted.)
5.5 Accuracy and Runtimes
- The experiments are validated on Vid4, tested on the full sequences.
- The reported PSNR is the average over the 4 Vid4 sequences; the reported runtime is the per-frame reconstruction time (ms).
- The target of restoration is 2K video sequences.
The experimental results are as follows:
- RLSP-7-128 takes 38 ms per frame, i.e., about 1000/38 ≈ 26 frames per second, so RLSP-7-128 meets the 25 fps real-time requirement.
- RLSP-7-128 has lower PSNR on the first frames because it is a unidirectional propagation model: it can only use information from the past, so little information is available at the start and more accumulates later. As Figure 8 shows, the PSNR of the first few frames is naturally lower and rises afterwards. This unevenness in information utilization can be addressed by adding a backward branch.
- RLSP's expressiveness can be increased by adding more filters to the cell.
The visualization results are as follows :
6. Conclusion
- This paper proposes an alignment-free VSR model, RLSP. It models VSR as a sequence-to-sequence problem and builds an RNN structure to perform video super-resolution.
- RLSP combines four techniques to raise PSNR: ① shuffling; ② residual learning; ③ feedback; ④ the hidden state.
- RLSP's greatest strength is its speed at a given PSNR; the 7-128 model just meets real-time requirements. Its expressiveness can be strengthened by increasing the cell's nonlinearity (more depth or width).