Video object segmentation across 10,000 frames with less than 1.4GB of GPU memory, code open-sourced | ECCV 2022
2022-07-25 01:11:00 【QbitAI】
By Mingmin, from Aofeisi
QbitAI | WeChat official account QbitAI
Why has Chika Fujiwara, perfectly fine a moment ago, suddenly turned into a “high-temperature red edition”?

And this big purple hand: has Thanos come back to life?

If you think the effects above are just objects colored in during post-production, then the AI has fooled you.
These strange colors are actually the visualization of video object segmentation.
But to be honest, it really is hard to tell the difference at first glance.
Whether it is a cute girl's flying hair:

Or a towel that changes shape, or objects that occlude one another back and forth:

The AI's segmentation of the target is close to perfect; the color looks as if it were “welded” on.
Beyond segmenting targets with high precision, the method can also handle videos of more than 10,000 frames.
And the segmentation quality stays at the same level throughout: the second half of the video is just as smooth and fine-grained.

What is more surprising is that the method is not very demanding on the GPU.
The researchers say that during their experiments, the method never consumed more than 1.4GB of GPU memory.
For comparison, current comparable methods based on attention mechanisms cannot even process videos longer than 1 minute on an ordinary consumer graphics card.
This is XMem, a long-video object segmentation method recently proposed by researchers at the University of Illinois Urbana-Champaign.
It has been accepted to ECCV 2022, and the code is open source.
Such smooth results also drew plenty of onlookers on Reddit, where the post's popularity reached 800+.

Some netizens joked:
Why did you paint your hands purple?
Who knows whether Thanos has a hobby in computer vision?

Imitating human memory
There are already many video object segmentation methods, but they are either slow, demanding on the GPU, or not accurate enough.
The method proposed in this paper can be said to address all three.
It segments long videos quickly, with a frame rate of up to 20 FPS, and it can do so on an ordinary GPU.
What makes it special is that it is inspired by the way human memory works.
In 1968, the psychologists Atkinson and Shiffrin proposed the multi-store model (Atkinson-Shiffrin memory model).
According to the model, human memory can be divided into 3 types: sensory memory, short-term memory and long-term memory.
Following this scheme, the researchers also divided the AI framework into 3 kinds of memory, namely:
Sensory memory, updated in a timely manner
High-resolution working memory
Compact long-term memory.

The sensory memory is updated every frame and records the image information in the current frame.
The working memory collects image information from the sensory memory and is updated once every r frames.
When the working memory is saturated, it is compressed and transferred to the long-term memory.
When the long-term memory is saturated, it forgets outdated features over time; generally this only happens after thousands of frames have been processed.
As a result, GPU memory does not run out as the video goes on.
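The update schedule above can be sketched as plain bookkeeping in Python. This is a minimal illustration under stated assumptions, not XMem's implementation: the class name, the capacities, the "keep every second entry" compression and the "drop the oldest entry" forgetting rule are all hypothetical stand-ins, and features are opaque strings rather than tensors.

```python
from collections import deque


class ThreeTierMemory:
    """Toy bookkeeping for sensory, working and long-term memory.

    Features are opaque Python objects here; a real model would store tensors.
    """

    def __init__(self, r=5, working_capacity=4, long_term_capacity=6, keep_every=2):
        self.r = r                              # working memory is updated every r frames
        self.working_capacity = working_capacity
        self.keep_every = keep_every            # hypothetical compression rate
        self.sensory = None                     # overwritten on every frame
        self.working = []                       # recent features kept at full detail
        # Hypothetical forgetting rule: a bounded deque drops its oldest entries.
        self.long_term = deque(maxlen=long_term_capacity)
        self.frames_seen = 0

    def observe(self, feature):
        self.sensory = feature
        if self.frames_seen % self.r == 0:
            self.working.append(feature)
            if len(self.working) > self.working_capacity:
                self._consolidate()
        self.frames_seen += 1

    def _consolidate(self):
        # Keep the newest working entry; compress the rest into long-term memory.
        *old, newest = self.working
        self.long_term.extend(old[::self.keep_every])
        self.working = [newest]

    def size(self):
        return len(self.working) + len(self.long_term)


memory = ThreeTierMemory()
peak = 0
for frame_index in range(10_000):
    memory.observe(f"feature-{frame_index}")
    peak = max(peak, memory.size())
print(peak)
```

Because both stores are capped, the peak number of stored features depends only on the two capacities and not on the length of the video, which is the property the article attributes to the design.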
Typically, video object segmentation is given the first frame together with a mask of the target object; the model then tracks the target and generates the corresponding masks for subsequent frames.
Specifically, XMem processes a single frame as follows:

The whole AI framework consists of 3 end-to-end convolutional networks.
A query encoder extracts query-specific image features from the frame being tracked.
A decoder takes the output of the memory reading step and generates the object mask.
A value encoder combines the image with the target's mask to extract new memory features.
The features extracted by the value encoder are finally added to the working memory.
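The data flow between these components can be sketched with a toy example in Python. Everything here is an assumption made for illustration: the three functions are trivial stand-ins for the convolutional networks, a frame is a short list of pixel intensities, and memory reading is approximated by a softmax-style weighting over squared key distance, which is not necessarily the similarity XMem uses.

```python
import math


def query_encoder(frame):
    # Stand-in for the convolutional query encoder: one scalar key per pixel.
    return list(frame)


def value_encoder(frame, mask):
    # Stand-in for the value encoder: pair each pixel key with its mask value.
    return list(zip(frame, mask))


def read_memory(keys, memory, sharpness=50.0):
    # Average the stored mask values, weighted by similarity between keys.
    readout = []
    for key in keys:
        weights = [math.exp(-sharpness * (key - mem_key) ** 2) for mem_key, _ in memory]
        total = sum(weights)
        readout.append(sum(w * value for w, (_, value) in zip(weights, memory)) / total)
    return readout


def decoder(readout, threshold=0.5):
    # Stand-in for the decoder: threshold the readout into a binary mask.
    return [1 if score > threshold else 0 for score in readout]


def segment_video(frames, first_mask):
    # The first frame and its mask are given; they seed the working memory.
    working_memory = value_encoder(frames[0], first_mask)
    masks = [list(first_mask)]
    for frame in frames[1:]:
        keys = query_encoder(frame)
        mask = decoder(read_memory(keys, working_memory))
        # New memory features from the value encoder join the working memory.
        working_memory.extend(value_encoder(frame, mask))
        masks.append(mask)
    return masks


# A bright two-pixel "object" sliding over a dark background.
frames = [
    [0.9, 0.9, 0.1, 0.1],
    [0.1, 0.9, 0.9, 0.1],
    [0.1, 0.1, 0.9, 0.9],
]
masks = segment_video(frames, first_mask=[1, 1, 0, 0])
print(masks)
```

In this toy the predicted mask follows the bright pixels from frame to frame. The working memory here grows with every frame; a practical system would have to bound it.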
According to the experimental results, the method achieves SOTA on both short and long videos.

When processing long videos, XMem's performance does not drop as the number of frames increases.

Research team
One of the authors is Ho Kei (Rex) Cheng, who is of Chinese descent.

He graduated from the Hong Kong University of Science and Technology and is now pursuing a PhD at the University of Illinois Urbana-Champaign.
His research focuses on computer vision.
Many of his papers have been accepted at top conferences such as CVPR, NeurIPS and ECCV.
Another author is Alexander G. Schwing.

He is currently an assistant professor at the University of Illinois Urbana-Champaign and graduated from ETH Zurich.
His research interests are machine learning and computer vision .
Paper:
https://arxiv.org/abs/2207.07115
GitHub:
https://github.com/hkchengrex/XMem