(CVPR-2022)BiCnet
2022-07-23 22:47:00 【Gu daochangsheng】
Following [30, 42], we decompose the video network into spatial cues and temporal relations. With the efficient BiCnet fully exploring spatial cues, we build a Temporal Kernel Selection (TKS) block to jointly model short-term and long-term temporal relations. Because temporal relations at different scales differ in importance across sequences (as shown in Figure 2), TKS combines multi-scale temporal relations dynamically, that is, it assigns different weights to different temporal scales according to the input sequence.

Figure 2: Short-term and long-term temporal relations have different importance for different sequences. (a) A partially occluded sequence. Long-term temporal cues are needed to mitigate the occlusion. (b) A fast-moving pedestrian sequence. Short-term temporal cues are needed to model detailed motion patterns.
Specifically, TKS takes the feature maps of a sequence of consecutive frames as input and performs three operations on them: Partition, Select, and Excite.
Partition operation. Because person detection algorithms are imperfect, adjacent frames of a video are often not well aligned, which can make temporal convolution ineffective for video reID [9]. Following [34], we use a partition strategy to alleviate this spatial misalignment. Specifically, given the video feature maps, we divide each frame into a grid of spatial regions and average-pool each region, building a region-level video feature map.
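The partition step can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `partition`, the `(T, C, H, W)` layout, and the grid size used in the example are hypothetical choices, not values taken from the paper.

```python
import numpy as np

def partition(features, grid_h, grid_w):
    """Average-pool each frame's feature map over a grid_h x grid_w grid of regions.

    features: array of shape (T, C, H, W); H and W must be divisible by the grid.
    Returns an array of shape (T, C, grid_h, grid_w).
    """
    t, c, h, w = features.shape
    if h % grid_h or w % grid_w:
        raise ValueError("H and W must be divisible by the grid size")
    # Split H into (grid_h, H // grid_h) and W into (grid_w, W // grid_w),
    # then average over the two within-region axes.
    blocks = features.reshape(t, c, grid_h, h // grid_h, grid_w, w // grid_w)
    return blocks.mean(axis=(3, 5))

# Hypothetical sizes: 4 frames, 8 channels, 16x8 feature maps, 4x2 regions.
video = np.random.default_rng(0).normal(size=(4, 8, 16, 8))
regions = partition(video, grid_h=4, grid_w=2)
print(regions.shape)  # (4, 8, 4, 2)
```

Pooling over regions makes each temporal position describe a coarse body area instead of a single pixel, so small frame-to-frame shifts have less effect on what a temporal convolution sees.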
Select operation. As shown in Figure 4, given the region-level video feature map, we run several parallel paths, where F^(i) is a 1D temporal convolution [30] with its own kernel size. To further improve efficiency, the temporal convolution with the full-size kernel is replaced by a dilated convolution with a smaller kernel and a corresponding dilation size. The basic idea of the select operation is to use global information from all temporal paths to determine the weight assigned to each path. Specifically, we first fuse the outputs of all paths by element-wise summation and then apply global average pooling, taken along the temporal and spatial dimensions, to obtain a global feature. The channel-wise selection weights are then computed from this global embedding through learned transformation parameters. Finally, the aggregated feature map is obtained by applying the selection weights to the outputs of the different temporal kernels, after a reshaping operation that makes the weights size-compatible with those outputs.
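A rough NumPy sketch of the select step follows. Several details are assumptions made for illustration, not confirmed by the text: each path uses a 3-tap kernel with dilation `i + 1`, the convolution is depthwise (one filter per channel) for brevity, and the weights are normalized with a softmax across paths. All function and variable names are hypothetical.

```python
import numpy as np

def temporal_conv(x, kernel, dilation):
    """Depthwise 3-tap convolution along time, zero-padded to keep length T.

    x: (T, C, h, w); kernel: (3, C) per-channel taps; dilation: gap between taps.
    """
    t = x.shape[0]
    padded = np.pad(x, ((dilation, dilation), (0, 0), (0, 0), (0, 0)))
    out = np.zeros_like(x)
    for k in range(3):
        # Tap k reads the frame at offset (k - 1) * dilation from the centre.
        start = k * dilation
        out += kernel[k][None, :, None, None] * padded[start:start + t]
    return out

def select(x, kernels, transforms):
    """kernels[i]: (3, C) taps for path i; transforms[i]: (C, C) matrix for path i."""
    # Assumption: path i uses dilation i + 1 to cover a longer time span.
    paths = [temporal_conv(x, k, i + 1) for i, k in enumerate(kernels)]
    stacked = np.stack(paths)                                  # (K, T, C, h, w)
    # Fuse by element-wise sum, then global average pool over time and space.
    global_feat = stacked.sum(axis=0).mean(axis=(0, 2, 3))     # (C,)
    # One logit vector per path; softmax across paths (an assumption).
    logits = np.stack([w @ global_feat for w in transforms])   # (K, C)
    logits -= logits.max(axis=0, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=0, keepdims=True)
    # Reshape weights to (K, 1, C, 1, 1) so they broadcast over T, h, w.
    aggregated = (weights[:, None, :, None, None] * stacked).sum(axis=0)
    return aggregated, weights

rng = np.random.default_rng(0)
T, C, K = 6, 4, 3
x = rng.normal(size=(T, C, 2, 2))
kernels = [rng.normal(size=(3, C)) for _ in range(K)]
transforms = [rng.normal(size=(C, C)) for _ in range(K)]
z, a = select(x, kernels, transforms)
print(z.shape, a.shape)  # (6, 4, 2, 2) (3, 4)
```

Because the weights have one entry per path and per channel, each channel can favor a different temporal scale, and the weights change with the input sequence.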

It is worth pointing out that, rather than using scale-wise weights, which provide only a coarse fusion, we fuse with channel-wise weights (Equation 7). This design yields a finer-grained fusion in which each feature channel can be adjusted. In addition, the weights are computed dynamically from the input video. This is crucial for reID, where different sequences may have different dominant temporal scales.
Excite operation. The excite operation adapts the aggregated feature map to modulate the input feature maps. To produce the final feature map, a nearest-neighbor sampler upsamples the aggregated feature map to match the spatial resolution of the input. The TKS block preserves the input size, so it can be inserted into BiCnet to extract effective spatiotemporal features.
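The upsampling part of the excite step can be sketched with NumPy as below. The modulation itself is written as a plain element-wise product, which is an assumption made for illustration since the exact formula is not given here; the names `upsample_nearest` and `excite` are hypothetical.

```python
import numpy as np

def upsample_nearest(z, out_h, out_w):
    """Nearest-neighbour upsampling of (T, C, h, w) to (T, C, out_h, out_w)."""
    h, w = z.shape[2:]
    if out_h % h or out_w % w:
        raise ValueError("output size must be a multiple of the input size")
    # Repeat each region value over the block of pixels it was pooled from.
    return z.repeat(out_h // h, axis=2).repeat(out_w // w, axis=3)

def excite(features, aggregated):
    """Modulate full-resolution features with the region-level map.

    Assumption: modulation is an element-wise product; the source does not
    state the exact form.
    """
    up = upsample_nearest(aggregated, features.shape[2], features.shape[3])
    return up * features

# Hypothetical sizes: region map of 2x2 modulating 4x4 feature maps.
grid = np.arange(4, dtype=float).reshape(1, 1, 2, 2)
print(upsample_nearest(grid, 4, 4)[0, 0])
frames = np.ones((1, 1, 4, 4))
out = excite(frames, grid)
print(out.shape)  # (1, 1, 4, 4)
```

The output has the same shape as the input features, which matches the statement that the block preserves the input size.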