Paper Reading (59): Keyword-Based Diverse Image Retrieval with Variational Multiple Instance Graph
2022-06-28 10:51:00 【因吉】
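1 Overview
1.1 Title
Keyword-based diverse image retrieval with variational multiple instance graph
1.2 Background
Cross-modal image retrieval has recently attracted extensive research attention. In real-world scenarios, the keyword-based queries that users issue are usually short and semantically broad. In such user-facing services, semantic diversity therefore matters as much as retrieval accuracy for a good user experience. However, most cross-modal image retrieval methods based on a single-point query embedding yield low semantic diversity, whereas diversified retrieval methods suffer from low accuracy because they lack cross-modal understanding.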
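1.3 Strategy
An end-to-end variational multiple instance graph (VMIG) is proposed, which:
1) learns a continuous semantic space to capture diverse query semantics;
2) formulates the retrieval task as a multiple instance learning problem to connect diverse features across modalities.
Specifically, a query-guided variational autoencoder (VAE) models the continuous semantic space instead of learning a single-point embedding. Multiple instances of the image and of the query are then obtained by sampling in the continuous semantic space and by applying multi-head attention, respectively. Next, an instance graph is constructed to remove noisy instances and align cross-modal semantics. Finally, the heterogeneous modalities are robustly fused under multiple losses.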
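The step of drawing several instances from a continuous semantic space can be illustrated with the standard VAE reparameterization trick. This is only a minimal sketch, not the authors' implementation: the function name `sample_instances`, the diagonal-Gaussian parameterization (`mu`, `log_var`), the latent dimension, and the number of instances `k` are all assumptions made for illustration.

```python
import math
import random


def sample_instances(mu, log_var, k, seed=0):
    """Draw k instances z = mu + sigma * eps from a diagonal Gaussian.

    mu and log_var are assumed to be the outputs of a (hypothetical)
    VAE encoder; eps is standard normal noise.
    """
    rng = random.Random(seed)
    # sigma = exp(0.5 * log(sigma^2))
    sigma = [math.exp(0.5 * lv) for lv in log_var]
    instances = []
    for _ in range(k):
        eps = [rng.gauss(0.0, 1.0) for _ in mu]
        instances.append([m + s * e for m, s, e in zip(mu, sigma, eps)])
    return instances


# Hypothetical 3-dimensional latent parameters for a single image.
mu = [0.2, -0.1, 0.5]
log_var = [-2.0, -2.0, -2.0]
instances = sample_instances(mu, log_var, k=4)
for z in instances:
    print([round(x, 3) for x in z])
```

Each sampled vector is a different point around the same mean, which is the sense in which a distribution, rather than a single-point embedding, can yield multiple instances for one input.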
1.4 Bib
@article{Zeng:2022:110,
author = {Zeng, Yawen and Wang, Yiru and Liao, Dongliang and Li, Gongfu and Huang, Weijie and Xu, Jin and Cao, Da and Man, Hong},
title = {Keyword-based diverse image retrieval with variational multiple instance graph},
journal = {{IEEE} Transactions on Neural Networks and Learning Systems},
pages = {1--10},
year = {2022},
doi = {10.1109/TNNLS.2022.3168431},
url = {https://ieeexplore.ieee.org/abstract/document/9764824}
}
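2 Framework
Figure 2 shows the overall framework of VMIG, which consists of three parts:
1) Semantic feature projection: extracts features of the image and the query and projects them into their respective semantic spaces;
2) Cross-modal diverse generator: learns a one-to-many semantic distribution to generate multiple instances and builds a cross-modal multiple instance graph. Multiple instances of the image and of the query are obtained via the query-guided VAE and multi-head attention, respectively, while the cross-modal multiple instance graph is used to explore intra-modal semantic correlation and cross-modal alignment;
3) Semantic space constraints: multiple losses constrain the cross-modal semantic space.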

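2.1 Semantic feature projection
Let $v$ and $t$ denote an image and a keyword-based query, respectively. Given a query $t$, the goal is to retrieve suitable images while ensuring both relevance and diversity. To learn better features, image features $\mathbf{f}_v$ are first extracted with ResNet and query features $\mathbf{f}_t$ are obtained with Doc2Vec. These features are then projected into their respective semantic spaces:
$$
\left\{
\begin{aligned}
\tilde{\mathbf{f}}_v &= o_v(\mathbf{f}_v)\\
\tilde{\mathbf{f}}_t &= o_t(\mathbf{f}_t)
\end{aligned}
\right.
\tag{1}
$$
where $o_v$ and $o_t$ are projection functions approximated by fully connected networks.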
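Equation (1) can be sketched in plain Python as follows. The sketch treats each projection function as a single fully connected layer. The helper names `make_linear` and `project`, the ReLU activation, the random initialization, and all dimensions are assumptions for illustration, and random vectors stand in for the ResNet and Doc2Vec features.

```python
import random


def make_linear(in_dim, out_dim, seed):
    """Create a randomly initialized fully connected layer (weights, bias)."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def project(features, layer):
    """Apply ReLU(W f + b); the ReLU is an assumed choice of activation."""
    weights, bias = layer
    return [max(0.0, sum(w * x for w, x in zip(row, features)) + b)
            for row, b in zip(weights, bias)]


# Stand-ins for the extracted features (hypothetical dimensions).
rng = random.Random(42)
f_v = [rng.random() for _ in range(8)]  # image feature, e.g. from ResNet
f_t = [rng.random() for _ in range(5)]  # query feature, e.g. from Doc2Vec

o_v = make_linear(in_dim=8, out_dim=4, seed=1)  # image projection o_v
o_t = make_linear(in_dim=5, out_dim=4, seed=2)  # query projection o_t

f_v_tilde = project(f_v, o_v)
f_t_tilde = project(f_t, o_t)
print(len(f_v_tilde), len(f_t_tilde))
```

The two modalities keep separate projection parameters, mirroring the separate functions $o_v$ and $o_t$ in Eq. (1); the common output dimension here is chosen only for convenience.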
2.2 Cross-modal diverse generator