
Paper reading (59): Keyword-based diverse image retrieval with variational multiple instance graph

2022-06-28 11:01:00 【Inge】

Contents

1 Overview
2 Framework
- 2.1 Semantic feature projection
- 2.2 Cross-modal diversity generator

1 Overview

1.1 Title

2022 TNNLS: Keyword-based diverse image retrieval with variational multiple instance graph

1.2 Background

Cross-modal image retrieval has recently attracted extensive research attention. In the real world, the keyword-based queries that users issue are usually very short and semantically broad. In such user-oriented services, semantic diversity is therefore as important as retrieval accuracy for a good user experience. However, most cross-modal image retrieval methods rely on a single-point query embedding and yield low semantic diversity, while diversified retrieval methods suffer from low accuracy because they lack cross-modal understanding.

1.3 Strategy

An end-to-end variational multiple instance graph (VMIG) is proposed:
1) Learn a continuous semantic space to capture diverse query semantics;
2) Formulate the retrieval task as a multiple instance learning problem that connects diverse features across modalities.
Specifically, a query-guided variational autoencoder (VAE) models the continuous semantic space instead of learning a single-point embedding. Multiple instances of the image and the query are then obtained by sampling in the continuous semantic space and by applying multi-head attention, respectively. Next, an instance graph is built to remove noisy instances and align cross-modal semantics. Finally, the heterogeneous modalities are fused robustly under multiple losses.
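The sampling step can be illustrated with the standard VAE reparameterization trick. This is only a minimal sketch under stated assumptions: the notes do not say how the paper's encoder is conditioned on the query, so `mu` and `log_var` below are hypothetical encoder outputs with toy values, and `sample_instances` is an illustrative helper rather than the authors' code.

```python
import math
import random


def sample_instances(mu, log_var, num_instances, rng):
    """Draw instances z = mu + sigma * eps, with eps ~ N(0, I).

    mu and log_var describe a diagonal Gaussian in the semantic space.
    Each draw is one candidate "instance" of the same input.
    """
    sigma = [math.exp(0.5 * lv) for lv in log_var]
    return [
        [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
        for _ in range(num_instances)
    ]


rng = random.Random(0)

# Hypothetical encoder outputs (toy values, not taken from the paper).
mu = [0.2, -0.1, 0.5]
log_var = [-2.0, -2.0, -2.0]

# Four instances sampled from the same continuous semantic distribution.
instances = sample_instances(mu, log_var, num_instances=4, rng=rng)
for z in instances:
    print([round(x, 3) for x in z])
```

Because every instance is a separate draw around the same mean, the set covers a neighbourhood of the semantic space rather than a single point, which is the intuition behind replacing the single-point embedding.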

1.4 Bib

@article{Zeng:2022:110,
  author  = {Zeng, Yawen and Wang, Yiru and Liao, Dongliang and Li, Gongfu and Huang, Weijie and Xu, Jin and Cao, Da and Man, Hong},
  title   = {Keyword-based diverse image retrieval with variational multiple instance graph},
  journal = {{IEEE} Transactions on Neural Networks and Learning Systems},
  pages   = {1--10},
  year    = {2022},
  doi     = {10.1109/TNNLS.2022.3168431},
  url     = {https://ieeexplore.ieee.org/abstract/document/9764824}
}

2 Framework

Figure 2 shows the overall framework of VMIG, which consists of three parts:
1) Semantic feature projection: extract the features of the image and the query, and project them into their respective semantic spaces;
2) Cross-modal diversity generator: learn a one-to-many semantic distribution to generate multiple instances, and build a cross-modal multiple instance graph. The multiple instances of images and queries are obtained with the query-guided VAE and multi-head attention, and the cross-modal multiple instance graph is used to explore intra-modal semantic relevance and cross-modal alignment;
3) Semantic space constraints: multiple losses are used to constrain the cross-modal semantic space.

Figure 1: VMIG for keyword-based diverse image retrieval

2.1 Semantic feature projection

Let $v$ and $t$ denote an image and a keyword-based query, respectively. Given a query $t$, the goal is to retrieve appropriate images while ensuring both relevance and diversity. To learn better features, ResNet is first used to extract the image feature $\mathbf{f}_v$, and Doc2Vec is used to obtain the query feature $\mathbf{f}_t$. These features are then projected separately into the semantic space:
$$\left\{ \begin{array}{rcl} \tilde{\mathbf{f}}_v &=& o_v(\mathbf{f}_v)\\ \tilde{\mathbf{f}}_t &=& o_t(\mathbf{f}_t) \end{array} \right. \tag{1}$$
where $o_v$ and $o_t$ are projection functions approximated by fully connected networks.
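Equation (1) can be sketched in a few lines of plain Python. The sketch makes several assumptions that the notes do not confirm: each projection is a single fully connected layer with a ReLU, both modalities map to the same output size, and the feature dimensions are toy values. The random vectors stand in for real ResNet and Doc2Vec outputs, and `make_linear` and `project` are illustrative helpers.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Create random weights and a zero bias for one fully connected layer."""
    scale = (1.0 / in_dim) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def project(features, layer):
    """Apply the layer (affine map followed by ReLU) to one feature vector."""
    weights, bias = layer
    return [
        max(0.0, sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(weights, bias)
    ]


rng = random.Random(0)

# Toy dimensions (assumptions, not the paper's settings).
IMAGE_DIM, QUERY_DIM, SEMANTIC_DIM = 8, 5, 4

o_v = make_linear(IMAGE_DIM, SEMANTIC_DIM, rng)  # image projection o_v
o_t = make_linear(QUERY_DIM, SEMANTIC_DIM, rng)  # query projection o_t

f_v = [rng.random() for _ in range(IMAGE_DIM)]   # stand-in for a ResNet feature
f_t = [rng.random() for _ in range(QUERY_DIM)]   # stand-in for a Doc2Vec feature

f_v_tilde = project(f_v, o_v)  # projected image feature
f_t_tilde = project(f_t, o_t)  # projected query feature
print(len(f_v_tilde), len(f_t_tilde))
```

Using separate layers for $o_v$ and $o_t$ lets the two modalities start from different input sizes while still landing in spaces that later stages can compare.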