ACL 2022 | MVR: Multi-View Document Representation for Open-Domain Retrieval

2022-06-23 21:53:00 Zhiyuan community

Today I bring you an ACL 2022 paper, MVR ("Multi-View Document Representation for Open-Domain Retrieval"). It mainly addresses the semantic mismatch between a single document vector and the many different queries a document can answer. By inserting multiple special tokens, it builds a multi-view vector representation of each document, and to keep the view vectors from collapsing into one another it introduces a Global-Local loss with an annealed temperature. The full title of the paper is "Multi-View Document Representation Learning for Open-Domain Dense Retrieval".

Paper: https://aclanthology.org/2022.acl-long.414.pdf

This paper follows the same idea as DCSR ("Sentence-aware Contrastive Learning for Open-Domain Passage Retrieval"), shared here two days ago: neither introduces extra computation into retrieval ranking. Both insert special tokens to build a multi-vector semantic representation of a long document, so that the same document can be similar to the vectors of many different questions. By contrast, current retrieval (recall) models each have drawbacks:

  • Cross-encoder models (e.g. BERT) are too computationally expensive to use in the recall stage;
  • Bi-encoder models (e.g. DPR) cannot represent the multiple topics of a long document well with a single vector;
  • Late-interaction models (e.g. ColBERT) rely on a sum over token-level scores, so they cannot be plugged directly into ANN search;
  • Attention-based aggregator models (e.g. Poly-Encoder) add extra computation and likewise cannot be used directly with ANN search.

Model

Usually, the vector of the special token [CLS] is taken as the representation of the whole text. To capture finer-grained semantic information within a document, MVR instead introduces multiple special tokens [VIE] in place of [CLS].

  • For documents, multiple [VIE] tokens are inserted before the text. To avoid disturbing the positional information of the original text, the position ids of all [VIE] tokens are set to 0, and the positions of the document tokens start from 1.
  • For questions, since they are short and usually express a single intent, only one special token [VIE] is used.
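The input construction above can be sketched as follows. This is an illustrative sketch, not the authors' code; in particular, whether [SEP] receives a running position id is our assumption.

```python
# Build the document input with k viewer tokens [VIE_1]..[VIE_k] prepended,
# assigning each [VIE] token position id 0 so the original text's positional
# encoding is undisturbed (document tokens are numbered from 1).

def build_doc_input(doc_tokens, k=8):
    """Return (tokens, position_ids) for a document with k [VIE] tokens."""
    vie_tokens = [f"[VIE_{i}]" for i in range(1, k + 1)]
    tokens = vie_tokens + doc_tokens + ["[SEP]"]
    # All [VIE] tokens share position 0; document tokens start from 1.
    # Giving [SEP] the next running position is an assumption.
    position_ids = [0] * k + list(range(1, len(doc_tokens) + 2))
    return tokens, position_ids

tokens, pos = build_doc_input(["open", "domain", "retrieval"], k=4)
```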

The model uses a dual encoder as its backbone, encoding the question and the document separately.

In the encoding, `;` denotes the concatenation symbol, [VIE] and [SEP] are special tokens of the BERT model, and the question and the document are processed by separate encoders (a question encoder and a document encoder).
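In symbols, the encoding can be sketched like this (a reconstruction consistent with the description above, not copied from the paper; Enc_q and Enc_d denote the question and document encoders, and the document representation takes the encoder output at each [VIE_i], giving k views):

```latex
q_{\text{emb}} = \mathrm{Enc}_q\big([\mathrm{VIE}];\, q;\, [\mathrm{SEP}]\big), \qquad
d_i = \mathrm{Enc}_d\big([\mathrm{VIE}_1];\dots;[\mathrm{VIE}_k];\, d;\, [\mathrm{SEP}]\big)\Big|_{[\mathrm{VIE}_i]}, \quad i = 1,\dots,k
```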

As shown in the figure above, the model first computes the dot product between the question vector and each of the document's view vectors, giving one score per view; a max-pooling operation then takes the highest-scoring view as the final question-document score.
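The scoring step can be sketched with a minimal numpy stand-in (not the authors' implementation):

```python
# One dot product per document view, then max-pooling over the views.
import numpy as np

def mvr_score(q, doc_views):
    """q: (dim,) query vector; doc_views: (k, dim) matrix of view vectors.
    Returns the best per-view dot-product score."""
    per_view = doc_views @ q      # (k,) similarity of the query to each view
    return float(per_view.max())  # max-pooling: keep the best-matching view

q = np.array([1.0, 0.0])
views = np.array([[0.2, 0.9],   # view 1: about a different topic
                  [0.8, 0.1]])  # view 2: closest to the query
score = mvr_score(q, views)     # view 2 wins
```

Because each view is a single fixed vector, this score is just a maximum over ordinary inner products, which is why the representation stays compatible with ANN indexes (each view can be indexed as an independent vector).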

To prevent the view vectors from collapsing toward one another, the paper introduces a Global-Local loss with an annealed temperature, consisting of a global contrastive loss and a local uniformity loss.

The global contrastive loss is the traditional contrastive loss: given a question, one positive document, and several negative documents, it pulls the question vector toward the positive document and away from the negatives.
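Writing f(q, d) for the max-pooled score and τ for the (annealed) temperature, a standard contrastive (InfoNCE) loss of the kind described would read as follows; this is a reconstruction, not copied from the paper:

```latex
\mathcal{L}_{\text{global}}
  = -\log
    \frac{\exp\!\big(f(q, d^{+})/\tau\big)}
         {\exp\!\big(f(q, d^{+})/\tau\big)
          + \sum_{j} \exp\!\big(f(q, d_{j}^{-})/\tau\big)}
```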

To improve the uniformity of the multi-view vectors, a local uniformity loss is proposed: it forces the selected view vector closer to the query vector while pushing it away from the document's other view vectors.
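With d_x denoting the view selected by max-pooling and d_1, …, d_k the document's k views, a loss matching this description would be (again a reconstruction under the same notation as above):

```latex
\mathcal{L}_{\text{local}}
  = -\log
    \frac{\exp\!\big(f(q, d_{x})/\tau\big)}
         {\sum_{i=1}^{k} \exp\!\big(f(q, d_{i})/\tau\big)}
```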

To further differentiate the view vectors, an annealed temperature is adopted, which gradually sharpens the softmax distribution over the views during training.
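One way such a schedule could look is an exponential decay with a lower floor. This is an illustration only: the post does not give the formula, so the exact form, the names `alpha`, `tau0`, `tau_min`, and the default values below are all assumptions, not the paper's.

```python
# Hypothetical annealing schedule: the temperature decays once per epoch,
# down to a floor, so the softmax over views sharpens gradually.
import math

def annealed_temperature(epoch, alpha=0.5, tau0=1.0, tau_min=0.3):
    """Return the softmax temperature for a given training epoch."""
    return max(tau_min, tau0 * math.exp(-alpha * epoch))

schedule = [round(annealed_temperature(t), 3) for t in range(4)]
```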

Here a hyperparameter controls the annealing speed; the schedule depends on the training epoch, and the temperature is updated once per epoch. Note that the annealed temperature is used in both the global contrastive loss and the local uniformity loss.

Copyright notice
This article was created by [Zhiyuan community]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/174/202206231950170385.html