NAACL 2022 (Code Practice): Good Visual Guidance Makes a Better Extractor, Multimodal Named Entity Recognition (Source Code Download Included)
2022-06-26 13:42:00 [Computer Vision Research Institute]
Paper: https://arxiv.org/pdf/2205.03521.pdf
Code: https://github.com/zjunlp/HVPNeT
Computer Vision Research Institute column
Author: Edison_G
1
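Overview
Multimodal named entity recognition and relation extraction (MNER and MRE) are a fundamental and pivotal branch of information extraction. However, existing MNER and MRE approaches usually suffer from error sensitivity when the text is accompanied by images that contain irrelevant objects.
To address these issues, the researchers propose a novel Hierarchical Visual Prefix fusion NeTwork (HVPNeT) for visual-enhanced entity and relation extraction, aiming at more effective and more robust performance.
Specifically, the visual representation is treated as a pluggable visual prefix that guides the textual representation toward error-insensitive predictions. A dynamic gated aggregation strategy is further proposed to obtain hierarchical multi-scale visual features that serve as the visual prefix for fusion. Extensive experiments on three benchmark datasets demonstrate the effectiveness of the method, which achieves state-of-the-art performance.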
2
New Framework
Collection of Pyramidal Visual Feature
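On the one hand, the image associated with a sentence contains multiple visual objects related to the entities in the sentence, which provides additional semantic knowledge to assist information extraction. On the other hand, global image features may express abstract concepts and act as a weak learning signal. Multiple visual cues are therefore collected for multimodal entity and relation extraction: regional (object) images carry the important information, and the global image serves as a supplement.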
Dynamic Gated Aggregation
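Although objects of different sizes can have appropriate feature representations at the corresponding scales, deciding which block of the visual backbone should supply the visual prefix for each Transformer layer is not trivial. To address this challenge, the researchers propose building a densely connected routing space in which the hierarchical multi-scale visual features are connected to every Transformer layer.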
Dynamic Gate Module
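The routing is carried out by a dynamic gate module and can be viewed as a path-decision process. The purpose of the dynamic gate is to predict a normalized vector that indicates how strongly the visual feature of each block is used.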
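The gating idea can be illustrated with a small, dependency-free sketch. It is not the authors' implementation (the repository uses PyTorch); it assumes that each block's feature map is average-pooled, that the pooled descriptors are averaged into one summary vector, and that a linear layer followed by a softmax produces one weight per block. The function names, `weights`, and `bias` are hypothetical.

```python
import math


def global_average_pool(feature_map):
    """Average a list of feature vectors (one per spatial position) into one vector."""
    dim = len(feature_map[0])
    return [sum(vec[d] for vec in feature_map) / len(feature_map) for d in range(dim)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_gate(block_features, weights, bias):
    """Return one normalized weight per visual block for a single Transformer layer.

    block_features: list of c feature maps, each a list of vectors of size dim
    weights: c x dim matrix (hypothetical learned parameters of this layer's gate)
    bias: list of c floats (hypothetical)
    """
    # Pool every block, then average the pooled descriptors into one summary.
    pooled = [global_average_pool(fm) for fm in block_features]
    dim = len(pooled[0])
    summary = [sum(p[d] for p in pooled) / len(pooled) for d in range(dim)]
    # One score per block, normalized so the gate sums to 1.
    scores = [sum(w * x for w, x in zip(row, summary)) + b
              for row, b in zip(weights, bias)]
    return softmax(scores)


# Two toy blocks with two positions each; all numbers are made up.
blocks = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0], [0.0, 3.0]],
]
gate = dynamic_gate(blocks, weights=[[1.0, 0.0], [0.0, 1.0]], bias=[0.0, 0.0])
print(gate)
```

Each Transformer layer would own its own `weights` and `bias` under this reading, so different layers can favour different backbone blocks.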
Aggregated Hierarchical Feature
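Based on the dynamic gate g(l) above, the final aggregated hierarchical visual feature Vgated that matches the l-th layer of the Transformer can be derived.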
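One plausible reading of this aggregation is a gate-weighted sum over the per-block visual features. The sketch below assumes exactly that, and also assumes that all block features have already been projected to a common shape; the function name and the toy numbers are hypothetical and the paper's exact formula may differ.

```python
def aggregate_visual_features(block_features, gate):
    """Weighted sum of per-block visual features for one Transformer layer.

    block_features: list of c feature maps with identical shape
                    (n vectors of size dim each)
    gate: list of c weights for this layer, assumed to sum to 1
    """
    if len(block_features) != len(gate):
        raise ValueError("need exactly one gate weight per block")
    n_vectors = len(block_features[0])
    dim = len(block_features[0][0])
    aggregated = [[0.0] * dim for _ in range(n_vectors)]
    for weight, feature_map in zip(gate, block_features):
        for i, vec in enumerate(feature_map):
            for d, value in enumerate(vec):
                aggregated[i][d] += weight * value
    return aggregated


# Toy features for two blocks; the numbers are made up.
blocks = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
]
v_gated = aggregate_visual_features(blocks, gate=[0.25, 0.75])
print(v_gated)  # [[4.0, 5.0], [6.0, 7.0]]
```

The result has the same shape as a single block's feature map, so it can be used directly as the visual prefix for the matching layer.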
Visual Prefix-guided Fusion
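The hierarchical multi-scale image features are used as a visual prefix, and the visual prefix sequence is prepended to the text sequence at every self-attention layer of BERT.
The hierarchical multi-scale visual features serve as the visual prefix of each fusion layer, and multimodal attention is applied layer by layer to update all textual states. In this way, the final textual states encode both contextual and cross-modal semantic information, which helps reduce the error sensitivity caused by irrelevant object elements.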
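Prefix-guided attention can be sketched as ordinary attention in which the queries come from the text while the keys and values are the visual prefix followed by the text. The example below is a simplified single-head version in plain Python; it omits the learned query, key, and value projections (treated as identity), which is an assumption made only to keep the sketch short. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def prefix_attention(text_states, visual_prefix):
    """Single-head attention where text queries attend to [visual prefix; text].

    text_states: list of n text vectors of size dim
    visual_prefix: list of m visual vectors of size dim (may be empty)
    Returns n updated text vectors; the prefix itself is not updated.
    """
    dim = len(text_states[0])
    keys = visual_prefix + text_states  # prefix is prepended to keys and values
    values = keys
    scale = math.sqrt(dim)
    updated = []
    for query in text_states:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
        weights = softmax(scores)
        updated.append([sum(w * v[d] for w, v in zip(weights, values))
                        for d in range(dim)])
    return updated


# Toy inputs; the numbers are made up.
text = [[1.0, 0.0], [0.0, 1.0]]
prefix = [[2.0, 2.0]]
with_prefix = prefix_attention(text, prefix)
text_only = prefix_attention(text, [])
print(with_prefix)
print(text_only)
```

Because only the keys and values are extended, the output keeps the length of the text sequence, which is what makes the visual prefix pluggable.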
3
Experiments
4
Code Practice
To run the code, install the requirements:
pip install -r requirements.txt
Data Collection:
The datasets that we used in our experiments are as follows:
Twitter2015 & Twitter2017
The text data follows the CoNLL format. You can download the Twitter2015 data via this link and the Twitter2017 data via this link. Please place them in data/NER_data. You can also put them anywhere and modify the path configuration in run.py.
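For readers unfamiliar with the CoNLL layout, the following sketch shows how such a file could be read. It assumes a generic two-column layout (token, then label, separated by whitespace) with blank lines between sentences, and it skips any single-column line such as a per-sample header, if the files contain one. The function name and the sample lines are made up for illustration; the repository's own loader is in processor/dataset.py and may differ.

```python
def read_conll(lines):
    """Group 'token label' lines into sentences; a blank line ends a sentence."""
    sentences, tokens, labels = [], [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            if tokens:
                sentences.append((tokens, labels))
                tokens, labels = [], []
            continue
        parts = line.split()
        if len(parts) < 2:
            continue  # no label column: treat as metadata and skip (assumption)
        tokens.append(parts[0])
        labels.append(parts[-1])
    if tokens:  # flush the last sentence if the input lacks a trailing blank line
        sentences.append((tokens, labels))
    return sentences


# Made-up sample in the assumed layout.
sample = ["Kevin\tB-PER", "Durant\tI-PER", "joins\tO", "", "Go\tO", "Warriors\tB-ORG"]
parsed = read_conll(sample)
print(parsed[0])  # (['Kevin', 'Durant', 'joins'], ['B-PER', 'I-PER', 'O'])
```

To read a real file, pass an open file handle, for example `read_conll(open(path, encoding="utf-8"))`, where `path` points at one of the train, valid, or test files.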
MNRE
The MRE dataset comes from MEGA. You can download the MRE dataset with detected visual objects using the following commands:
cd data
wget 120.27.214.45/Data/re/multimodal/data.tar.gz
tar -xzvf data.tar.gz
mv data RE_data
Data Preprocess:
HMNeT
|-- data    # datasets (NER_data and RE_data)
| |-- NER_data
| | |-- twitter2015 # text data
| | | |-- train.txt
| | | |-- valid.txt
| | | |-- test.txt
| | | |-- twitter2015_train_dict.pth # {full-image-[object-image]}
| | | |-- ...
| | |-- twitter2015_images # full image data
| | |-- twitter2015_aux_images # object image data
| | |-- twitter2017
| | |-- twitter2017_images
| |-- RE_data
| | |-- ...
|-- models # models
| |-- bert_model.py
| |-- modeling_bert.py
|-- modules
| |-- metrics.py # metric
| |-- train.py # trainer
|-- processor
| |-- dataset.py # processor, dataset
|-- logs # code logs
|-- run.py # main
|-- run_ner_task.sh
|-- run_re_task.sh
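Before training, it can help to confirm that the files are where the tree above expects them. The sketch below is a hypothetical helper, not part of the repository; the paths in `EXPECTED` are copied from the tree, and the project root is assumed to be the current directory.

```python
from pathlib import Path

# Paths taken from the directory tree above, relative to the project root.
EXPECTED = [
    "data/NER_data/twitter2015/train.txt",
    "data/NER_data/twitter2015/valid.txt",
    "data/NER_data/twitter2015/test.txt",
    "data/NER_data/twitter2015/twitter2015_train_dict.pth",
    "data/NER_data/twitter2015_images",
    "data/NER_data/twitter2015_aux_images",
    "data/NER_data/twitter2017",
    "data/NER_data/twitter2017_images",
    "data/RE_data",
    "run.py",
]


def missing_paths(root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    for rel in missing_paths("."):
        print("missing:", rel)
```

If the data was placed elsewhere and the path configuration in run.py was changed accordingly, adjust `EXPECTED` to match.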
Train:
NER Task
The data path and GPU-related configuration are in run.py. To train the NER model, run the script for the corresponding dataset:
bash run_twitter15.sh
bash run_twitter17.sh
Checkpoints can be downloaded via Twitter15_ckpt, Twitter17_ckpt.
RE Task
To train the RE model, run this script:
bash run_re_task.sh
Checkpoints can be downloaded via re_ckpt.
Test:
NER Task
To test the NER model, download the model checkpoints provided via Twitter15_ckpt, Twitter17_ckpt, or use your own trained model. Set load_path to the model path, then run the following script:
python -u run.py \
--dataset_name="twitter15/twitter17" \
--bert_name="bert-base-uncased" \
--seed=1234 \
--only_test \
--max_seq=80 \
--use_prompt \
--prompt_len=4 \
--sample_ratio=1.0 \
--load_path='your_ner_ckpt_path'
RE Task
To test the RE model, download the model checkpoint provided via re_ckpt, or use your own trained model. Set load_path to the model path, then run the following script:
python -u run.py \
--dataset_name="MRE" \
--bert_name="bert-base-uncased" \
--seed=1234 \
--only_test \
--max_seq=80 \
--use_prompt \
--prompt_len=4 \
--sample_ratio=1.0 \
--load_path='your_re_ckpt_path'