Relation extraction: CASREL
2022-06-26 08:27:00 【xuanningmeng】
Relation extraction is a basic task in natural language processing. Its results are usually expressed as triples (subject, relation, object). There are two ways to approach the task:
(1) The two entities, subject and object, are already known, and a classification model predicts the relation between them.
(2) The entities are not given, so the model must extract them and predict the possible relations between them. If entities are extracted first and relations are predicted afterwards, the approach is called pipeline extraction; if entities and the relations between them are extracted at the same time, it is called joint extraction.
Relation extraction datasets are complex because entities and relations overlap. In an ideal dataset, each subject and object pair corresponds to exactly one relation; in real datasets, the same pair can hold several relations, and different triples share entities. Two overlap patterns are usually distinguished: EPO (EntityPairOverlap), where several triples share the same entity pair, and SEO (SingleEntityOverlap), where triples share a single entity.
When relational triples (subject, relation, object) overlap, relation classification models struggle. Without enough training examples, it is hard for a classifier to tell which relations an entity participates in, so the extracted triples are often incomplete and inaccurate. The CASREL model, however, handles overlapping relational triples effectively.
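As a rough illustration of the overlap patterns, the sketch below labels the triples of one sentence as Normal, EPO, or SEO. The function name `overlap_types` and the exact rules are assumptions made for this example (in particular, entity pairs are treated as ordered); they are not taken from the CASREL code.

```python
def overlap_types(triples):
    """Return the overlap patterns found among one sentence's triples."""
    # Deduplicate, so a repeated (subject, object) pair implies different relations.
    triples = sorted({tuple(t) for t in triples})
    pairs = [(s, o) for s, _, o in triples]
    types = set()

    # EPO: the same (subject, object) pair occurs with more than one relation.
    if len(set(pairs)) < len(pairs):
        types.add("EPO")

    # SEO: two different entity pairs share at least one entity.
    distinct = sorted(set(pairs))
    for i, a in enumerate(distinct):
        for b in distinct[i + 1:]:
            if set(a) & set(b):
                types.add("SEO")

    return types or {"Normal"}


print(overlap_types([("A", "r1", "B"), ("A", "r2", "B"), ("A", "r3", "C")]))
```

A sentence can show both patterns at once, which is why the function returns a set rather than a single label.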
CASREL Model
The novel cascade binary tagging framework (CASREL) was proposed in the paper A Novel Cascade Binary Tagging Framework for Relational Triple Extraction, where it set new state-of-the-art (SOTA) results. CASREL works in two steps:
(1) A pre-trained BERT model encodes the sentence, and all possible subjects are identified.
(2) For each subject, relation-specific taggers identify all possible relations and the corresponding objects.
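The two steps can be sketched as a small decoding loop. This is a minimal sketch, not the authors' implementation: `decode_spans`, `cascade_decode`, and the toy taggers are hypothetical names, and the taggers return hard 0/1 tags in place of real model outputs. Each start tag is paired with the nearest end tag at or after it.

```python
def decode_spans(start_tags, end_tags):
    """Pair each start tag with the nearest end tag at or after it."""
    spans = []
    for i, is_start in enumerate(start_tags):
        if not is_start:
            continue
        for j in range(i, len(end_tags)):
            if end_tags[j]:
                spans.append((i, j))
                break
    return spans


def cascade_decode(tokens, tag_subjects, tag_objects, relations):
    """Step 1: tag subjects. Step 2: per subject and relation, tag objects."""
    triples = []
    for s_start, s_end in decode_spans(*tag_subjects(tokens)):
        subject = " ".join(tokens[s_start:s_end + 1])
        for relation in relations:
            starts, ends = tag_objects(tokens, (s_start, s_end), relation)
            for o_start, o_end in decode_spans(starts, ends):
                obj = " ".join(tokens[o_start:o_end + 1])
                triples.append((subject, relation, obj))
    return triples


# Toy stand-ins for the trained taggers (hypothetical, hard-coded tags).
tokens = ["Stephen", "Chow", "stars", "in", "The", "King", "of", "Comedy"]


def toy_subject_tagger(tokens):
    return [0, 0, 0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 1]


def toy_object_tagger(tokens, subject_span, relation):
    if relation == "starring":
        return [1, 0, 0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0, 0]
    zeros = [0] * len(tokens)
    return zeros, zeros


triples = cascade_decode(tokens, toy_subject_tagger, toy_object_tagger,
                         ["starring", "singer"])
print(triples)
```

Because objects are tagged separately for every relation, one subject can yield several triples, and a relation whose tagger outputs all zeros simply contributes nothing.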
The goal of relational triple extraction is to identify all possible (subject, relation, object) triples in a sentence, some of which may share the same subject or object entity. CASREL therefore writes its objective at the triple level and factorizes it into a subject term and, for each subject, relation-specific object terms.
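In the notation of the CASREL paper, the objective can be written roughly as follows. This is a reconstruction from the paper's definitions rather than a formula quoted from this post: D is the training set, x_j a sentence, T_j its set of gold triples, T_j|s the triples led by subject s, R the set of all relations, and o_∅ a "null" object.

```latex
\prod_{j=1}^{|D|} \Big[ \prod_{(s,r,o) \in T_j} p\big((s,r,o) \mid x_j\big) \Big]
= \prod_{j=1}^{|D|} \Big[ \prod_{s \in T_j} p(s \mid x_j)
    \prod_{(r,o) \in T_j|s} p\big((r,o) \mid s, x_j\big) \Big]
= \prod_{j=1}^{|D|} \Big[ \prod_{s \in T_j} p(s \mid x_j)
    \prod_{r \in T_j|s} p_r(o \mid s, x_j)
    \prod_{r \in R \setminus T_j|s} p_r(o_{\varnothing} \mid s, x_j) \Big]
```

The last line is what makes overlap tractable: each relation r has its own object tagger p_r, and relations that do not hold for a subject are trained to output the null object.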
The CASREL model structure has three parts: a BERT encoder, a subject tagger, and a set of relation-specific object taggers.
Subject Tagger
The subject tagger directly decodes the encoding vector h_N produced by the N-layer BERT encoder to identify all possible subjects in the input sentence. In practice, two identical binary classifiers (0/1) mark the start and end positions of subjects: each token receives a probability of being a subject start and a probability of being a subject end.
The likelihood of the subjects in a given sentence is the product, over both classifiers and all token positions, of the probability assigned to each gold 0/1 tag.
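Written out in the paper's notation (reconstructed here under that assumption, with x_i = h_N[i] the encoded vector of the i-th token, L the sentence length, y_i^t the gold tag, and I{·} the indicator function), the tagger and its likelihood look roughly like this:

```latex
p_i^{start\_s} = \sigma\big(W_{start}\, x_i + b_{start}\big), \qquad
p_i^{end\_s}   = \sigma\big(W_{end}\, x_i + b_{end}\big)

p_{\theta}(s \mid x) = \prod_{t \in \{start\_s,\, end\_s\}} \prod_{i=1}^{L}
    \big(p_i^{t}\big)^{I\{y_i^{t} = 1\}} \big(1 - p_i^{t}\big)^{I\{y_i^{t} = 0\}}
```

Taking the negative logarithm of the likelihood gives a sum of per-token binary cross-entropy terms, which is how such a tagger is usually trained.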
Relation-specific Object Taggers
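The relation-specific object taggers take the features of the subject into account instead of decoding the pre-trained BERT output h_N directly: for each relation, the representation of the detected subject is added to every token vector before the start and end classifiers are applied.
The likelihood of the object taggers has the same form as that of the subject tagger and is computed separately for each relation.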
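In the same reconstructed notation, with v_sub^k the encoded representation of the k-th detected subject and the superscript r marking the parameters of the tagger for relation r, the object taggers and their likelihood can be written roughly as:

```latex
p_i^{start\_o} = \sigma\big(W_{start}^{r}\,(x_i + v_{sub}^{k}) + b_{start}^{r}\big), \qquad
p_i^{end\_o}   = \sigma\big(W_{end}^{r}\,(x_i + v_{sub}^{k}) + b_{end}^{r}\big)

p_{\phi_r}(o \mid s, x) = \prod_{t \in \{start\_o,\, end\_o\}} \prod_{i=1}^{L}
    \big(p_i^{t}\big)^{I\{y_i^{t} = 1\}} \big(1 - p_i^{t}\big)^{I\{y_i^{t} = 0\}}
```

For a null object o_∅, all gold tags y_i^t are 0, so the tagger for that relation is pushed to predict no span at all.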
Chinese experimental results
The tokenizer treatment is basically the same for the Chinese and English datasets: every Chinese character is followed by [unused1], and the chinese_L-12_H-768_A-12 pre-trained model is used. The processed Chinese data has the following format:
{
    "text": "How to play your part well, please read 《Self-cultivation of actors》《The king of comedy》 The unique secret collection of Stephen Chow rising from poverty",
    "triple_list": [
        [
            "The king of comedy",
            "starring",
            "Stephen Chow"
        ]
    ]
}
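The per-character [unused1] treatment mentioned before the sample can be sketched as follows. This only illustrates the token layout; `tokenize_chinese` and `find_span` are hypothetical helpers, and a real BERT tokenizer would also map tokens to vocabulary ids and apply WordPiece to non-Chinese text.

```python
def tokenize_chinese(text, marker="[unused1]"):
    """Emit each non-space character followed by the marker token."""
    tokens = ["[CLS]"]
    for ch in text:
        if ch.isspace():
            continue
        tokens.extend([ch, marker])
    tokens.append("[SEP]")
    return tokens


def find_span(tokens, entity_tokens):
    """Return (start, end) of the first occurrence of entity_tokens, or None."""
    n = len(entity_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == entity_tokens:
            return i, i + n - 1
    return None


tokens = tokenize_chinese("喜剧之王周星驰")
entity = tokenize_chinese("周星驰")[1:-1]  # drop [CLS] and [SEP]
print(tokens)
print(find_span(tokens, entity))
```

Tokenizing the entity the same way as the sentence keeps the marker tokens aligned, so the entity's start and end positions can be found by a plain sublist search.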
The model hyperparameters are as follows:
max_length=128, batch_size=16, lr=1e-5, epoch=16
The evaluation results of the model are as follows :
f1: 0.7827, precision: 0.7736, recall: 0.7921, best f1: 0.7944
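For reference, precision, recall, and F1 over extracted triples are commonly computed as in the sketch below. It assumes exact-match, micro-averaged scoring over whole triples; the post does not state how its numbers were computed, so the function `precision_recall_f1` is an illustration, not the author's evaluation code.

```python
def precision_recall_f1(gold_sets, pred_sets):
    """Micro-averaged scores over per-sentence sets of (s, r, o) triples."""
    correct = predicted = gold = 0
    for g, p in zip(gold_sets, pred_sets):
        g, p = set(g), set(p)
        correct += len(g & p)   # triples that match exactly
        predicted += len(p)
        gold += len(g)
    precision = correct / predicted if predicted else 0.0
    recall = correct / gold if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold_sets = [{("song", "album", "Single best"), ("song", "singer", "Anxia")}]
pred_sets = [gold_sets[0] | {("song", "lyrics", "Wuyiwei")}]
precision, recall, f1 = precision_recall_f1(gold_sets, pred_sets)
print(precision, recall, f1)
```

The reported figures are consistent with this definition of F1: 2 × 0.7736 × 0.7921 / (0.7736 + 0.7921) ≈ 0.7827.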
The prediction results of the model are as follows :
{
    "text": "《The magic show of love》 is a song sung by Anxia, with lyrics by Wuyiwei and music by MartinHansen/StefanDouglasHayOsson, included on the album 《Single best》",
    "triple_list_gold": [
        {
            "subject": "The magic show of love",
            "relation": "The album",
            "object": "Single best"
        },
        {
            "subject": "The magic show of love",
            "relation": "singer",
            "object": "Anxia"
        }
    ],
    "triple_list_pred": [
        {
            "subject": "The magic show of love",
            "relation": "The album",
            "object": "Single best"
        },
        {
            "subject": "The magic show of love",
            "relation": "singer",
            "object": "Anxia"
        },
        {
            "subject": "The magic show of love",
            "relation": "Lyrics",
            "object": "Wuyiwei"
        }
    ],
    "new": [
        {
            "subject": "The magic show of love",
            "relation": "Lyrics",
            "object": "Wuyiwei"
        }
    ]
}
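The "new" field holds the predicted triples that are absent from the gold list. A minimal sketch of that comparison is shown below; `triple_key`, `diff_triples`, and the `missing` list (gold triples the model failed to predict) are hypothetical names added for illustration.

```python
def triple_key(triple):
    """Normalize a subject/relation/object dict to a comparable tuple."""
    return tuple(triple[field].strip()
                 for field in ("subject", "relation", "object"))


def diff_triples(gold, pred):
    """Return (new, missing): predictions not in gold, gold not predicted."""
    gold_keys = {triple_key(t) for t in gold}
    pred_keys = {triple_key(t) for t in pred}
    new = [t for t in pred if triple_key(t) not in gold_keys]
    missing = [t for t in gold if triple_key(t) not in pred_keys]
    return new, missing


gold = [
    {"subject": "The magic show of love", "relation": "The album",
     "object": "Single best"},
    {"subject": "The magic show of love", "relation": "singer",
     "object": "Anxia"},
]
pred = gold + [
    {"subject": "The magic show of love", "relation": "Lyrics",
     "object": "Wuyiwei"},
]
new, missing = diff_triples(gold, pred)
print(new, missing)
```

In this example the extra triple counts against precision, even though the sentence itself mentions the lyricist, so some "new" predictions may point to gaps in the annotation rather than model errors.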
If you find an error, corrections are welcome.