
Basic concepts and common methods of syntactic dependency analysis

2022-06-23 21:38:00 Goose

0. Background of parsing and dependency analysis

Syntax refers to the relationships between the components of a sentence.

Syntactic analysis can be divided into:

  • Syntactic structure parsing

    Syntactic structure parsing is also called phrase structure parsing or constituency parsing; it is used to obtain the syntactic structure of the whole sentence.

  • Dependency parsing

    Dependency parsing is used to obtain the dependency relations between words.

  • Deep grammar parsing

    Deep grammar parsing uses a deep grammar, such as Lexicalized Tree Adjoining Grammar (LTAG), Lexical Functional Grammar (LFG), or Combinatory Categorial Grammar (CCG), to perform deep syntactic and semantic analysis of sentences.

At present, the focus of syntactic analysis has shifted from constituency parsing to dependency parsing.

Dependency grammar reveals the syntactic structure of a sentence by analyzing the dependency relations between its components. It holds that the core verb of a sentence is the central component that dominates all other components, while it itself is not dominated by any component; every dominated component is subordinate to its dominator through some dependency relation.

In the 1970s, Robinson proposed four axioms about dependency relations in dependency grammar (a small sketch that checks them on a head array follows the list):

  • only one component of a sentence is independent;
  • every other component depends directly on exactly one other component;
  • no component may depend on two or more components;
  • if component A depends directly on component B and component C lies between A and B in the sentence, then C depends directly on A, on B, or on some component between A and B.
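The fourth axiom is what makes dependency trees projective. Below is a small sketch that checks the axioms on a parse encoded as a head array; the encoding (heads[i] is the head of word i, with -1 marking the single independent word) is an assumption chosen for illustration, not a standard format.

```python
def check_axioms(heads):
    """Check Robinson's axioms on a parse given as a head array."""
    n = len(heads)
    roots = [i for i in range(n) if heads[i] == -1]
    # Axiom 1: exactly one independent component. Axioms 2-3 (every other
    # word depends on exactly one component) hold by the array encoding.
    if len(roots) != 1:
        return False
    # Axiom 4 is equivalent to projectivity: no two dependency arcs cross.
    arcs = [(min(i, h), max(i, h)) for i, h in enumerate(heads) if h != -1]
    for a1, b1 in arcs:
        for a2, b2 in arcs:
            if a1 < a2 < b1 < b2:  # crossing arcs violate axiom 4
                return False
    return True

print(check_axioms([1, -1, 3, 1]))  # True: a projective tree rooted at word 1
```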

1. Dependency parsing

A dependency is a binary, asymmetric relation between a head word and its dependent. The head of a sentence is usually a verb; every other word either depends on the head directly or is connected to it through a path of dependencies.

Some important concepts :

  • Dependency syntax takes the verb of the predicate as the center of the sentence; all other components are directly or indirectly related to the verb.
  • In dependency grammar, a "dependency" is the governing relation between words. This relation is not symmetric: it has a direction. The governing element is called the governor (also regent or head), and the governed element is called the dependent (also modifier or subordinate).
  • Dependency grammar itself does not prescribe a classification of dependencies, but to enrich the syntactic information conveyed by a dependency structure, the edges of the dependency tree are usually annotated with different labels in practice.
  • Dependency grammars share a common basic assumption: syntactic structure essentially consists of (modification) relations between words. A dependency connects two words, the head and the dependent. Dependencies can be divided into different types that indicate the specific syntactic relation between the two words.

A dependency structure is a labeled directed graph in which each arrow points from the head to the dependent, i.e. from head to child. As the parse tree shows, every token has exactly one head.

Compared with constituency parsing, dependency parsing identifies the subject, predicate, and other components of a sentence more directly. In addition, the relations between words are more explicit in its output. For example, the direct object of the verb prefer is flight: the dependency parse tree contains a direct edge from prefer to flight, whereas a constituency parse does not represent this relation directly (although it can still be recovered).
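As an illustration, such head/label pairs can be inspected with spaCy. This is a minimal sketch assuming spaCy and its en_core_web_sm model are installed; the example sentence is chosen to match the prefer/flight discussion above.

```python
import spacy

# pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("I prefer the morning flight through Denver")
for token in doc:
    # every token has exactly one head; dep_ is the relation label
    print(f"{token.text:10} <-{token.dep_:7}- {token.head.text}")
```

Each line shows a token, its label, and its head; flight should attach to prefer with the label dobj (spaCy's label inventory may differ slightly from the list below).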

Relation labels

A label indicates the grammatical function of the dependent with respect to its head. Common labels include:

  • root: the head of the sentence, usually a verb
  • nsubj: nominal subject
  • dobj: direct object
  • prep: prepositional modifier
  • pobj: object of a preposition
  • cc: coordinating conjunction

Other common labels:

  • compound: compound
  • advmod: adverbial modifier
  • det: determiner
  • amod: adjectival modifier

2. Common methods and evaluation metrics

  • Rule-based methods: early dependency parsing methods include the CYK dynamic-programming algorithm, constraint-satisfaction methods, and deterministic parsing strategies.
  • Statistics-based methods: statistical natural language processing produced a large body of excellent work on dependency parsing, including generative, discriminative, and deterministic approaches; these are the most representative data-driven statistical methods.
  • Deep-learning-based methods: in recent years, deep learning has gradually become a hot topic in syntactic parsing, with most work focusing on feature representation. Traditional methods rely on manually defined atomic features and feature combinations, whereas deep learning embeds the atomic features (words, part-of-speech tags, dependency labels) into vectors and uses multi-layer neural networks to extract features.

Performance evaluation of a dependency parser (a minimal UAS/LAS sketch follows the list):

  • Unlabeled attachment score (UAS): the percentage of words in the test set whose head is correctly identified (counting the root, which has no head), out of the total number of words.
  • Labeled attachment score (LAS): the percentage of words whose head is correctly identified and whose dependency label is also correct (counting the root), out of the total number of words.
  • Dependency accuracy (DA): the percentage of non-root words whose head is correctly identified, out of the total number of non-root words.
  • Root accuracy (RA): two definitions exist; one is the ratio of correctly identified root nodes to the number of sentences in the test set, the other is the percentage of sentences whose root is correctly identified out of the total number of sentences.
  • Complete match rate (CM): the percentage of sentences in the test set whose unlabeled dependency structure is completely correct.
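As a concrete reference, here is a minimal sketch of UAS and LAS computed from parallel lists of gold and predicted heads and labels over all words of a test set; the treatment of punctuation, which real evaluation scripts specify, is omitted.

```python
def attachment_scores(gold_heads, gold_labels, pred_heads, pred_labels):
    """Compute (UAS, LAS) from parallel per-word lists."""
    total = len(gold_heads)
    uas_hits = sum(g == p for g, p in zip(gold_heads, pred_heads))
    las_hits = sum(gh == ph and gl == pl
                   for gh, gl, ph, pl in zip(gold_heads, gold_labels,
                                             pred_heads, pred_labels))
    return uas_hits / total, las_hits / total

# Example: word 2 gets the wrong head, word 3 the wrong label.
uas, las = attachment_scores([0, 1, 1], ["root", "nsubj", "dobj"],
                             [0, 2, 1], ["root", "nsubj", "prep"])
print(uas, las)  # 0.666..., 0.333...
```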

3. Traditional method: the arc-standard algorithm

The arc-standard algorithm defines a configuration as c = (s, b, A).

Here s is a stack, b is a buffer (a queue), and A is the set of dependency arcs (with labels) produced so far. The initial configuration has s = [ROOT], b = [w1, …, wn], and A = ∅. A configuration is terminal when the buffer is empty and the stack contains only ROOT. si denotes the i-th element from the top of the stack, so s1 is the element at the top; bi is the i-th element of the buffer. The arc-standard algorithm defines three operations (a minimal Python sketch follows the list):

  • LEFT-ARC(l): add an arc s1 → s2 with label l to A, then pop s2 from the stack. Requires |s| ≥ 2.
  • RIGHT-ARC(l): add an arc s2 → s1 with label l to A, then pop s1 from the stack. Requires |s| ≥ 2.
  • SHIFT: move b1 onto the top of s. Requires |b| ≥ 1.
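To make the transition system concrete, here is a minimal sketch of an arc-standard configuration in Python; the (head, dependent, label) arc encoding and the convention that position 0 is ROOT are assumptions of this sketch.

```python
class Configuration:
    """Minimal arc-standard configuration c = (s, b, A)."""
    def __init__(self, words):
        self.stack = [0]                              # s: starts with ROOT
        self.buffer = list(range(1, len(words) + 1))  # b: positions of w1..wn
        self.arcs = set()                             # A: (head, dep, label)

    def left_arc(self, label):    # add s1 -> s2, then pop s2
        assert len(self.stack) >= 2
        s1, s2 = self.stack[-1], self.stack[-2]
        self.arcs.add((s1, s2, label))
        del self.stack[-2]

    def right_arc(self, label):   # add s2 -> s1, then pop s1
        assert len(self.stack) >= 2
        s1, s2 = self.stack[-1], self.stack[-2]
        self.arcs.add((s2, s1, label))
        self.stack.pop()

    def shift(self):              # move b1 onto the top of the stack
        assert len(self.buffer) >= 1
        self.stack.append(self.buffer.pop(0))

    def is_terminal(self):
        return not self.buffer and self.stack == [0]
```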

For the labeled version there are |T| = 2Nl + 1 distinct operations in total, where Nl is the number of distinct labels. In the worked example, the upper left shows the correct dependency tree, the upper right shows a configuration reached during parsing, and below them is a correct sequence of operations together with the resulting configuration changes.

For a greedy parser, the input is a configuration and the output is one of the |T| operations, which makes this a classification problem. A configuration contains a great deal of useful information, and traditional machine learning extracts many features from it by hand, as introduced in the earlier example; given the shortcomings of hand-crafted features, the next step is to use a neural network to improve the classifier.
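The control flow of such a greedy parser is simple. The sketch below builds on the Configuration class above and assumes a placeholder classify function; a real parser would also mask transitions whose preconditions fail.

```python
def greedy_parse(words, classify):
    """Greedily apply the transition chosen by `classify` until terminal.
    `classify` maps a configuration to one of the |T| transitions,
    e.g. ("shift", None) or ("left_arc", "nsubj")."""
    c = Configuration(words)
    while not c.is_terminal():
        op, label = classify(c)
        if op == "shift":
            c.shift()
        elif op == "left_arc":
            c.left_arc(label)
        else:
            c.right_arc(label)
    return c.arcs   # the completed set of labeled dependency arcs
```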

4. Neural-network-based method

For a configuration, we first extract some relevant words, their part-of-speech tags, and the labels of arcs already parsed. Let the set of words be S^w, the set of POS tags S^t, and the set of labels S^l, and assume their sizes are nw, nt, and nl. Suppose S^w = [w1, …, wnw]; embedding each of these nw words yields x^w = [e^w_w1, e^w_w2, …, e^w_wnw]. In the same way we obtain x^t and x^l. We then feed x^w, x^t, and x^l into a fully connected layer and apply a cube activation function:

h = (W_1^w x^w + W_1^t x^t + W_1^l x^l + b_1)^3

Then another fully connected layer with a softmax activation yields the classification probabilities:

p = softmax(W_2 h)
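The two formulas above can be written as a small feedforward network. The following is a sketch in PyTorch under stated assumptions: the embedding dimension (50), hidden size (200), and number of transitions (79, i.e. 2Nl + 1 with Nl = 39) are illustrative defaults, not values fixed by the text.

```python
import torch
import torch.nn as nn

class FeedforwardParser(nn.Module):
    """Scores transitions from 18 word, 18 tag, and 12 label features."""
    def __init__(self, n_words, n_tags, n_labels, d=50, hidden=200, n_trans=79):
        super().__init__()
        self.emb_w = nn.Embedding(n_words, d)
        self.emb_t = nn.Embedding(n_tags, d)
        self.emb_l = nn.Embedding(n_labels, d)
        self.W1w = nn.Linear(18 * d, hidden, bias=False)
        self.W1t = nn.Linear(18 * d, hidden, bias=False)
        self.W1l = nn.Linear(12 * d, hidden, bias=False)
        self.b1 = nn.Parameter(torch.zeros(hidden))
        self.W2 = nn.Linear(hidden, n_trans, bias=False)

    def forward(self, w_idx, t_idx, l_idx):
        # concatenate the 18/18/12 embeddings into x^w, x^t, x^l
        xw = self.emb_w(w_idx).flatten(start_dim=-2)
        xt = self.emb_t(t_idx).flatten(start_dim=-2)
        xl = self.emb_l(l_idx).flatten(start_dim=-2)
        # h = (W1^w x^w + W1^t x^t + W1^l x^l + b1)^3, the cube activation
        h = (self.W1w(xw) + self.W1t(xt) + self.W1l(xl) + self.b1) ** 3
        return torch.softmax(self.W2(h), dim=-1)  # p = softmax(W2 h)
```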

Usually only words are embedded, but the authors argue that embedding POS tags and labels is also worthwhile; among POS tags, for example, NN and NNS have similar properties. S^w contains 18 words: the top 3 words of the stack and the first 3 words of the buffer (s1, s2, s3, b1, b2, b3); the 2 leftmost and 2 rightmost children of s1 and s2; and the leftmost child of the leftmost child and the rightmost child of the rightmost child of s1 and s2 (children of children!). In total, 6 + 8 + 4 = 18.

For convenience, we write lc1(s1) for the leftmost child of the top-of-stack element s1, and lc2(s1) for its second-leftmost child. Similarly, rc1 and rc2 denote the rightmost and second-rightmost children. The leftmost child of the leftmost child can then be written lc1(lc1(s1)).

S^t contains 18 POS tags, corresponding one-to-one to S^w. S^l contains 12 = 8 + 4 labels, because s1, s2, s3, b1, b2, b3 carry no label; the labels we extract come from the arcs to children. For example, s1 and its 2 leftmost children correspond to two arcs. Hence there are 8 + 4 = 12 labels in total.
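Putting the three groups of features together, the sketch below collects the 18 word positions from a Configuration as defined in section 3; the lc/rc helpers and the use of None for missing (NULL) tokens are assumptions of this sketch.

```python
def lc(c, t, k=1):
    """k-th leftmost child of position t, per the arcs in c.arcs, or None."""
    if t is None:
        return None
    kids = sorted(d for (h, d, _) in c.arcs if h == t and d < t)
    return kids[k - 1] if len(kids) >= k else None

def rc(c, t, k=1):
    """k-th rightmost child of position t, or None."""
    if t is None:
        return None
    kids = sorted((d for (h, d, _) in c.arcs if h == t and d > t), reverse=True)
    return kids[k - 1] if len(kids) >= k else None

def feature_positions(c):
    """Collect the 18 word positions; None stands for a NULL token."""
    s = lambda i: c.stack[-i] if len(c.stack) >= i else None   # s1, s2, s3
    b = lambda i: c.buffer[i - 1] if len(c.buffer) >= i else None
    pos = [s(1), s(2), s(3), b(1), b(2), b(3)]                 # 6 positions
    for t in (s(1), s(2)):                                     # 8 positions
        pos += [lc(c, t, 1), lc(c, t, 2), rc(c, t, 1), rc(c, t, 2)]
    for t in (s(1), s(2)):                                     # 4 positions
        pos += [lc(c, lc(c, t, 1)), rc(c, rc(c, t, 1))]
    return pos                                                 # 6 + 8 + 4 = 18
```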
