NLP Tasks: Summary, Introduction, and Understanding

2022-06-24 03:27:00 【Goose】

1. Background

NLP tasks fall into four broad categories:

Sequence labeling tasks
Classification tasks
Sentence relation judgment tasks
Generation tasks

2. Sequence labeling tasks

Sequence labeling is one of the basic problems we encounter most often in NLP. In sequence labeling, we assign a label to every element of a sequence. Typically the sequence is a sentence and each element is a word in that sentence. Information extraction, for example, can be framed as a sequence labeling problem, such as extracting the time or location of a meeting.

Sequence labeling generally falls into two categories:

Raw labeling: each element is assigned its own label.
Joint segmentation and labeling: all elements within a segment are assigned the same label.

Named entity recognition (NER) is a subtask of information extraction in which elements are located and classified, for example as person names, organization names, locations, times, or quantities.

Take NER as an example of joint labeling. Consider the sentence "Yesterday, George Bush gave a speech." It contains one named entity: George Bush. We want to attach the label "person name" to the whole phrase "George Bush" rather than labeling the two words separately. This is joint labeling.

2.1 BIO tagging

The simplest way to solve a joint labeling problem is to convert it into a raw labeling problem. The standard practice is to use BIO tagging.

BIO tagging: label each element as "B-X", "I-X", or "O". "B-X" means the element belongs to a segment of type X and sits at the beginning of that segment; "I-X" means the element belongs to a segment of type X and sits inside that segment, after its first element; "O" means the element does not belong to any type.

For example, if X stands for noun phrase (NP), the three BIO tags are:

B-NP: the beginning of a noun phrase;
I-NP: the inside of a noun phrase;
O: not part of a noun phrase;

A passage can therefore be segmented by tagging each word with one of these three labels.

We can go further and apply BIO tagging to NER by defining tags for every named entity type (person names, organization names, locations, times, and so on). This gives many B and I categories, such as B-PERS, I-PERS, B-ORG, and I-ORG, and every word in a sentence receives either one of these tags or O.
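
As a small illustration of how joint labeling reduces to per-word labeling, the sketch below converts entity spans into BIO tags for the example sentence used earlier. The function name `spans_to_bio` and the `(start, end, type)` span format are assumptions made for this example, not part of any particular library.

```python
def spans_to_bio(tokens, spans):
    """Convert entity spans into one BIO tag per token.

    spans is a list of (start, end, type) tuples; `end` is exclusive.
    This span format is an assumption made for the example.
    """
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"      # first token of the segment
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"      # remaining tokens of the segment
    return tags


tokens = ["Yesterday", ",", "George", "Bush", "gave", "a", "speech", "."]
spans = [(2, 4, "PERS")]  # "George Bush" is a person name
bio_tags = spans_to_bio(tokens, spans)

for token, tag in zip(tokens, bio_tags):
    print(f"{token}\t{tag}")
```

Here "George" receives B-PERS, "Bush" receives I-PERS, and every other token receives O, so the whole phrase is recoverable from per-word tags.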

2.2 Common models for sequence labeling

Bi-LSTM: a bidirectional LSTM is chosen because the tag of the current word depends on both the preceding and the following context.
Bi-LSTM+CRF: https://zhuanlan.zhihu.com/p/169719001

2.3 Specific sequence labeling tasks

(1) Word segmentation

Input: word + tag (I: inside a word; E: end of a word);
Output: the tag of each element; adding a space after every element tagged E segments the text;
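
The output step above can be sketched in a few lines: given the elements and their predicted I/E tags, insert a space after each E. The tags here are written by hand; in practice they would come from a tagging model. The helper name `segment` and the unspaced English input are assumptions chosen for the example.

```python
def segment(chars, tags):
    """Rebuild segmented text by adding a space after every element tagged E."""
    pieces = []
    for ch, tag in zip(chars, tags):
        pieces.append(ch)
        if tag == "E":          # E marks the last element of a word
            pieces.append(" ")
    return "".join(pieces).strip()


chars = list("ilovenlp")
# Hand-written tags standing in for model predictions: i | love | nlp
tags = ["E", "I", "I", "I", "E", "I", "I", "E"]
segmented = segment(chars, tags)
print(segmented)
```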

(2) Part-of-speech tagging (POS tagging)

Input: word + tag (the part of speech: verb, noun, adjective, etc.);
Output: the part of speech;
Model: an HMM can also be used
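
To make the HMM option concrete, here is a minimal sketch of Viterbi decoding, the standard way to find the most probable tag sequence under an HMM. All probabilities, the three-tag set, and the three-word vocabulary below are invented for illustration; a real tagger would estimate them from an annotated corpus and would normally work with log probabilities.

```python
def viterbi(words, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for `words` under an HMM."""
    # best[t][s] = (probability of the best path ending in s at position t,
    #               previous state on that path)
    best = [{s: (start_p[s] * emit_p[s][words[0]], None) for s in states}]
    for t in range(1, len(words)):
        column = {}
        for s in states:
            column[s] = max(
                (best[t - 1][p][0] * trans_p[p][s] * emit_p[s][words[t]], p)
                for p in states
            )
        best.append(column)

    # Backtrack from the best final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for t in range(len(words) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    path.reverse()
    return path


# Toy parameters, invented for this example only.
states = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
emit_p = {
    "DET": {"the": 0.9, "dog": 0.05, "barks": 0.05},
    "NOUN": {"the": 0.05, "dog": 0.8, "barks": 0.15},
    "VERB": {"the": 0.05, "dog": 0.15, "barks": 0.8},
}

pos_tags = viterbi(["the", "dog", "barks"], states, start_p, trans_p, emit_p)
print(pos_tags)
```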

(3) Named entity recognition (NER)

Input: word + tag (B: beginning of an entity; I: inside an entity; O: outside any entity);
Output: entity tags;
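
Once a model has produced B/I/O tags, the entities still have to be read back out of them. The following sketch groups consecutive tags into entities; the function name `bio_to_entities` is hypothetical, and dropping an I- tag that does not continue an open entity is one possible convention assumed here, not the only one.

```python
def bio_to_entities(tokens, tags):
    """Group BIO-tagged tokens into (entity text, entity type) pairs."""
    entities = []
    current_tokens, current_type = [], None

    def close():
        # Store the entity collected so far, if any.
        if current_tokens:
            entities.append((" ".join(current_tokens), current_type))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            close()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_type:
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity
            close()
            current_tokens, current_type = [], None
    close()
    return entities


tokens = ["Yesterday", ",", "George", "Bush", "gave", "a", "speech", "."]
tags = ["O", "O", "B-PERS", "I-PERS", "O", "O", "O", "O"]
entities = bio_to_entities(tokens, tags)
print(entities)
```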

(4) Semantic role labeling (SRL):

Input: word + whether it is the predicate (B-Arg0, I-Arg0, B-V);
Output: semantic roles;

2. Classification tasks

2.1 Specific classification tasks

(1) Text classification, sentiment classification

Model: LSTM. This is a many-to-one problem; a Softmax layer at the end outputs the classification result;
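
The final Softmax step can be illustrated without a neural network library. In the sketch below, the three class scores are hypothetical values standing in for what a classifier head on top of the last LSTM state might produce; the label names are likewise assumptions.

```python
import math


def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["negative", "neutral", "positive"]
logits = [-1.2, 0.3, 2.1]  # hypothetical scores for one input sequence
probs = softmax(logits)

# Many-to-one: the whole sequence maps to a single predicted class.
predicted = labels[max(range(len(probs)), key=probs.__getitem__)]
print(predicted, [round(p, 3) for p in probs])
```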

3. Sentence relation judgment

3.1 Specific tasks

Syntactic parsing, entailment judgment

Model: parse tree. An LSTM assigns a score to each edge, and the highest-scoring edges are selected, subject to the constraint that the selected edges must form a tree;
Model: RNNGs can also be used
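
The tree constraint matters because picking the best head for each word independently can produce a cycle. The sketch below shows this on a three-word sentence by brute force: it enumerates every head assignment, keeps only those that form a tree rooted at an artificial ROOT node, and returns the highest-scoring one. The edge scores are invented rather than produced by an LSTM, and exhaustive search is only feasible for tiny inputs; practical parsers use algorithms such as Chu-Liu/Edmonds instead.

```python
from itertools import product


def is_tree(heads):
    """heads[d - 1] is the head of word d; node 0 is the artificial ROOT."""
    for d in range(1, len(heads) + 1):
        seen = set()
        node = d
        while node != 0:
            if node in seen:
                return False  # cycle: this word never reaches ROOT
            seen.add(node)
            node = heads[node - 1]
    return True


def best_tree(scores):
    """scores[h][d] is the score of the edge head h -> dependent d."""
    n = len(scores) - 1  # number of real words
    best_heads, best_score = None, float("-inf")
    for heads in product(range(n + 1), repeat=n):
        if any(h == d for d, h in enumerate(heads, start=1)):
            continue  # a word cannot be its own head
        if not is_tree(heads):
            continue
        total = sum(scores[h][d] for d, h in enumerate(heads, start=1))
        if total > best_score:
            best_heads, best_score = heads, total
    return list(best_heads), best_score


# Invented scores for "dogs chase cats" (1 = dogs, 2 = chase, 3 = cats).
# Column 0 is unused because ROOT never has a head.
scores = [
    [0, 2, 8, 1],   # ROOT  -> dogs, chase, cats
    [0, 0, 9, 2],   # dogs  -> chase, cats
    [0, 10, 0, 7],  # chase -> dogs, cats
    [0, 1, 1, 0],   # cats  -> dogs, chase
]

heads, total = best_tree(scores)
print(heads, total)
```

With these scores, the per-word greedy choice would make "dogs" and "chase" each other's head, which is a cycle, so the tree-constrained search has to settle for a slightly lower total.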

4. Generation tasks

These tasks are generally aimed at ordinary users. They are system-level tasks that deliver natural language processing products and services, and they draw on NLP techniques from many levels.

(1) Machine translation (MT)

This is the most classic application of the Encoder-Decoder architecture; in fact, the architecture was first proposed in the field of machine translation.

(2) Text summarization and simplification (Text Summarization/Simplification)

The input is a text sequence, and the output is a summary sequence of that text.

(3) Reading comprehension

Encode the input passage and the question separately, then decode to obtain the answer to the question.

(4) Speech recognition

The input is a sequence of speech signals, and the output is a sequence of words.

(5) Dialogue system

The input is an utterance, and the output is a reply to that utterance.

(6) Question answering system

Given a question raised by a user, the system returns the corresponding answer.

(7) Automatic essay grading

Given an essay, assign a grade or score to its quality.

5. Other classification schemes

Basic NLP tasks:

1. Lexical analysis: word-level analysis of natural language, the foundation of NLP

Word segmentation (Tokenization): split text that has no explicit word boundaries into a sequence of words
New word identification: find words in the text that are new or that carry a new meaning or usage
Morphological analysis: analyze the morphological structure of words, including stems, roots, affixes (prefixes and suffixes), etc.
Part-of-speech tagging: determine the part of speech of each word in the text, such as verb, noun, or pronoun
Spelling correction: find misspelled words and correct them

2. Sentence analysis: sentence-level analysis of natural language, including syntactic parsing and other sentence-level analysis tasks

Chunking: mark the phrase chunks in a sentence, such as noun phrases (NP) and verb phrases (VP)
Supertagging: label each word in a sentence with a supertag, which is a tree structure associated with that word in the syntax tree
Constituency parsing: analyze the constituents of a sentence and produce a syntax tree made up of terminal and non-terminal symbols
Dependency parsing: analyze the dependencies between the words of a sentence and produce a dependency tree built from those word-to-word dependencies
Language modeling: score a given sentence; the score indicates how plausible (fluent) the sentence is
Language identification: given a piece of text, determine which language it is written in
Sentence boundary detection: add sentence boundaries to text that has no explicit ones
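
As one concrete example from the list above, language modeling can be sketched with a bigram model that scores a sentence by the log probability of each word given the previous one. The three-sentence corpus is invented, add-one smoothing is just one simple choice, and the function names are assumptions for this example.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their left contexts over whitespace-tokenized sentences."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        contexts.update(tokens[:-1])             # every token that starts a bigram
        bigrams.update(zip(tokens, tokens[1:]))
    vocab = {tok for s in sentences for tok in s.split()} | {"</s>"}
    return contexts, bigrams, vocab


def log_prob(sentence, contexts, bigrams, vocab):
    """Log probability of a sentence; higher means more fluent under the model."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        # Add-one smoothing gives unseen bigrams a small non-zero probability.
        total += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + len(vocab)))
    return total


corpus = ["the dog barks", "the cat sleeps", "a dog sleeps"]  # toy training data
contexts, bigrams, vocab = train_bigram(corpus)

fluent = log_prob("the dog sleeps", contexts, bigrams, vocab)
scrambled = log_prob("sleeps dog the", contexts, bigrams, vocab)
print(fluent, scrambled)
```

The well-ordered sentence receives the higher score, which is the sense in which the score reflects fluency.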

3. Semantic analysis: analyze and understand the given text to form a formal or distributed representation that expresses its semantics

Word sense disambiguation: for an ambiguous word, determine its exact meaning in context
Semantic role labeling: mark the semantic roles in a sentence; semantic roles include agent, patient, influence, etc.
Abstract meaning representation parsing (AMR parsing): AMR is an abstract semantic representation, and an AMR parser parses a sentence into an AMR structure
First-order predicate calculus: use a first-order predicate logic system to express semantics
Frame semantic parsing: analyze the semantics of a sentence according to frame semantics
Word/sentence/paragraph vectors: study vector representations of words, sentences, and paragraphs, along with the properties and applications of those vectors
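
One commonly used property of such vectors is that semantically related items tend to lie close together, which is often measured with cosine similarity. The three-dimensional vectors below are made up for illustration; real embeddings are learned from data and have far more dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical word vectors, not taken from any trained model.
vectors = {
    "king": [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "apple": [0.1, 0.0, 0.9],
}

sim_related = cosine(vectors["king"], vectors["queen"])
sim_unrelated = cosine(vectors["king"], vectors["apple"])
print(round(sim_related, 3), round(sim_unrelated, 3))
```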

4. Information extraction: extract structured information from unstructured text

Named entity recognition: identify named entities in text; entities generally include person names, place names, organization names, times, dates, currencies, percentages, etc.
Entity disambiguation: determine which real-world object an entity mention refers to
Term extraction (Terminology/Glossary Extraction): identify terms in text
Coreference resolution: determine which different mentions refer to the same entity, including pronoun resolution and noun resolution
Relation extraction: determine the type of relation between two entities in the text
Event extraction: extract structured events from unstructured text
Sentiment analysis: extract the subjective sentiment expressed in the text
Intent detection: an important module in dialogue systems that analyzes what the user says and identifies the user's intent
Slot filling: an important module in dialogue systems that extracts the information relevant to the user's intent from the conversation

5. High-level tasks: system-level tasks aimed directly at ordinary users that deliver natural language processing products and services and draw on NLP techniques from many levels

Machine translation: automatic translation from one language into another by computer
Text summarization and simplification (Text Summarization/Simplification): extract the main points of a long text
Question answering system: given a question raised by a user, the system returns the corresponding answer
Dialogue system: chat with users, capture the user's intent from the conversation, then analyze and act on it
Reading comprehension: after the machine reads an article, it answers questions related to that article
Automatic essay grading: given an essay, assign a grade or score to its quality

Original site

Copyright notice
This article was written by [Goose]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2021/10/20211006214536105v.html