A summary and introduction to NLP tasks

2022-06-24 03:27:00 Goose

1. Background

NLP tasks fall into four broad categories:

  1. Sequence tagging tasks
  2. Classification task
  3. Sentence relation judgment
  4. Generative task

2. Sequence tagging tasks

Sequence labeling is one of the basic problems we run into when solving NLP problems. In sequence labeling, we want to assign a label to each element of a sequence. Generally, the sequence is a sentence and an element is a word in that sentence. Information extraction, for example, can be treated as a sequence labeling problem, such as extracting the time and location of a meeting.

Sequence labeling can generally be divided into two categories:

  • Raw labeling: each element is assigned its own label.
  • Joint segmentation and labeling: all elements of a segment share the same label.

Named entity recognition (NER) is a subtask of information extraction. It locates and classifies elements such as person names, organization names, locations, times, and quantities.

Take NER as an example of joint labeling. Consider the sentence "Yesterday, George Bush gave a speech." It contains one named entity: George Bush. We want to attach the label "person name" to the whole phrase "George Bush" rather than labeling the two words separately. This is joint labeling.

2.1 BIO tagging

The simplest way to solve a joint labeling problem is to turn it into a raw labeling problem. The standard practice is to use BIO tagging.

BIO tagging: label each element as "B-X", "I-X", or "O". "B-X" means the element belongs to a segment of type X and is at the beginning of that segment; "I-X" means the element belongs to a segment of type X and is inside that segment, after its first element; "O" means the element does not belong to any type.

For example, if X is a noun phrase (Noun Phrase, NP), the three BIO tags are:

  • B-NP: the beginning of a noun phrase;
  • I-NP: inside a noun phrase;
  • O: not part of a noun phrase;

A passage can therefore be divided into noun-phrase chunks by tagging every word with one of these three labels.

We can go further and apply BIO tagging to NER, marking every kind of named entity (person names, organization names, locations, times, and so on). This gives us many B and I categories, such as B-PERS, I-PERS, B-ORG, and I-ORG.
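To make the conversion from joint labeling to raw labeling concrete, here is a minimal sketch that turns labeled segments into per-token BIO tags, using the George Bush sentence from earlier. The function name `spans_to_bio` and the `(start, end, label)` span format are assumptions made for this illustration, not something the article prescribes.

```python
def spans_to_bio(tokens, spans):
    """Convert labeled segments into one BIO tag per token.

    spans is a list of (start, end, label) tuples with an exclusive end index.
    This span format is a hypothetical choice made for the example.
    """
    tags = ["O"] * len(tokens)  # everything is outside until a span covers it
    for start, end, label in spans:
        tags[start] = f"B-{label}"  # first token of the segment
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # remaining tokens of the segment
    return tags


tokens = ["Yesterday", ",", "George", "Bush", "gave", "a", "speech", "."]
spans = [(2, 4, "PERS")]  # "George Bush" is one person-name segment
bio_tags = spans_to_bio(tokens, spans)

for token, tag in zip(tokens, bio_tags):
    print(f"{token}\t{tag}")
```

After this step every token carries exactly one tag, so any raw labeling model can be trained on the data.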

2.2 Common models for sequence labeling

  • Bi-LSTM: a bidirectional LSTM is chosen because the tag of the current word depends on both the preceding and the following context.
  • Bi-LSTM+CRF: https://zhuanlan.zhihu.com/p/169719001

2.3 Specific sequence labeling tasks

(1) Word segmentation

  • Input: word + tag (I: in word; E: end of word);
  • Output: the tag of each word; inserting a space after every element tagged E produces the segmentation;
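The decoding step described above (insert a space after every element tagged E) can be sketched as follows. The unspaced English string stands in for unsegmented text, and both the helper name `segment` and the tag sequence are made up for the example; in practice the tags would come from a trained tagger.

```python
def segment(chars, tags):
    """Rebuild words from per-character I/E tags by cutting after every E."""
    words, current = [], []
    for ch, tag in zip(chars, tags):
        current.append(ch)
        if tag == "E":  # end of word: close the current word
            words.append("".join(current))
            current = []
    if current:  # trailing characters that were never closed by an E tag
        words.append("".join(current))
    return " ".join(words)


chars = list("thecatsat")
tags = ["I", "I", "E", "I", "I", "E", "I", "I", "E"]  # hypothetical tagger output
segmented = segment(chars, tags)
print(segmented)  # the cat sat
```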

(2) Part-of-speech tagging (POS tagging)

  • Input: word + tag (the part of speech: verb, noun, adjective, etc.);
  • Output: the part of speech;
  • Model: an HMM can also be used
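As a rough illustration of how an HMM can tag parts of speech, the sketch below runs Viterbi decoding over a first-order HMM. All probabilities are invented toy numbers, and the `1e-12` floor for unseen words is an assumption added for the example; a real tagger would estimate these tables from an annotated corpus.

```python
import math


def viterbi(words, states, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most likely state sequence for words under a first-order HMM."""
    # best[t][s] = (log-probability of the best path ending in s at step t, previous state)
    best = [{}]
    for s in states:
        emit = emit_p[s].get(words[0], floor)
        best[0][s] = (math.log(start_p[s]) + math.log(emit), None)

    for t in range(1, len(words)):
        best.append({})
        for s in states:
            # Pick the predecessor state that gives the highest score for reaching s.
            prev, score = max(
                ((p, best[t - 1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            emit = emit_p[s].get(words[t], floor)
            best[t][s] = (score + math.log(emit), prev)

    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for t in range(len(words) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    path.reverse()
    return path


# Toy parameters, invented for illustration only.
states = ["NOUN", "VERB"]
start_p = {"NOUN": 0.8, "VERB": 0.2}
trans_p = {
    "NOUN": {"NOUN": 0.3, "VERB": 0.7},
    "VERB": {"NOUN": 0.8, "VERB": 0.2},
}
emit_p = {
    "NOUN": {"dogs": 0.5, "cats": 0.4, "chase": 0.1},
    "VERB": {"chase": 0.9, "dogs": 0.05, "cats": 0.05},
}

pos_tags = viterbi(["dogs", "chase", "cats"], states, start_p, trans_p, emit_p)
print(pos_tags)  # ['NOUN', 'VERB', 'NOUN']
```

Working in log space avoids numerical underflow on long sentences, which is why the sketch sums logarithms instead of multiplying probabilities.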

(3) Named entity recognition (NER)

  • Input: word + tag (B: beginning of entity; I: inside of entity; O: outside of entity);
  • Output: entity tags;
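Once a model has produced B/I/O tags like those above, the entities still have to be read back out of the tag sequence. Here is one possible way to do that; the helper name `bio_to_entities`, the example sentence, and the decision to treat a stray I- tag as "outside" are all assumptions made for this sketch.

```python
def bio_to_entities(tokens, tags):
    """Group tokens into (entity text, entity type) pairs based on BIO tags."""
    entities = []
    current, label = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and tag[2:] == label:
            current.append(token)  # continues the entity that is currently open
            continue
        # Any other tag closes the open entity, if there is one.
        if current:
            entities.append((" ".join(current), label))
        if tag.startswith("B-"):
            current, label = [token], tag[2:]  # open a new entity
        else:
            # "O", or an I- tag with no matching B- before it (treated as outside here)
            current, label = [], None
    if current:  # entity that runs to the end of the sentence
        entities.append((" ".join(current), label))
    return entities


tokens = ["Yesterday", ",", "George", "Bush", "visited", "Yale", "University", "."]
tags = ["O", "O", "B-PERS", "I-PERS", "O", "B-ORG", "I-ORG", "O"]
entities = bio_to_entities(tokens, tags)
print(entities)  # [('George Bush', 'PERS'), ('Yale University', 'ORG')]
```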

(4) Semantic role labeling (SRL):

  • Input: word + whether it is the predicate (B-Arg0, I-Arg0, B-V);
  • Output: semantic roles;

2. Classification task

2.1 Specific classification tasks

(1) Text classification, sentiment classification

  • Model: LSTM. This is a many-to-one problem: the whole input sequence maps to a single output, and a final Softmax layer produces the classification result;
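The final Softmax step can be shown in isolation. The sketch below assumes the LSTM has already read the whole sequence and that a last layer has produced one raw score per class; the label names and the scores are invented for the example.

```python
import math


def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum keeps exp() from overflowing
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["negative", "neutral", "positive"]
logits = [-1.2, 0.3, 2.1]  # hypothetical scores from the last layer

probs = softmax(logits)
predicted = labels[max(range(len(probs)), key=probs.__getitem__)]

for label, p in zip(labels, probs):
    print(f"{label}: {p:.3f}")
print("predicted:", predicted)
```

Many-to-one here means that only this single distribution is produced for the entire input, in contrast to sequence labeling, where one distribution is produced per token.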

3. Sentence relation judgment

3.1 Specific tasks

Syntactic parsing, textual entailment

  • Model: parse tree. An LSTM assigns a score to each edge, and the highest-scoring edges are selected, subject to the constraint that the selected edges must form a tree;
  • Model: RNNGs can also be used
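The tree constraint matters because picking the best head for each word independently can produce a cycle. The sketch below makes this visible with a brute-force search over all head assignments for a three-word sentence. The score matrix is invented (in the setup described above it would come from the LSTM), and exhaustive search is only workable for tiny inputs; practical parsers use algorithms such as Chu-Liu/Edmonds or Eisner instead.

```python
from itertools import product


def is_tree(heads):
    """heads[i] is the head of word i + 1; index 0 is an artificial ROOT node.

    The assignment is a tree if every word reaches ROOT without revisiting a node.
    """
    for start in range(1, len(heads) + 1):
        seen = set()
        node = start
        while node != 0:
            if node in seen:  # cycle (this also catches a word heading itself)
                return False
            seen.add(node)
            node = heads[node - 1]
    return True


def best_tree(scores):
    """scores[h][d] is the score of the edge from head h to dependent d."""
    n = len(scores) - 1
    best_heads, best_score = None, float("-inf")
    for heads in product(range(n + 1), repeat=n):  # every possible head assignment
        if not is_tree(heads):
            continue
        total = sum(scores[h][d] for d, h in enumerate(heads, start=1))
        if total > best_score:
            best_heads, best_score = heads, total
    return best_heads, best_score


words = ["ROOT", "She", "reads", "books"]
# Hypothetical edge scores; rows are heads, columns are dependents.
scores = [
    [0, 1, 9, 2],   # ROOT -> She, reads, books
    [0, 0, 10, 1],  # She  -> ...
    [0, 8, 0, 7],   # reads -> ...
    [0, 2, 4, 0],   # books -> ...
]

heads, total = best_tree(scores)
for dependent, head in enumerate(heads, start=1):
    print(f"{words[head]} -> {words[dependent]}")
print("score:", total)
```

With these numbers the greedy choice would make "She" and "reads" each other's head (scores 8 and 10), which is a cycle; the search instead returns the best valid tree, with "reads" attached to ROOT. Note that this sketch does not restrict ROOT to a single child.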

4. Generative task

Such tasks are generally aimed directly at ordinary users. They are system-level tasks that deliver natural language processing products and services, and they draw on many levels of natural language processing technology.

(1) Machine translation (Machine Translation, MT)

This is the most classic application of the Encoder-Decoder architecture; in fact, the structure was first proposed in the field of machine translation.

(2) Text summarization and simplification (Text Summarization/Simplification)

The input is a text sequence, and the output is a summary sequence of that text.

(3) Reading comprehension (Reading Comprehension)

Encode the input article and the question separately, then decode to obtain the answer to the question.

(4) Speech recognition

The input is a sequence of speech signals, and the output is a sequence of words.

(5) Dialogue system (Dialogue System)

The input is an utterance, and the output is a reply to that utterance.

(6) Question answering system (Question-Answering System)

The system gives an answer to the question the user asks.

(7) Automatic essay grading (Automatic Essay Grading)

Given an essay, assign a score or grade to its quality.

5. Other classifications

Basic NLP tasks:

1. Lexical analysis (Lexical Analysis): word-level analysis of natural language; this is the foundation of NLP work

  • Word segmentation (Word Segmentation/Tokenization): split text that has no explicit word boundaries into a sequence of words
  • New word discovery (New Words Identification): find words in the text that are newly coined or carry a new meaning or usage
  • Morphological analysis (Morphological Analysis): analyze the morphological composition of words, including stems (Stems), roots (Roots), and affixes (Prefixes and Suffixes)
  • Part-of-speech tagging (Part-of-speech Tagging): determine the part of speech of each word in the text, such as verb (Verb), noun (Noun), or pronoun (Pronoun)
  • Spelling correction (Spelling Correction): find misspelled words and correct them

2. Sentence analysis (Sentence Analysis): sentence-level analysis of natural language, including syntactic parsing and other sentence-level analysis tasks

  • Chunking (Chunking): mark the phrase chunks in a sentence, such as noun phrases (NP) and verb phrases (VP)
  • Supertagging (Super Tagging): assign each word in a sentence a supertag, which is a tree structure associated with that word in the syntax tree
  • Constituency parsing (Constituency Parsing): analyze the constituents of a sentence and produce a syntax tree made up of terminal and non-terminal symbols
  • Dependency parsing (Dependency Parsing): analyze the dependencies between the words of a sentence and produce a dependency tree built from those word-to-word dependencies
  • Language modeling (Language Modeling): score a given sentence; the score reflects how plausible (fluent) the sentence is
  • Language identification (Language Identification): given a piece of text, determine which language it is written in
  • Sentence boundary detection (Sentence Boundary Detection): add sentence boundaries to text that has no explicit ones
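To illustrate the language modeling item in the list above (scoring a sentence for fluency), here is a minimal sketch of a bigram model with add-one smoothing. The three-sentence corpus is made up, and a bigram model is only one simple choice among many ways to build a language model.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count bigrams and their left contexts over whitespace-tokenized sentences."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"</s>"}
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])  # every token that can appear as a next word
        contexts.update(tokens[:-1])  # how often each token is followed by something
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, vocab


def log_prob(sentence, contexts, bigrams, vocab):
    """Log-probability of a sentence with add-one (Laplace) smoothing."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))
        total += math.log(p)
    return total


corpus = ["the cat sat", "the dog sat", "the cat ran"]  # toy training data
contexts, bigrams, vocab = train_bigram(corpus)

fluent = log_prob("the cat sat", contexts, bigrams, vocab)
scrambled = log_prob("sat cat the", contexts, bigrams, vocab)
print(f"fluent:    {fluent:.3f}")
print(f"scrambled: {scrambled:.3f}")
```

The word order seen in training receives the higher score, which is the sense in which the score measures fluency.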

3. Semantic analysis (Semantic Analysis): analyze and understand the given text, producing a formal or distributed representation that captures its meaning

  • Word sense disambiguation (Word Sense Disambiguation): for an ambiguous word, determine its exact meaning
  • Semantic role labeling (Semantic Role Labeling): label the semantic roles in a sentence; semantic roles include agent, patient, influence, and others
  • Abstract meaning representation parsing (Abstract Meaning Representation Parsing): AMR is an abstract representation of meaning, and an AMR parser converts a sentence into an AMR structure
  • First-order predicate calculus (First Order Predicate Calculus): use a first-order predicate logic system to express semantics
  • Frame semantic parsing (Frame Semantic Parsing): analyze the semantics of a sentence according to frame semantics
  • Word/sentence/paragraph vectors (Word/Sentence/Paragraph Vector): study vector representations of words, sentences, and paragraphs, together with the properties and applications of those vectors

4. Information extraction (Information Extraction): extract structured information from unstructured text

  • Named entity recognition (Named Entity Recognition): identify named entities in text; entities generally include person names, place names, organization names, times, dates, currencies, percentages, and so on
  • Entity disambiguation (Entity Disambiguation): determine which real-world object an entity mention refers to
  • Term extraction (Terminology/Glossary Extraction): identify terms in text
  • Coreference resolution (Coreference Resolution): determine which different mentions refer to the same entity, including pronoun resolution and noun resolution
  • Relation extraction (Relationship Extraction): determine the type of relationship between two entities in the text
  • Event extraction (Event Extraction): extract structured events from unstructured text
  • Sentiment analysis (Sentiment Analysis): extract the subjective sentiment of the text
  • Intent detection (Intent Detection): an important module in dialogue systems; analyze what the user says and identify the user's intent
  • Slot filling (Slot Filling): an important module in dialogue systems; extract from the conversation the information relevant to the user's intent

5. High-level tasks (High-level Tasks): aimed directly at ordinary users; these are system-level tasks that deliver natural language processing products and services and draw on many levels of natural language processing technology

  • Machine translation (Machine Translation): automatic translation from one language into another by computer
  • Text summarization and simplification (Text Summarization/Simplification): extract the main points of a long text
  • Question answering system (Question-Answering System): the system gives an answer to the question the user asks
  • Dialogue system (Dialogue System): chat with users, capture the user's intent from the conversation, then analyze and act on it
  • Reading comprehension (Reading Comprehension): after the machine reads an article, it can answer questions related to that article
  • Automatic essay grading (Automatic Essay Grading): given an essay, assign a score or grade to its quality