A summary and introduction to NLP tasks
2022-06-24 03:27:00 【Goose】
1. Background
NLP tasks fall into four broad categories:
- Sequence labeling tasks
- Classification tasks
- Sentence relation judgment
- Generation tasks
2. Sequence labeling tasks
Sequence labeling is one of the basic problems we encounter most often in NLP. In sequence labeling, we assign a label to each element of a sequence. Generally, the sequence is a sentence and each element is a word in that sentence. Information extraction, for example, can be treated as a sequence labeling problem, such as extracting the time and location of a meeting.
Sequence labeling can generally be divided into two categories:
- Raw labeling: every element is labeled individually.
- Joint segmentation and labeling: all elements of a segment share a single label.
Named entity recognition (NER) is a subtask of information extraction. It locates and classifies elements of the text such as person names, organization names, places, times, and quantities.
Take NER as an example of joint labeling. The sentence "Yesterday, George Bush gave a speech." contains one named entity: George Bush. We want to apply the label "person name" to the whole phrase "George Bush" rather than labeling the two words separately. This is joint labeling.
2.1 BIO tagging
The simplest way to solve the joint labeling problem is to convert it into a raw labeling problem. The standard practice is to use BIO tagging.
BIO tagging labels each element as "B-X", "I-X", or "O". "B-X" means the element belongs to a segment of type X and sits at the beginning of that segment; "I-X" means the element belongs to a segment of type X and sits inside it, after the beginning; "O" means the element does not belong to any type.
For example, if X is a noun phrase (NP), the three BIO tags are:
- B-NP: the beginning of a noun phrase;
- I-NP: inside a noun phrase;
- O: not part of a noun phrase.
A passage can therefore be segmented into noun-phrase chunks using just these three tags.
We can apply BIO tagging to NER in the same way. To cover all named entity types (person names, organization names, places, times, and so on), we need a B and an I category for each type, such as B-PERS, I-PERS, B-ORG, and I-ORG. Every word in a sentence then receives one of these tags, or O.
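As a minimal sketch of how a joint labeling is turned into per-word BIO tags, the snippet below tags the George Bush sentence from earlier. The function name `spans_to_bio` and the (start, end, type) span format are assumptions made for illustration; they do not come from any particular library.

```python
from typing import List, Tuple


def spans_to_bio(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert entity spans (start, end_exclusive, type) into one BIO tag per token."""
    tags = ["O"] * len(tokens)  # everything is outside an entity by default
    for start, end, label in spans:
        tags[start] = f"B-{label}"  # first token of the segment
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # remaining tokens of the same segment
    return tags


tokens = ["Yesterday", ",", "George", "Bush", "gave", "a", "speech", "."]
spans = [(2, 4, "PERS")]  # "George Bush" is one person-name segment
bio_tags = spans_to_bio(tokens, spans)

for token, tag in zip(tokens, bio_tags):
    print(f"{token}\t{tag}")
```

The two words of the name receive B-PERS and I-PERS, so the segment boundary can be recovered from the per-word tags alone.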
2.2 Common models for sequence labeling
- Bi-LSTM: a bidirectional LSTM is used because the tag of the current word depends on both the preceding and the following context.
- Bi-LSTM+CRF: https://zhuanlan.zhihu.com/p/169719001
2.3 Specific sequence labeling tasks
(1) Word segmentation
- Input: word + tag (I: in word; E: end of word);
- Output: the tag of each element; inserting a space after every element tagged E produces the segmentation.
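The rule of cutting after every E tag can be sketched in a few lines. The example below is only an illustration: the helper name `tags_to_words` is hypothetical, and an unspaced English string stands in for text without explicit word boundaries.

```python
from typing import List


def tags_to_words(chars: List[str], tags: List[str]) -> List[str]:
    """Rebuild words from per-character I/E tags: a word ends at every E."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag == "E":  # end of word: close the current word
            words.append(current)
            current = ""
    if current:  # trailing characters that were never closed by an E tag
        words.append(current)
    return words


chars = list("thecatsat")
tags = ["I", "I", "E", "I", "I", "E", "I", "I", "E"]
segmented = " ".join(tags_to_words(chars, tags))
print(segmented)
```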
(2) Part-of-speech tagging (POS tagging)
- Input: word + tag (the part of speech: verb, noun, adjective, etc.);
- Output: the part of speech;
- Model: an HMM can also be used.
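To make the HMM option concrete, here is a minimal Viterbi decoding sketch for a two-tag toy model. All probabilities below are invented for illustration, the function name `viterbi` is our own, and the code assumes every input word appears in the emission table with a nonzero probability.

```python
import math
from typing import Dict, List


def viterbi(words: List[str], states: List[str],
            start_p: Dict[str, float],
            trans_p: Dict[str, Dict[str, float]],
            emit_p: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the most probable tag sequence under a first-order HMM."""
    # best[t][s]: log-probability of the best path that ends in state s at position t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][words[0]]) for s in states}]
    back = [{}]  # back[t][s]: previous state on that best path
    for t in range(1, len(words)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: best[t - 1][p] + math.log(trans_p[p][s]))
            best[t][s] = (best[t - 1][prev] + math.log(trans_p[prev][s])
                          + math.log(emit_p[s][words[t]]))
            back[t][s] = prev
    # Follow the back-pointers from the best final state
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(words) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical toy parameters, not estimated from any real corpus
states = ["NOUN", "VERB"]
start_p = {"NOUN": 0.8, "VERB": 0.2}
trans_p = {"NOUN": {"NOUN": 0.3, "VERB": 0.7}, "VERB": {"NOUN": 0.6, "VERB": 0.4}}
emit_p = {"NOUN": {"dogs": 0.6, "bark": 0.1, "cats": 0.3},
          "VERB": {"dogs": 0.05, "bark": 0.7, "cats": 0.25}}

pos_tags = viterbi(["dogs", "bark"], states, start_p, trans_p, emit_p)
print(pos_tags)
```

Working in log space avoids numerical underflow on longer sentences; a real tagger would also need smoothing for unseen words.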
(3) Named entity recognition (NER)
- Input: word + tag (B: beginning of entity; I: inside of entity; O: outside of entity);
- Output: the entity tags.
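Once a model has produced B/I/O tags, the entities still have to be read back out of them. The sketch below shows one way to do that; the function name `bio_to_entities` and the example sentence are illustrative assumptions, and stray I- tags without a matching B- tag are simply ignored.

```python
from typing import List, Optional, Tuple


def bio_to_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity text, entity type) pairs from per-token BIO tags."""
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Close the entity being built, if any
        nonlocal current, current_type
        if current:
            entities.append((" ".join(current), current_type))
        current, current_type = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current.append(token)
        else:  # "O", or an I- tag that does not continue the open entity
            flush()
    flush()
    return entities


tokens = ["George", "Bush", "visited", "Paris", "."]
tags = ["B-PERS", "I-PERS", "O", "B-LOC", "O"]
entities = bio_to_entities(tokens, tags)
print(entities)
```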
(4) Semantic role labeling (SRL)
- Input: word + whether it is the predicate (B-Arg0, I-Arg0, B-V);
- Output: the semantic role.
3. Classification tasks
3.1 Specific classification tasks
(1) Text classification, sentiment classification
- Model: LSTM. This is a many-to-one problem: the whole input sequence maps to a single output, and a final Softmax layer outputs the classification result.
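The final Softmax step can be illustrated without any deep learning framework. In the sketch below, the label set and the three raw scores are hypothetical stand-ins for what a classifier head on top of the last LSTM state might produce.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["negative", "neutral", "positive"]
final_scores = [0.2, -1.0, 2.3]  # hypothetical scores for one input sentence

probs = softmax(final_scores)
predicted = labels[max(range(len(labels)), key=lambda i: probs[i])]
print(predicted, [round(p, 3) for p in probs])
```

Many-to-one here means that however many words the sentence has, only one score vector and one predicted label come out.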
4. Sentence relation judgment
4.1 Specific tasks
Syntactic parsing, entailment judgment
- Model: parse tree. An LSTM assigns a score to each edge, and the highest-scoring edges are selected, subject to the constraint that the selected edges must form a tree;
- Model: RNNGs can also be used.
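The tree constraint matters because picking the best head for each word independently can produce a cycle. The sketch below makes this visible with an exhaustive search over a three-word sentence. It is a brute-force illustration only: the score matrix is invented, the function names are our own, a single-root constraint is not enforced, and practical parsers use a maximum spanning tree algorithm such as Chu-Liu/Edmonds instead of enumeration.

```python
from itertools import product
from typing import List, Sequence, Tuple


def is_tree(heads: Sequence[int]) -> bool:
    """heads[i - 1] is the head of word i (words are 1-based, 0 is ROOT).
    The assignment is a tree if every word reaches ROOT without a cycle."""
    for word in range(1, len(heads) + 1):
        seen = set()
        node = word
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True


def best_tree(scores: List[List[float]]) -> Tuple[Tuple[int, ...], float]:
    """scores[h][d] is the score of the edge head h -> dependent d."""
    n = len(scores) - 1
    best_heads, best_score = None, float("-inf")
    for heads in product(range(n + 1), repeat=n):  # every head assignment
        if not is_tree(heads):
            continue
        total = sum(scores[h][d] for d, h in enumerate(heads, start=1))
        if total > best_score:
            best_heads, best_score = heads, total
    return best_heads, best_score


# Hypothetical edge scores; row = head, column = dependent, index 0 = ROOT
scores = [
    [0, 1, 5, 1],
    [0, 0, 6, 2],
    [0, 8, 0, 7],
    [0, 2, 3, 0],
]
heads, total = best_tree(scores)
print(heads, total)
```

With these scores, the per-word best heads would be (2, 1, 2), where words 1 and 2 point at each other. The search instead returns the best assignment that is a valid tree.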
5. Generation tasks
Such tasks generally face ordinary users directly. They are system-level tasks that deliver natural language processing products and services, and they draw on NLP techniques from many levels.
(1) Machine translation (MT)
This is the most classic application of the Encoder-Decoder structure; in fact, the structure was first proposed in the field of machine translation.
(2) Text summarization (Text Summarization/Simplification)
The input is a text sequence, and the output is a summary sequence of that text.
(3) Reading comprehension
Encode the input article and the question separately, then decode to obtain the answer to the question.
(4) Speech recognition
The input is a sequence of speech signals, and the output is a sequence of words.
(5) Dialogue system
The input is a sentence, and the output is a response to that sentence.
(6) Question answering system
Given a question raised by a user, the system returns the corresponding answer.
(7) Automatic essay grading
Given an essay, score or grade its quality.
6. Other classifications
Basic NLP tasks:
1. Lexical analysis: word-level analysis of natural language, the foundational work of NLP
- Word segmentation (tokenization): segment text that has no explicit word boundaries to obtain a word sequence
- New word identification: find new words in the text, or words with a new meaning or usage
- Morphological analysis: analyze the morphological composition of words, including stems, roots, and affixes (prefixes and suffixes)
- Part-of-speech tagging: determine the part of speech of each word in the text; parts of speech include verb, noun, pronoun, etc.
- Spelling correction: find misspelled words and correct them
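As one simple illustration of spelling correction, a misspelled word can be replaced by the closest vocabulary word under Levenshtein edit distance. This is only a sketch under stated assumptions: the three-word vocabulary is invented, the function names are our own, and real correctors also take word frequency and context into account.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute or match
        prev = curr
    return prev[-1]


def correct(word: str, vocabulary: List[str]) -> str:
    """Return the vocabulary word closest to the input word."""
    return min(vocabulary, key=lambda candidate: edit_distance(word, candidate))


vocabulary = ["spelling", "spilling", "selling"]  # hypothetical dictionary
corrected = correct("speling", vocabulary)
print(corrected)
```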
2. Sentence analysis: sentence-level analysis of natural language, including syntactic parsing and other sentence-level tasks
- Chunking: mark the phrase chunks in a sentence, for example noun phrases (NP) and verb phrases (VP)
- Supertagging: assign a supertag to each word in a sentence; a supertag is the tree structure associated with that word in the syntax tree
- Constituency parsing: analyze the constituents of a sentence and produce a syntax tree made up of terminal and nonterminal symbols
- Dependency parsing: analyze the dependencies between the words of a sentence and produce a dependency tree built from those word-to-word relations
- Language modeling: score a given sentence; the score represents how plausible (fluent) the sentence is
- Language identification: given a piece of text, determine which language it is written in
- Sentence boundary detection: add sentence boundaries to text that has no explicit ones
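The idea of scoring a sentence for fluency, as in the language modeling item above, can be sketched with a bigram model and add-one smoothing. The three-sentence corpus and the function names below are illustrative assumptions; neural language models play the same role at much larger scale.

```python
import math
from collections import Counter
from typing import List, Tuple

Model = Tuple[Counter, Counter, int]


def train_bigram(corpus: List[str]) -> Model:
    """Count bigrams and their left contexts over whitespace-tokenized sentences."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts, len(vocab)


def log_score(sentence: str, model: Model) -> float:
    """Log-probability of a sentence under the add-one smoothed bigram model."""
    context_counts, bigram_counts, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigram_counts[(prev, word)] + 1) / (context_counts[prev] + vocab_size))
        for prev, word in zip(tokens, tokens[1:])
    )


corpus = ["the cat sat", "the dog sat", "the cat ran"]  # toy training data
model = train_bigram(corpus)
fluent = log_score("the cat sat", model)
scrambled = log_score("sat the cat", model)
print(fluent, scrambled)
```

The sentence in natural word order receives a higher score than the scrambled one, which is exactly the signal a language model is meant to provide.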
3. Semantic analysis: analyze and understand the given text, and build a formal or distributed representation that expresses its semantics
- Word sense disambiguation: for an ambiguous word, determine its exact meaning in context
- Semantic role labeling: mark the semantic roles in a sentence; semantic roles include agent, patient, influence, etc.
- Abstract Meaning Representation (AMR) parsing: AMR is an abstract semantic representation, and an AMR parser converts a sentence into an AMR structure
- First-order predicate calculus: use a first-order predicate logic system to express semantics
- Frame semantic parsing: analyze the semantics of a sentence according to frame semantics
- Word/sentence/paragraph vectors: study vector representations of words, sentences, and paragraphs, along with the properties and applications of those vectors
4. Information extraction: extract structured information from unstructured text
- Named entity recognition: identify named entities in text; entities generally include person names, place names, organization names, times, dates, currencies, percentages, etc.
- Entity disambiguation: determine which real-world object an entity mention refers to
- Terminology/glossary extraction: identify terms in text
- Coreference resolution: determine which different mentions refer to the same entity, including pronoun resolution and noun resolution
- Relation extraction: determine the type of relation between two entities in the text
- Event extraction: extract structured events from unstructured text
- Sentiment analysis: extract the subjective sentiment of the text
- Intent detection: an important module of a dialogue system; analyze what the user says and identify the user's intent
- Slot filling: an important module of a dialogue system; extract from the conversation the information relevant to the user's intent
5. High-level tasks: system-level tasks that face ordinary users directly and deliver natural language processing products and services, drawing on NLP techniques from many levels
- Machine translation: automatically translate one language into another by computer
- Text summarization/simplification: extract the main points of a long text
- Question answering system: given a question raised by a user, the system returns the corresponding answer
- Dialogue system: chat with the user, capture the user's intent from the conversation, then analyze and act on it
- Reading comprehension: after reading an article, the machine answers questions about it
- Automatic essay grading: given an essay, score or grade its quality