CMU proposes a new NLP paradigm, reStructured Pre-training, and scores 134 on the college entrance examination (Gaokao) English test

2022-06-28 02:56:00 【Zhiyuan community】

The reStructured Pre-training (RST) method proposed in this paper not only performs strongly across a range of NLP tasks, but also delivers a satisfying result on the Gaokao (China's college entrance examination) English test.

The way we store data is changing, from biological neural networks to artificial neural networks. The most common case is still to use the brain to store data. As the amount of available data keeps growing, people have turned to external devices such as hard disk drives or cloud storage. With the rise of deep learning, another promising storage technology has emerged: using artificial neural networks to store the information contained in data.

The researchers argue that the ultimate goal of data storage is to better serve human life, so the way data is accessed matters as much as the way it is stored. However, the ways of storing and accessing data differ. Historically, people have tried to bridge this gap in order to make better use of the information that exists in the world. As shown in Figure 3:

For biological neural networks (such as the human brain), people receive education (knowledge) from a very young age so that they can retrieve specific data to cope with a complex and changing life.

For external device storage, people usually structure the data according to a schema (for example, tables) and then use a specialized language (for example, SQL) to efficiently retrieve the required information from the database.

For storage based on artificial neural networks, researchers use self-supervised learning to store data from large corpora (pre-training), and then use the network for various downstream tasks (for example, sentiment classification).

Researchers from CMU have proposed a new way of accessing data that contains various types of information. This information can serve as pre-training signals that guide the model in optimizing its parameters. The study represents data in a structured way, with the signal as the unit. This is similar to storing data in a database: the data is first structured into tables or JSON, so that a specialized language (such as SQL) can retrieve exactly the information needed.
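The database analogy can be made concrete with a small sketch. This is only an illustration of "structure first, then query precisely", not the paper's storage format: the table name `signals`, its columns, the `review_site` source, and the helper `fetch_signals` are all hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per signal (source, signal_type, text, value).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE signals (source TEXT, signal_type TEXT, text TEXT, value TEXT)"
)

# Illustrative rows only; they are not taken from the paper's data.
rows = [
    ("wikipedia", "named_entity", "Mozart was born in Salzburg", "Mozart"),
    ("wikipedia", "named_entity", "Mozart was born in Salzburg", "Salzburg"),
    ("review_site", "sentiment", "The film was wonderful", "positive"),
]
conn.executemany("INSERT INTO signals VALUES (?, ?, ?, ?)", rows)


def fetch_signals(connection, signal_type):
    """Retrieve (text, value) pairs for one signal type, in insertion order."""
    cursor = connection.execute(
        "SELECT text, value FROM signals WHERE signal_type = ? ORDER BY rowid",
        (signal_type,),
    )
    return cursor.fetchall()


entities = fetch_signals(conn, "named_entity")
print(entities)
```

Because the data was structured before it was stored, a single parameterized query returns exactly the signals of one type. RST aims for the same kind of precise access when the storage is a pre-trained model instead of a table.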

In addition, the study argues that valuable signals are abundant in all kinds of data in the world, rather than existing only in manually curated supervised datasets. What researchers need to do is (a) identify the data, (b) reorganize the data in a unified language, and (c) integrate and store it in a pre-trained language model. The study calls this learning paradigm reStructured Pre-training (RST). The researchers liken the process to a "treasure hunt in a mine". Different data sources, such as Wikipedia, are like mines rich in gems. They contain a wealth of information, for example named entities from hyperlinks, that can provide signals for model pre-training. A good pre-trained language model (PLM) should clearly understand the composition of the various signals in the data, so that it can provide accurate information according to the different needs of downstream tasks.

Paper title: reStructured Pre-training

Paper link: https://arxiv.org/pdf/2206.11147.pdf

▲ Treasure hunting with pre-trained language models

The study proposes a new paradigm for learning NLP tasks, namely RST. The paradigm puts renewed emphasis on the role of data and treats model pre-training and downstream fine-tuning as a process of storing and accessing data. On this basis, the study follows a simple principle: a good storage mechanism should not only be able to hold a large amount of data, but should also be easy to access.

After overcoming some engineering challenges, the study achieves this by pre-training on restructured data (data that consists of various kinds of valuable information rather than raw data). Experiments show that RST models significantly outperform the best existing systems (for example, T0) on 52 of 55 popular datasets drawn from various NLP tasks (for example, classification, information extraction, fact retrieval, and text generation), without fine-tuning on the downstream tasks. They also achieve excellent results on the Gaokao English test, part of China's most authoritative college entrance examination, which millions of students take every year.

Specifically, the Gaokao AI (Qin) scores 40 points above the student average, and 15 points above GPT3 while using 1/16 of its parameters. In particular, Qin scored 138.5 (out of 150) on the 2018 English test.

The study also released an online submission platform for the Gaokao Benchmark. It currently contains 10 annotated English test papers from 2018 to 2021 (and will be expanded every year), so that more AI models can sit the Gaokao. The study thereby establishes a relatively fair platform for comparing humans and AI, which helps us better understand where we stand. In addition, on the 2022 Gaokao English test held a few days earlier (2022.06.08), the AI system scored 134, while GPT3 scored only 108.

The main contributions of this study include:

1. An evolution hypothesis for NLP methods. The study explores the internal connections among modern NLP techniques and, from a global perspective, puts forward the "NLP technique evolution hypothesis". In short, the core idea of the hypothesis is that technology always iterates in one direction: developers need to do less in order to design better, more general-purpose systems.

So far, NLP technology has gone through several iterations, as shown in Figure 2: feature engineering → architecture engineering → objective engineering → prompt engineering. It is now moving toward more practical and effective data-centric engineering. The researchers hope this will inspire more researchers to think critically about the question, grasp the core driving force of technological progress, find the "gradient ascent" path for academic development, and do more work of scientific significance.

2. A new paradigm based on the evolution hypothesis: reStructured Pre-training. The paradigm treats model pre-training and fine-tuning as a process of data storage and access, and holds that a good storage mechanism should make the desired data easy to access. With this paradigm, the study unifies 26 different types of signals in the world (for example, the entities in a sentence) from 10 data sources (for example, Wikipedia). The general-purpose model trained on this basis generalizes strongly across a variety of tasks, covering 55 NLP datasets.

3. An AI for the Gaokao. Based on the paradigm above, the study developed Qin, an AI system built specifically for the Gaokao English test. According to the study, it is the world's first deep-learning-based AI system for Gaokao English. Qin has performed well on several years of Gaokao papers: it scores 40 points above the human average, and 15 points above GPT-3 with only 1/16 of GPT-3's parameters. In particular, on the 2018 English test, Qin scored 138.5 (out of 150), with full marks in listening and reading comprehension.

4. Rich resources. (1) To track how far existing AI technology has progressed toward human intelligence, the study released a new benchmark, the Gaokao Benchmark. It provides a comprehensive assessment across practical tasks and domains in real-world scenarios, and it also provides human performance, so that AI systems can be compared directly with humans. (2) The study uses ExplainaBoard (Liu et al., 2021b) to set up an interactive leaderboard for the Gaokao Benchmark, so that more AI systems can easily take part and obtain scores automatically. (3) All resources are available on GitHub.

In addition, AI's success on the Gaokao English test has given the researchers many new ideas: AI technology can empower education and help solve a range of problems in teaching and learning.

For example, it can (a) help teachers grade automatically, (b) help students answer homework questions with detailed explanations, and (c) more importantly, promote equity in education, giving most families access to education services of the same quality. This work is the first to integrate 26 different signals in the world in a unified way. Rather than trying to distinguish between supervised and unsupervised data, it is concerned with how much of the information that nature provides can be used, and how. The strong performance on more than 50 datasets from various NLP tasks shows the value of data-centric pre-training and should inspire more exploration in the future.

reStructured Pre-training

The paradigm for solving NLP tasks is changing rapidly, and the change is still under way. The following table lists five paradigms in NLP:

Unlike existing model-centric design paradigms, the study thinks more from the perspective of data, aiming to make maximal use of existing data. Specifically, it adopts a data storage and access view: the pre-training stage is treated as a data storage process, and downstream tasks built on the pre-trained model (for example, sentiment classification) are treated as a process of accessing data from the pre-trained model. The study holds that a good data storage mechanism should make the stored data more accessible.

To this end, the study treats data as objects composed of different signals. A good pre-trained model should (1) cover as many signal types as possible, and (2) provide a precise access mechanism for these signals when downstream tasks require them. Overall, the new paradigm consists of three steps: restructure, pre-train, and fine-tune.

The new restructure, pre-train, fine-tune paradigm highlights the importance of data, and researchers need to invest more engineering effort in data processing.

Restructuring engineering

2.1 Signal definition

Signals are the useful information in data that can provide supervision for machine learning models, and they are expressed as n-tuples. For example, in "Mozart was born in Salzburg", "Mozart" and "Salzburg" can be regarded as signals for named entity recognition. Signals can usually be clustered from different perspectives, as shown in Figure 6.
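As a rough illustration of "a signal is an n-tuple", the Mozart example can be written down as typed tuples and then clustered by one attribute. The field names in `Signal` and the helper `group_by_type` are assumptions made for this sketch; the paper's actual tuple layout may differ.

```python
from typing import NamedTuple


class Signal(NamedTuple):
    """Hypothetical 3-tuple layout: (signal_type, text, value)."""
    signal_type: str
    text: str
    value: str


sentence = "Mozart was born in Salzburg"
signals = [
    Signal("named_entity", sentence, "Mozart"),
    Signal("named_entity", sentence, "Salzburg"),
]


def group_by_type(items):
    """Cluster signal values by signal type (one possible clustering angle)."""
    grouped = {}
    for item in items:
        grouped.setdefault(item.signal_type, []).append(item.value)
    return grouped


print(group_by_type(signals))
```

Using a `NamedTuple` keeps each signal a plain tuple while giving its positions readable names, which makes it easy to cluster the same signals by source, by type, or by any other field.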

2.2 Data mining

Real-world data contains many different types of signals, and reStructured Pre-training allows these signals to be fully exploited. The study organizes the collected signals (n-tuples) in a tree diagram, as shown in Figure 10.

2.3 Signal extraction

Next, the study extracts and processes signals. This involves obtaining raw data from data mines of different modalities, data cleaning, and data normalization. Existing methods fall roughly into two types: (1) rule-based and (2) machine-learning-based. This work focuses mainly on rule-based signal extraction strategies and leaves higher-coverage methods to future work.
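A minimal sketch of what a rule-based extractor could look like, using the article's own example of named entities that come from hyperlinks. It assumes wiki-style `[[Target|display text]]` link markup as input; the pattern `LINK_PATTERN` and the function `extract_entity_signals` are hypothetical and are not the rules used in the paper.

```python
import re

# Assumed input format: wiki-style links, [[Target]] or [[Target|display text]].
LINK_PATTERN = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]|]+))?\]\]")


def extract_entity_signals(markup):
    """Return (plain_text, entities) for one marked-up sentence.

    plain_text is the sentence with link markup removed (data cleaning);
    entities lists the linked surface forms in order of appearance.
    """
    entities = []

    def replace(match):
        # Prefer the display text when present, otherwise the link target.
        surface = (match.group(2) or match.group(1)).strip()
        entities.append(surface)
        return surface

    plain_text = LINK_PATTERN.sub(replace, markup)
    return plain_text, entities


raw = "[[Wolfgang Amadeus Mozart|Mozart]] was born in [[Salzburg]]."
text, entities = extract_entity_signals(raw)
print(text)
print(entities)
```

A single regular expression both cleans the text and collects the signal, which is why rule-based extraction is cheap to apply at scale. Its coverage is limited to what the rule can see, which matches the article's remark that higher-coverage methods are left to future work.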

2.4 Signal reconstruction

After extracting different signals from the various data mines, the next important step is to unify them into a fixed form, so that all the information can be stored consistently in the model during pre-training. The prompt method (Brown et al., 2020; Liu et al., 2021d) can achieve this: in principle, with appropriate prompt design, almost all types of signals can be unified into a language-modeling style.
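The idea of unifying signals through prompts can be sketched as a template lookup that turns any signal tuple into a text-to-text pair. The template wording in `TEMPLATES` and the function `to_prompt_sample` are invented for illustration; the paper's actual prompts are given in its tables and are not reproduced here.

```python
# Hypothetical templates: one (source, target) pattern per signal type.
TEMPLATES = {
    "named_entity": (
        "{text} What entities are mentioned in the text above?",
        "{value}",
    ),
    "sentiment": (
        "{text} Is the sentiment of this text positive or negative?",
        "{value}",
    ),
}


def to_prompt_sample(signal_type, text, value):
    """Convert one signal tuple into a text-to-text training sample."""
    source_template, target_template = TEMPLATES[signal_type]
    return {
        "source": source_template.format(text=text),
        "target": target_template.format(value=value),
    }


sample = to_prompt_sample(
    "named_entity", "Mozart was born in Salzburg.", "Mozart, Salzburg"
)
print(sample["source"])
print(sample["target"])
```

Whatever the signal type, the output has the same shape (a source string and a target string), so a single sequence-to-sequence model can be trained on all of the samples together.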

The study divides signals into two broad categories: general signals and task-relevant signals. The former contain basic linguistic knowledge and can benefit all downstream tasks to some extent; the latter benefit certain specific downstream tasks.

Experiments on 55 commonly used NLP datasets

The study evaluates the models on 55 datasets and compares them with GPT3 and T0pp. The comparison with GPT3 is shown in the figure: on the four datasets other than cb, both RST-All and RST-Task achieve better zero-shot performance than GPT3 achieves with few-shot learning. Moreover, cb is the smallest of these datasets, with only 56 samples in its validation set, so the performance of different prompts on it fluctuates considerably.

The comparison with T0pp is shown in Tables 4-6. For example, measured by average performance over the 55 datasets, RST-All beats T0pp on 49 of them, and it wins on 47/55 when measured by maximum performance. Likewise, measured by average performance over the 55 datasets, RST-Task outperforms T0pp on 52 of them, and it surpasses T0pp in 50/55 cases. This demonstrates the advantage of restructured learning.

Which tasks is the best-performing model, RST-Task, good at? To answer this question, the study compares the zero-shot performance of RST-Task with current SOTA models; the results are shown in Figure 13. RST-Task is good at topic classification, sentiment classification, and natural language inference, but performs poorly on information extraction.

3.1 Gaokao experiment: toward human-level AI

The study collected 10 Gaokao English papers: the 2018 National Papers I/III, the 2019 National Papers I/II/III, the 2020 National Papers I/II/III, and the 2021 National Papers A/B. These papers follow the same question types, and the study divides all exam questions into seven subcategories, as shown in Table 7:

Each Gaokao English paper has a full score of 150 points. Listening, cloze, reading, and writing account for 30, 45, 40, and 35 points respectively. The writing part is usually subjective and has to be evaluated manually; the rest is objective and can be scored automatically. As shown in Table 8:

The Gaokao English AI system, Qin, is built using the restructuring engineering cycle shown in Table 1. The whole process is shown in Figure 14:

The study uses the following prompts to convert the original signal tuples into prompt samples, as shown in Table 9:

The experimental results are shown in Tables 10-11, and the following conclusions can be drawn. On every English test paper, RST achieves the highest total score under both listening settings, with an average of 130.6 points. Compared with T0pp, RST performs far better: across all settings, RST's total score is on average 54.5 points higher than T0pp's, with a maximum gap of 69 points (46% of the full score). Compared with GPT3, RST achieves significantly better results with a model that is 16 times smaller: across all settings considered, RST's total score is on average 14.0 points higher than GPT3's, and up to 26 points higher (17% of the full score). For T0pp, the listening scores obtained with gold transcripts and with speech-to-text transcripts differ considerably, by 4.2 points on average. By comparison, the differences for GPT3 and RST are 0.6 and 0.45 respectively, which indicates that T0pp's performance is sensitive to text quality.
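The percentages quoted for the score gaps can be checked against the 150-point full score. The short sketch below only redoes that arithmetic with the numbers stated in this article; the names `SECTION_POINTS` and `share_of_full_score` are introduced here for illustration.

```python
FULL_SCORE = 150

# Section allocation as stated in the article.
SECTION_POINTS = {"listening": 30, "cloze": 45, "reading": 40, "writing": 35}


def share_of_full_score(points):
    """Express a point gap as a rounded percentage of the full score."""
    return round(100 * points / FULL_SCORE)


section_total = sum(SECTION_POINTS.values())
max_gap_vs_t0pp = share_of_full_score(69)
max_gap_of_26 = share_of_full_score(26)

print(section_total, max_gap_vs_t0pp, max_gap_of_26)
```

The four sections add up to the full score, and gaps of 69 and 26 points correspond to roughly 46% and 17% of 150, which agrees with the figures in the text.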

The study also carried out a fine-grained analysis to understand how the models perform on different question subcategories. In Figure 15-(a), RST and GPT3 clearly outperform T0pp in every question subcategory.

Figure 15-(b) shows the models' performance and the students' average performance on the national test papers in recent years. T0pp's total score is clearly below the student average on 9/10 papers, whereas RST and GPT3 exceed the student average. In particular, on five of the ten papers, RST's total score exceeds 130 (usually regarded as a target score for students).

The 2022 Gaokao English test (2022.06.08) had just finished, and the study wanted to see how the models perform on the most recent year's paper. It ran experiments with GPT3 and RST. The results show that RST's total score is 134, far above the 108 achieved by GPT3.

The paper ends with three Easter eggs. For more details, please see the original paper.

Copyright notice
This article was written by [Zhiyuan community]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/179/202206280102470035.html