Google & Hugging Face | The language model architecture with the strongest zero-shot ability
2022-06-23 14:46:00 【Zhiyuan community】
From GPT-3 to prompting, more and more people have found that large models perform very well in the zero-shot setting. This has raised expectations that AGI is getting closer.
But one thing is puzzling: in 2019, T5 found through extensive tuning experiments that, when designing a pretrained model, an encoder-decoder architecture with an MLM objective gives the best results after fine-tuning on downstream tasks. Yet today, in 2022, mainstream large models use a decoder-only architecture, for example OpenAI's GPT series, Google's PaLM [1], and DeepMind's Chinchilla [2]. Why is that? Is there something wrong with the design of these large models?
Today's post covers a paper from Hugging Face and Google. Its approach is similar to that of the T5 experiments: through a large number of controlled comparisons, it reaches one overarching conclusion. If the goal is zero-shot generalization, a decoder architecture with a language modeling objective works best; if the model is multitask fine-tuned, an encoder-decoder architecture with an MLM objective works best.
Beyond finding the best training setup, the extensive experiments also identify a training approach that is both the best-performing and the most cost-effective, requiring only one ninth of the training compute!
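To make the two pretraining setups compared above more concrete, here is a minimal sketch of how one token sequence could be turned into a training example under each objective. It is an illustration only, not the paper's code: the function names, the whitespace "tokenizer", and the `<extra_id_N>` sentinel strings (borrowed from the T5 convention) are assumptions made for this example.

```python
from typing import List, Tuple


def causal_lm_example(tokens: List[str]) -> Tuple[List[str], List[str]]:
    """Decoder-only language modeling: each position predicts the next token.

    Returns (inputs, targets), where targets is inputs shifted left by one.
    """
    return tokens[:-1], tokens[1:]


def causal_mask(n: int) -> List[List[int]]:
    """Attention mask for a causal decoder: position i may attend to j <= i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def span_corruption_example(
    tokens: List[str], spans: List[Tuple[int, int]]
) -> Tuple[List[str], List[str]]:
    """Encoder-decoder masked LM in the T5 style (simplified).

    Each (start, end) span is replaced by a sentinel in the encoder input.
    The decoder target lists every sentinel followed by the tokens it hides,
    and ends with one closing sentinel. Spans are assumed non-overlapping.
    """
    encoder_input: List[str] = []
    decoder_target: List[str] = []
    cursor = 0
    for sentinel_id, (start, end) in enumerate(sorted(spans)):
        sentinel = f"<extra_id_{sentinel_id}>"  # hypothetical sentinel naming
        encoder_input.extend(tokens[cursor:start])
        encoder_input.append(sentinel)
        decoder_target.append(sentinel)
        decoder_target.extend(tokens[start:end])
        cursor = end
    encoder_input.extend(tokens[cursor:])
    decoder_target.append(f"<extra_id_{len(spans)}>")
    return encoder_input, decoder_target


if __name__ == "__main__":
    sentence = "Thank you for inviting me to your party last week".split()

    lm_inputs, lm_targets = causal_lm_example(sentence)
    print("LM inputs :", lm_inputs)
    print("LM targets:", lm_targets)

    enc_in, dec_out = span_corruption_example(sentence, [(2, 4), (8, 9)])
    print("Encoder input :", enc_in)
    print("Decoder target:", dec_out)
```

In the first setup the model only ever sees left context, which matches how it is later prompted in a zero-shot setting. In the second, the encoder reads the whole corrupted sequence bidirectionally and the decoder reconstructs only the hidden spans.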
Paper title:
What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?
Paper link:
https://arxiv.org/abs/2204.05832