Google & Hugging Face | The language model architecture with the strongest zero-shot ability
2022-06-23 14:46:00 【Zhiyuan community】
From GPT-3 to prompting, more and more people are finding that large models perform very well in the zero-shot setting. This has raised everyone's expectations for the arrival of AGI.
But one thing is puzzling: in 2019, T5 found through extensive tuning experiments that, when designing a pretrained model, an encoder-decoder architecture with an MLM objective gives the best results when finetuned on downstream tasks. Yet today, in 2022, mainstream large models use decoder-only architectures, such as OpenAI's GPT series, Google's PaLM [1], and DeepMind's Chinchilla [2]. Why is that? Is there something wrong with the design of these large models?
Today's post covers a paper from Hugging Face and Google. Its approach is similar to the T5 experiments: through a large number of controlled comparisons, it arrives at one major conclusion. For zero-shot generalization, a decoder architecture with a language modeling objective works best; with multitask finetuning, an encoder-decoder architecture with an MLM objective works best.
Besides identifying the best way to train, the extensive experiments also turned up the most cost-effective training approach, which needs only one ninth of the training compute!
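To make the two pretraining objectives named above concrete, here is a minimal sketch of how one training example might be built for each. It is an illustration only: the function names, the `<mask>` token, and the choice of masked positions are assumptions made for this example, and the token-level masking shown is a simplification rather than the exact recipe used in the paper or in T5.

```python
def causal_lm_example(tokens):
    """Language modeling objective: predict each next token.

    The input is every token except the last; the target is the same
    sequence shifted left by one position.
    """
    return tokens[:-1], tokens[1:]


def masked_lm_example(tokens, mask_positions, mask_token="<mask>"):
    """Simplified MLM objective: hide some tokens and predict them.

    Tokens at the chosen positions are replaced by a mask token in the
    input; the targets are the original tokens at those positions.
    (Hypothetical helper; real setups sample positions randomly.)
    """
    inputs = [mask_token if i in mask_positions else tok
              for i, tok in enumerate(tokens)]
    targets = [tokens[i] for i in sorted(mask_positions)]
    return inputs, targets


tokens = ["the", "cat", "sat", "on", "the", "mat"]

lm_inputs, lm_targets = causal_lm_example(tokens)
mlm_inputs, mlm_targets = masked_lm_example(tokens, {1, 3})

print(lm_inputs, lm_targets)
print(mlm_inputs, mlm_targets)
```

In the language modeling case every position contributes a prediction and the model only sees what came before it, while in the masked case the model sees context on both sides of each hidden token but is trained only on the hidden positions.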
Paper title:
What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?
Paper link:
https://arxiv.org/abs/2204.05832