
Google & Hugging Face | The language model architecture with the strongest zero-shot ability

2022-06-23 14:46:00 Zhiyuan community

From GPT-3 to prompting, more and more people have found that large models perform remarkably well in the zero-shot setting, and this has made many increasingly hopeful that AGI is on its way.

But one thing is puzzling: back in 2019, T5 found through extensive ablations that, when designing a pretrained model, an encoder-decoder architecture combined with an MLM objective gives the best results when finetuning on downstream tasks. Yet here in 2022, mainstream large models almost all use decoder-only architectures, such as OpenAI's GPT series, Google's PaLM [1], and DeepMind's Chinchilla [2]. Why is that? Is there something wrong with the design of these large models?
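To make the two design points concrete, here is a minimal, illustrative sketch (not taken from the paper) of how the two pretraining objectives shape inputs and targets on a toy token sequence; the sentinel naming follows T5's `<extra_id_*>` convention, the span choice is arbitrary, and the masking logic is simplified:

```python
# Toy token sequence used for both objectives.
tokens = ["Thank", "you", "for", "inviting", "me", "to", "your", "party", "last", "week"]

# 1) Causal language modeling (decoder-only, GPT/PaLM-style):
#    every position predicts the next token given its prefix.
lm_inputs = tokens[:-1]
lm_targets = tokens[1:]
print("LM input :", lm_inputs)
print("LM target:", lm_targets)

# 2) Span corruption / MLM (encoder-decoder, T5-style):
#    contiguous spans are replaced by sentinels in the encoder input,
#    and the decoder reconstructs only the masked spans.
#    (Simplified: T5 also appends a final sentinel to the target.)
masked_spans = [(3, 5), (8, 10)]  # [start, end) indices of spans to corrupt
enc_input, dec_target, sid, cursor = [], [], 0, 0
for start, end in masked_spans:
    enc_input += tokens[cursor:start] + [f"<extra_id_{sid}>"]
    dec_target += [f"<extra_id_{sid}>"] + tokens[start:end]
    cursor, sid = end, sid + 1
enc_input += tokens[cursor:]
print("Encoder input :", enc_input)
print("Decoder target:", dec_target)
```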

Today's paper comes from Hugging Face and Google. In the same spirit as the T5 ablations, it runs a large set of controlled comparisons and reaches a clear conclusion: if the goal is zero-shot generalization, a decoder-only architecture with a language modeling objective works best; if the model will go through multitask finetuning, an encoder-decoder architecture with an MLM objective works best.
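As a rough illustration of these two architecture/objective pairings, the sketch below uses the Hugging Face transformers library; "gpt2" and "t5-small" are only small public stand-ins, not the models trained in the paper, and the prompt is a made-up example:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoModelForSeq2SeqLM

# Decoder-only + causal LM: the pairing the paper finds best for zero-shot prompting.
tok_lm = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")
prompt = "Review: the movie was great.\nSentiment:"
out = lm.generate(**tok_lm(prompt, return_tensors="pt"), max_new_tokens=3)
print(tok_lm.decode(out[0], skip_special_tokens=True))

# Encoder-decoder + MLM (span corruption): the pairing the paper finds best after
# multitask finetuning; here we just compute the denoising loss on one example.
tok_t5 = AutoTokenizer.from_pretrained("t5-small")
seq2seq = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
inputs = tok_t5("Thank you <extra_id_0> me to your party <extra_id_1> week",
                return_tensors="pt")
labels = tok_t5("<extra_id_0> for inviting <extra_id_1> last <extra_id_2>",
                return_tensors="pt").input_ids
loss = seq2seq(**inputs, labels=labels).loss
print("span-corruption loss:", loss.item())
```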

Beyond identifying the best training setup, the extensive experiments also uncover the most cost-effective training recipe, one that needs only one ninth of the training compute!

Paper title:
What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?

Paper link:
https://arxiv.org/abs/2204.05832

 


Copyright notice
This article was created by [Zhiyuan community]; please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/174/202206231350462824.html
