Google & Hugging Face | The Language Model Architecture with the Strongest Zero-Shot Ability
2022-06-23 13:51:00 [智源社区 (BAAI Community)]
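From GPT-3 to prompting, more and more people have found that large models perform very well in the zero-shot setting. This has raised expectations for the arrival of AGI.
But one thing is puzzling. In 2019, T5 found through extensive "tuning" experiments that, when designing a pretrained model, an encoder-decoder architecture with an MLM objective gave the best results when finetuned on downstream tasks. Yet today, in 2022, the mainstream large models all use decoder-only architectures, for example OpenAI's GPT series, Google's PaLM [1], and DeepMind's Chinchilla [2]. Why is that? Are all of these large models designed wrong?
Today's post covers a paper from Hugging Face and Google. Its experimental approach is similar to T5's, and through a large number of controlled comparisons it reaches a major conclusion: for zero-shot generalization, a decoder-only architecture with a language modeling objective works best; if multitask finetuning follows, an encoder-decoder architecture with an MLM objective works best.
Beyond finding the best training setup, the authors also used extensive experiments to find the setup that is both the best and the most cost-efficient. It needs only one ninth of the training compute!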
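To make the two pretraining objectives in that conclusion concrete, here is a minimal sketch of how one token sequence becomes an (input, target) pair under each of them. It is an illustration, not the paper's code: the function names, the toy sentence, the hand-picked spans, and the `<extra_id_k>` sentinel naming are all assumptions, and the MLM variant is simplified T5-style span corruption (no closing sentinel, no random span sampling).

```python
def lm_example(tokens):
    """Full language modeling: predict each token from the tokens before it."""
    # Inputs are every token but the last; targets are shifted left by one.
    return tokens[:-1], tokens[1:]


def span_corruption_example(tokens, spans):
    """Simplified MLM-style span corruption.

    `spans` is a list of non-overlapping (start, end) index pairs, sorted by
    start. Each span is replaced by one sentinel in the input, and the target
    lists each sentinel followed by the tokens it hides.
    """
    inputs, targets = [], []
    cursor = 0
    for k, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{k}>"  # hypothetical sentinel naming
        inputs.extend(tokens[cursor:start])  # keep the visible tokens
        inputs.append(sentinel)              # hide the span behind a sentinel
        targets.append(sentinel)
        targets.extend(tokens[start:end])    # the model must reproduce these
        cursor = end
    inputs.extend(tokens[cursor:])
    return inputs, targets


tokens = ["the", "cat", "sat", "on", "the", "mat"]

lm_inputs, lm_targets = lm_example(tokens)
mlm_inputs, mlm_targets = span_corruption_example(tokens, [(1, 2), (4, 6)])

print(lm_inputs, lm_targets)
print(mlm_inputs, mlm_targets)
```

Under the language modeling objective every position contributes a prediction target, while under span corruption only the hidden spans do, and the model sees context on both sides of each gap.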
Paper title:
What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?
Paper link:
https://arxiv.org/abs/2204.05832