Parti, Google's autoregressive text-to-image model
2022-06-24 13:16:00 【Zhiyuan community】
We introduce the Pathways Autoregressive Text-to-Image model (Parti), an autoregressive text-to-image generation model that achieves high-fidelity photorealistic image generation and supports content-rich synthesis involving complex compositions and world knowledge. Recent advances with diffusion models for text-to-image generation, such as Google’s Imagen, have also shown impressive capabilities and state-of-the-art performance on research benchmarks. Parti and Imagen are complementary in exploring two different families of generative models – autoregressive and diffusion, respectively – opening exciting opportunities for combinations of these two powerful models.
Parti treats text-to-image generation as a sequence-to-sequence modeling problem, analogous to machine translation – this allows it to benefit from advances in large language models, especially capabilities that are unlocked by scaling data and model sizes. In this case, the target outputs are sequences of image tokens instead of text tokens in another language. Parti uses the powerful image tokenizer, ViT-VQGAN, to encode images as sequences of discrete tokens, and takes advantage of its ability to reconstruct such image token sequences as high-quality, visually diverse images.
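To make the sequence-to-sequence framing concrete, here is a minimal toy sketch of autoregressive image-token decoding. It is not Parti's implementation: `toy_next_token_logits` is a hypothetical stand-in for a trained encoder-decoder, and the codebook size, grid size, and text token ids are made-up values chosen only for illustration. The point is the loop itself: each image token is sampled conditioned on the text tokens and on all image tokens generated so far, and the resulting flat sequence is reshaped into a grid that an image tokenizer's decoder (such as ViT-VQGAN in the text above) would turn back into pixels.

```python
import math
import random

VOCAB_SIZE = 64   # hypothetical codebook size (illustrative only)
GRID = 4          # hypothetical 4x4 grid of image tokens (illustrative only)


def toy_next_token_logits(text_tokens, image_tokens):
    """Hypothetical stand-in for a trained model: one logit per codebook entry.

    A real model would attend over the encoded text and the image tokens
    produced so far; here we only derive deterministic pseudo-logits from them.
    """
    rng = random.Random(hash((tuple(text_tokens), tuple(image_tokens))))
    return [rng.gauss(0.0, 1.0) for _ in range(VOCAB_SIZE)]


def sample_from_logits(logits, temperature, rng):
    """Sample a token id from softmax(logits / temperature)."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(value - peak) for value in scaled]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def generate_image_tokens(text_tokens, num_tokens, temperature=1.0, seed=0):
    """Generate image tokens one at a time, each conditioned on the prefix."""
    rng = random.Random(seed)
    image_tokens = []
    for _ in range(num_tokens):
        logits = toy_next_token_logits(text_tokens, image_tokens)
        image_tokens.append(sample_from_logits(logits, temperature, rng))
    return image_tokens


def to_grid(tokens, width):
    """Reshape the flat token sequence into rows, in raster order."""
    return [tokens[i:i + width] for i in range(0, len(tokens), width)]


text_tokens = [12, 7, 3]  # hypothetical ids for an already tokenized prompt
tokens = generate_image_tokens(text_tokens, GRID * GRID)
grid = to_grid(tokens, GRID)
for row in grid:
    print(row)
```

In a real system the final step would pass the token grid to the tokenizer's decoder to reconstruct an image; that step is omitted here because it depends on trained weights.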
We observed the following results:
- Consistent quality improvements by scaling Parti’s encoder-decoder up to 20 billion parameters.
- State-of-the-art zero-shot FID score of 7.23 and finetuned FID score of 3.22 on MS-COCO.
- Effectiveness across a wide variety of categories and difficulty aspects in our analysis on Localized Narratives and PartiPrompts, our new holistic benchmark of 1600+ English prompts that we release as part of this work.
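For reference, the FID (Fréchet Inception Distance) scores quoted above compare feature statistics of real and generated images; lower is better. The standard definition is given below, where the symbols are notation introduced here rather than taken from the post: $\mu_r, \Sigma_r$ are the mean and covariance of Inception features for real images, and $\mu_g, \Sigma_g$ are the same statistics for generated images.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```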
We also explore and highlight limitations of our models, giving key example areas of focus for further improvements.