Parti: Google's Autoregressive Text-to-Image Model
2022-06-24 12:34:00 [智源社区 (BAAI Community)]
We introduce the Pathways Autoregressive Text-to-Image model (Parti), an autoregressive text-to-image generation model that achieves high-fidelity photorealistic image generation and supports content-rich synthesis involving complex compositions and world knowledge. Recent advances with diffusion models for text-to-image generation, such as Google’s Imagen, have also shown impressive capabilities and state-of-the-art performance on research benchmarks. Parti and Imagen are complementary in exploring two different families of generative models – autoregressive and diffusion, respectively – opening exciting opportunities for combinations of these two powerful models.
Parti treats text-to-image generation as a sequence-to-sequence modeling problem, analogous to machine translation – this allows it to benefit from advances in large language models, especially capabilities that are unlocked by scaling data and model sizes. Here, the target outputs are sequences of image tokens rather than text tokens in another language. Parti uses a powerful image tokenizer, ViT-VQGAN, to encode images as sequences of discrete tokens, and takes advantage of the tokenizer's ability to reconstruct such image token sequences as high-quality, visually diverse images.
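The sequence-to-sequence framing can be illustrated with a toy decoding loop. This is only a minimal sketch under stated assumptions, not Parti's implementation: `toy_next_token_logits` is a hypothetical stand-in for a trained encoder-decoder Transformer, the codebook size and grid size are deliberately tiny illustrative values, greedy selection is used purely for simplicity, and the final step of turning tokens back into pixels with an image detokenizer is omitted.

```python
import random

VOCAB_SIZE = 16  # toy codebook size (assumption; chosen small for illustration)
GRID = 4         # toy 4x4 grid, i.e. 16 image tokens per image (assumption)


def toy_next_token_logits(text_tokens, image_tokens):
    """Hypothetical stand-in for a trained text-to-image Transformer.

    Returns one score per codebook entry. The scores depend on both the
    text prompt and the image tokens generated so far, which is the
    defining property of autoregressive decoding.
    """
    seed = 0
    for t in list(text_tokens) + [-1] + list(image_tokens):
        seed = (seed * 31 + t + 1) % (2 ** 31)
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(VOCAB_SIZE)]


def generate_image_tokens(text_tokens, length=GRID * GRID):
    """Emit image tokens one at a time, each conditioned on the prompt and prefix."""
    image_tokens = []
    for _ in range(length):
        logits = toy_next_token_logits(text_tokens, image_tokens)
        # Greedy choice for simplicity; a real system may sample instead.
        next_token = max(range(VOCAB_SIZE), key=lambda i: logits[i])
        image_tokens.append(next_token)
    return image_tokens


def to_grid(tokens, width=GRID):
    """Reshape the flat token sequence into the 2D layout a detokenizer would expect."""
    return [tokens[i:i + width] for i in range(0, len(tokens), width)]


if __name__ == "__main__":
    prompt = [5, 2, 9]  # hypothetical text token ids
    for row in to_grid(generate_image_tokens(prompt)):
        print(row)
```

The loop has the same shape as decoding in machine translation: the only difference is that the output vocabulary is a codebook of image tokens, and the finished sequence is reshaped into a grid for the image decoder.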
We observed the following results:
- Consistent quality improvements by scaling Parti’s encoder-decoder up to 20 billion parameters.
- State-of-the-art zero-shot FID score of 7.23 and fine-tuned FID score of 3.22 on MS-COCO.
- Effectiveness across a wide variety of categories and difficulty aspects in our analysis on Localized Narratives and PartiPrompts, our new holistic benchmark of 1600+ English prompts that we release as part of this work.
We also explore and highlight limitations of our models, giving key example areas of focus for further improvements.