Tsinghua & Zhiyuan | CogView2: faster and better text-to-image generation model
2022-06-27 01:13:00 (Zhiyuan Community)

Paper title: CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers (arXiv)
This is work from the team of Tang Jie, vice president of Zhiyuan (BAAI), with Ming Ding as first author. It is the latest development in the WuDao (Enlightenment) model series. It has drawn a lot of attention on Reddit, and its GitHub repository already has more than 500 stars.
Abstract
The development of Transformer-based text-to-image models is held back by slow generation and by the complexity of high-resolution images. In this paper, we propose a solution based on hierarchical Transformers and local parallel autoregressive generation. We pretrain a 6-billion-parameter Transformer with a simple and flexible self-supervised task, the Cross-modal General Language Model (CogLM), and fine-tune it for fast super-resolution. Compared with the concurrent state-of-the-art DALL·E 2, the new text-to-image system CogView2 shows very competitive generation quality, and it naturally supports interactive text-guided editing of images.
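The abstract names CogLM's self-supervised task without describing it. As a rough illustration of mask-and-infill pretraining in general, the sketch below builds a GLM-style attention mask: unmasked context tokens attend to each other bidirectionally, while masked tokens are predicted left to right and see the context plus earlier masked tokens. The function name and this exact rule are our assumptions for illustration; CogLM's actual masking strategy and token layout may differ.

```python
from typing import List


def infilling_attention_mask(seq_len: int, masked: List[int]) -> List[List[bool]]:
    """Return mask[i][j] == True if position i may attend to position j.

    Hypothetical GLM-style rule (an assumption, not CogView2's code):
    - unmasked positions attend to every unmasked position (bidirectional);
    - masked positions are generated left to right, so each one attends to all
      unmasked positions plus masked positions generated no later than itself.
    """
    masked_set = set(masked)
    # Generation order of the masked positions: left to right.
    order = {pos: k for k, pos in enumerate(sorted(masked_set))}
    mask = [[False] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        for j in range(seq_len):
            if j not in masked_set:
                # Every position can see the unmasked context.
                mask[i][j] = True
            elif i in masked_set:
                # Causal among masked tokens (self included, as in causal attention).
                mask[i][j] = order[j] <= order[i]
            # Unmasked positions never see masked ones (left as False).
    return mask
```

For example, `infilling_attention_mask(6, [3, 4])` lets position 4 see positions 0 to 5, while position 3 cannot see position 4 and no unmasked position sees either masked token. Plain Python lists keep the sketch dependency-free; a real model would build the same pattern as a tensor.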
The last part of the paper is especially interesting:
Autoregression or diffusion? Although GPT has been very successful in text generation, diffusion models are becoming increasingly popular in image generation. We compare diffusion models with autoregressive models in terms of speed, the biggest drawback of autoregressive models discussed in Section 1. Under the same architecture, diffusion models require more FLOPs, but they are highly parallel. They can also trade off quality against time cost by manually scheduling the sampling steps. For example, GLIDE [19] samples 250 diffusion steps for evaluation and 27 steps for interactive sampling, which reduces the latency to 15 seconds.
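The quality/time trade-off above comes from running the sampler on only a subset of the timesteps the model was trained with, since each sampling step costs at least one forward pass of the network. A minimal sketch, assuming a hypothetical helper that simply picks evenly spaced timesteps; GLIDE's actual respacing schedule is not reproduced here:

```python
from typing import List


def respace_timesteps(num_train_steps: int, num_sample_steps: int) -> List[int]:
    """Pick num_sample_steps evenly spaced timesteps out of num_train_steps.

    Hypothetical helper: simple even spacing, not GLIDE's exact schedule.
    Returned in descending order, the order a sampler visits them.
    """
    if not 1 <= num_sample_steps <= num_train_steps:
        raise ValueError("need 1 <= num_sample_steps <= num_train_steps")
    if num_sample_steps == 1:
        return [num_train_steps - 1]
    # Spacing >= 1, so rounding keeps every chosen timestep distinct.
    stride = (num_train_steps - 1) / (num_sample_steps - 1)
    steps = [round(k * stride) for k in range(num_sample_steps)]
    return steps[::-1]
```

With a model trained on 1000 steps (an example value), `respace_timesteps(1000, 250)` and `respace_timesteps(1000, 27)` yield 250 and 27 network calls respectively, so latency shrinks roughly in proportion to the step count while sample quality typically degrades.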
Autoregressive models must generate images token by token, but our LoPAR (local parallel autoregressive generation) can upsample the image with a high degree of parallelism. Therefore, (potentially) we can design models with more hierarchies so that the time cost drops faster than it does for diffusion models.

DALL·E 2 vs. CogView2. DALL·E 2 [27] is a recently released concurrent work on text-to-image generation at 1024 × 1024 resolution. Although its probabilistic model and architecture differ substantially from CogView2's, both share the same spirit: hierarchical generation. CogView2 can synthesize scenes similar to the limited demonstrations of DALL·E 2, such as "lion teacher" (Figure 1) and "panda scientist" (DALL·E 2), even though CogView2 was trained on only about 5% of the total data used by DALL·E 2. Compared with CogView2, the main differences in DALL·E 2 are its three-level super-resolution and its "zeroth-level" image-prior generation. Because training three levels of super-resolution is very resource-consuming and more engineering-oriented, we leave it to future work.
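To make the parallelism argument concrete, the sketch below counts sequential forward passes rather than FLOPs. It assumes a simplified local-parallel scheme, not CogView2's exact LoPAR ordering: the token grid is split into non-overlapping windows, and every window generates its k-th token in the same pass. All function names and grid sizes are hypothetical.

```python
from typing import List, Tuple


def raster_steps(height: int, width: int) -> int:
    """Token-by-token autoregressive generation: one sequential pass per token."""
    return height * width


def local_parallel_steps(height: int, width: int, block: int) -> int:
    """Hypothetical local-parallel scheme (an assumption, not CogView2's exact LoPAR).

    The grid is split into non-overlapping block x block windows. All windows
    generate their k-th token in the same forward pass, so the number of
    sequential passes equals the number of tokens in the largest window.
    """
    return min(block, height) * min(block, width)


def hierarchical_steps(base: Tuple[int, int],
                       upsample: List[Tuple[int, int, int]]) -> int:
    """Raster generation at the base level, then each (height, width, block)
    upsampling level generated local-parallel; returns total sequential passes."""
    total = raster_steps(*base)
    for height, width, block in upsample:
        total += local_parallel_steps(height, width, block)
    return total
```

Under these assumptions, generating a 16 × 16 grid token by token and then upsampling to 64 × 64 with 8 × 8 windows takes `hierarchical_steps((16, 16), [(64, 64, 8)])`, i.e. 320 sequential passes, versus `raster_steps(64, 64)`, i.e. 4096, for plain token-by-token generation. Each parallel pass processes many tokens at once, so the saving is in latency rather than total compute, which mirrors the FLOPs-versus-parallelism trade-off described for diffusion models.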
Code: https://github.com/THUDM/CogView2
If you want to experiment with it yourself, note that the model has high hardware requirements; an NVIDIA A100 machine is recommended.