当前位置:网站首页>[cann training camp] learning notes - Comparison between diffusion and Gan, dalle2 and Party
[cann training camp] learning notes - Comparison between diffusion and Gan, dalle2 and Party
2022-07-23 09:08:00 【Hua Weiyun】
I heard about GAN Live class , Read the related articles , I want to use this note to make a summary . At the same time, this note is also a personal reflection on the third question of the advanced class of the training camp , The question is how to treat GAN and Diffision The development potential of , I think from now on SOTA Model starting is the most intuitive way to feel their ability , So there is this article . except DALLE2 and Parti, I also hope to sort out the front work they involve . Because I haven't thoroughly understood the field of image generation before , Time is short, and the content may also be flawed .
DALLE2

As shown in the figure above ,Dalle2 The training of is divided into two stages . Use the upper half of the dotted line CLIP Contrast learning , To get a text encoder And a image encoder, They can encode words and pictures into vectors respectively and make pictures embedding And words embedding As similar as possible . The lower part is used for image generation , from prior and Decoder form .Decoder The role of will be image encoder The generated code generates the original picture in reverse ,Prior The title text or text embedding Mapping to image embedding In the space of .Decoder It's a diffusion model , and GLIDE be similar , But at the same time clip image embedding The mapping is added to the original input . The article gives two Prior Structure , Autoregressive and diffusion models . Under the manual judgment , The article uses two prior Respectively and GLIDE We found the authenticity of the diffusion model , The effect of Title Consistency and diversity is slightly better than that of autoregressive model

Quantized FID The index also shows the advantages of the diffusion model

Parti be based on Google Newly proposed Pathway Architecture to achieve efficient network training , The largest version has 200 Million parameters

As shown in the figure above , The text of the model is Transformer Encoder code , In the middle of the Transformer Decoder take Text-to-Image Generate as a Seq2Seq Mission . And the picture is made by ViT (Vision Transformer) code ( Here's the picture )

GAN and Diffusion Compare
GAN Because it is necessary to train generator and discriminator at the same time , It is difficult to balance , This makes the training unstable . by comparison ,Diffusion Just train a model , Optimization is easier . however Diffusion Of p The process needs to be completed step by step, which also affects the efficiency of its reasoning . stay Parti Used VQGAN And achieved better results than Diffusion Better results , But also pay attention Parti It has much more parameters than previous models , The pre trained text recognition model will also have a significant impact on the final result , It is difficult to explain whether the improvement of the overall performance of the model comes from GAN, stay Parti At the end of the article, the author also said that we can further consider using Diffusion and autoregression The combination of . In the field of image generation , Personal feeling diffusion Still dominant , however GAN Its application fields are more flexible and extensive , These are Diffusion Irreplaceable .
边栏推荐
- 博途PLC信号处理系列之限幅消抖滤波
- Flutter 3.0
- No requirement document, reject development?
- [cloud native] in the era of storm and cloud, the sharp edge of DBAs is out of the sheath
- Geely Xingrui: from product technology empowerment to cultural confidence
- 数论 —— 整除分块,常见经典例题。
- The concept and method of white box test
- [concurrent programming] Chapter 2: go deep into the reentrantlock lock lock from the core source code
- flutter 线性布局,填充
- OSI七层模型有哪七层?每一层分别有啥作用,这篇文章讲的明明白白!
猜你喜欢

Mathematical modeling -- graph and network models and methods (II)

差分数组操作的一些性质

SIP账号的作用-告诉你什么是SIP线路

Airserver third party projection software v7.3.0 Chinese Version (airplay terminal utility)

K3S - 轻量级Kubernetes集群

Arduino框架下合宙ESP32C3 +1.8“TFT液晶屏通过TFT_eSPI库驱动显示

NodeJS 基于 Dapr 构建云原生微服务应用,从 0 到 1 快速上手指南

There was an accident caused by MySQL misoperation, and "high availability" couldn't stand it

BGP联邦实验

Construction of mGRE network
随机推荐
【微信小程序】开发入门篇(二)
Metauniverse is not an existence that regards traffic as the ultimate pursuit like the Internet
砥砺前行新征程,城链科技狂欢庆典在厦门隆重举行
SQL Server 数据库设计--SELECT语句
买reits基金一定赚钱吗 开户安全吗
There was an accident caused by MySQL misoperation, and "high availability" couldn't stand it
Must I make money by buying REITs funds? Is it safe to open an account
Ali II: why do MySQL indexes use b+ trees instead of jump tables?
198. 打家劫舍
Implementation of OA office system based on JSP
数论 —— 整除分块,常见经典例题。
UGUI源码解析——IClippable
Regular expression conversion to corresponding text gadget
College students downloaded 2578 documents abnormally, and the IP of the University of Social Sciences of China was banned by a database
在线抠图和换背景及擦除工具
【Try to Hack】AWVS安装和简单使用
canal实现Mysql数据同步
差分数组操作的一些性质
讲一讲HART协议
数据可视化平台的下一站 | 来自国产开源数据可视化 datart「超级铁粉」的馈赠