AI clothing generation helps you complete the last step of clothing design
2022-06-25 02:30:00 【PaddlePaddle】
This article was first published on the PaddlePaddle (飞桨) official account; see the original post:
AI Clothing Generation Helps You Finish the Last Step of Fashion Design

How can AI empower the fashion design industry? That is the question Hong Li, a PaddlePaddle developer technical expert, has been thinking about. After a designer conceives and sketches a garment, being able to generate the overall look of the finished piece with one click helps them refine the design based on factors such as fit and style. Once the basic idea of the project was settled, Hong Li began building it on the AI Studio platform with the PaddlePaddle framework. The project can already generate garments, and he looks forward to discussing further optimizations with more developers (for example, improving the diversity of the generated designs). What follows is Hong Li's own account.
Project background
To establish the basic design goals of the clothing generation project, I first looked into the relevant techniques. One thing that sets garment generation apart from other generation tasks is that the output must be a "clean" garment with no fancy background; in other words, the generator has to focus on the clothes rather than synthesize a complete scene. So in the loss design I thought about the clothing mask: the model takes semantic segmentation information as input, but its output is the clothing image. I happened to see a demo of SPADE on a video site, and that "Ma Liang's magic brush" style of presentation felt very approachable; I also realized the SPADE architecture roughly met my needs for the model. However, when I trained with SPADE, the model proved hard to train. By chance, I then found a newer paper, Semantically Multi-modal Image Synthesis.
A detailed introduction to the paper is available here:
https://aistudio.baidu.com/aistudio/projectdetail/3454453
The paper builds on the SPADE framework, uses the DeepFashion dataset, and shows good results. Notably, the output of its Encoder preserves spatial structure information, which I believe is why the model is easier to train. Building on this, I could easily modify the first version of my project, and the prototype of the garment generation model took shape.
Project practice
Project address:
https://aistudio.baidu.com/aistudio/projectdetail/3405079
Problems and data tuning
In the early stage of model training I mainly ran into two problems:
First, choosing a suitable training set.
Second, training was very slow: one batch took tens of seconds.
To solve these problems, I processed the data as follows:
1. I chose the FGVC6 dataset. It provides precise annotated regions for every part of a garment, 46 classes in total, as shown in Figure 1.

Figure 1. FGVC6 dataset sample [1]
2. At test time the model input no longer requires the GT; only the semantic segmentation information is needed. The details are explained below.
3. The semantic segmentation tensor fed to the model has the shape [batch_size, class_num, H, W], concretely [4, 46, 256, 256]: 46 because there are 46 labels, and the batch_size during training is 4 (a sketch of building this one-hot tensor follows this list).
4. In addition, the clothing mask has to be taken into account when computing the loss.
5. Because the data I feed in is 256×256, both the images and the semantic segmentation information need to be resized so that H and W are 256. The images in the original dataset are very large, with H and W often in the thousands, so resizing takes a lot of time. A skeptical reader might ask: why not just crop? There are several reasons:
First, this is a clothing generation project, so the model should consider the garment as a whole and be given as little purely local information as possible, to keep it from seeing only a small patch of the picture and to let it form its own overall sense of the garment.
Second, in a real photo the garment does not occupy a large area and its position is not fixed; cropping a 256×256 region out of a 2500×2500 image has a high probability of yielding a completely black patch that carries no information.
Finally, cropping easily makes some labels show up to the model only rarely (shoes, for example, occupy a tiny proportion of the image), which hurts the accuracy of the generated results.
6. To address the slow training, I tried a two-step tuning scheme. At first I resized online, and one batch took tens of seconds. Thinking it over, I realized that since the image data I feed in is only 256×256, the forward-pass tensor computation was unlikely to be the problem, so the bottleneck had to be in data preprocessing. Therefore, with guidance from a PaddlePaddle developer mentor, I first tried resizing offline and saving the results as npy files, which let training start smoothly. In addition, when saving the semantic segmentation information during the offline resize, I originally stored npy arrays of shape [256, 256, class_num]; they were too sparse and took up so much storage that I could only save around 1,000 samples. In fact, each pixel carries exactly one label, so I changed the storage format to [256, 256, 1]. That let me save around 10,000 samples and greatly improved storage efficiency (a sketch of this offline preprocessing follows this list).
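To make steps 5 and 6 concrete, here is a minimal sketch of the offline preprocessing, assuming OpenCV for resizing; the directory names and file-naming convention are hypothetical and this is not the project's actual script.

```python
import os
import cv2
import numpy as np

SRC_DIR = "raw_data"      # hypothetical folder of image / label-map pairs
DST_DIR = "resized_npy"   # hypothetical output folder for 256x256 .npy files
SIZE = 256

os.makedirs(DST_DIR, exist_ok=True)

for name in os.listdir(SRC_DIR):
    if not name.endswith("_img.png"):
        continue
    stem = name[:-len("_img.png")]

    # Resize the photo with bilinear interpolation.
    img = cv2.imread(os.path.join(SRC_DIR, name))
    img = cv2.resize(img, (SIZE, SIZE), interpolation=cv2.INTER_LINEAR)

    # The label map stores one class id per pixel, so use nearest-neighbour
    # interpolation to avoid inventing class ids that do not exist.
    seg = cv2.imread(os.path.join(SRC_DIR, stem + "_seg.png"), cv2.IMREAD_GRAYSCALE)
    seg = cv2.resize(seg, (SIZE, SIZE), interpolation=cv2.INTER_NEAREST)

    # Save the compact [256, 256, 1] label map instead of a sparse
    # [256, 256, 46] one-hot volume (roughly 46x smaller on disk).
    np.save(os.path.join(DST_DIR, stem + "_img.npy"), img)
    np.save(os.path.join(DST_DIR, stem + "_seg.npy"), seg[..., None].astype(np.uint8))
```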
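And here is a rough sketch, for step 3, of how a stored [256, 256, 1] label map can be expanded into the [class_num, H, W] one-hot tensor at load time; the helper name load_sample and the assumption that label 0 is background are mine, not the project's.

```python
import numpy as np
import paddle

CLASS_NUM = 46  # 46 garment-part labels

def load_sample(seg_path):
    """Load a saved [256, 256, 1] label map and return a [46, 256, 256]
    one-hot tensor plus a binary clothing mask."""
    seg = np.load(seg_path).squeeze(-1).astype(np.int64)     # [256, 256]

    # One-hot encode: np.eye(46)[seg] has shape [256, 256, 46];
    # transpose to channel-first [46, 256, 256] for the model.
    one_hot = np.eye(CLASS_NUM, dtype=np.float32)[seg].transpose(2, 0, 1)

    # Binary (0/1) clothing mask, assuming label 0 is background.
    mask = (seg > 0).astype(np.float32)[None]                 # [1, 256, 256]

    return paddle.to_tensor(one_hot), paddle.to_tensor(mask)

# Stacking four such samples gives the [4, 46, 256, 256] batch used in training.
```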
Thinking about model training and loss
1. I used a GAN as the overall form of the model: the generator body follows the model architecture of Semantically Multi-modal Image Synthesis, and the discriminator is a Multihead Discriminator, which supports feature alignment inside the discriminator.
2. The discriminator has three tasks: judge the Ground Truth as True, judge the images produced by the generator as False, and also judge the raw semantic segmentation as False (my improvement). This pushes the generator to produce more complex and realistic textures.
3. To focus the model on the regions that contain clothing, the generator's featloss only considers the binary (0/1) mask of the clothes (a sketch of this masked loss follows this list).
4. I changed nn.conv2d(46,128) in spade.py to an ordinary convolution instead of a grouped convolution, because 46 is not divisible by group_num = 4 (see the convolution sketch after this list).
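For point 3, the mask-restricted feature-matching loss might look roughly like this; the layer-wise loop, the L1 distance, and the nearest-neighbour mask resizing are my assumptions about the implementation, not the project's exact code.

```python
import paddle
import paddle.nn.functional as F

def masked_feat_loss(feats_fake, feats_real, mask):
    """Feature-matching loss restricted to the clothing region.

    feats_fake / feats_real: lists of intermediate discriminator feature
    maps for the generated and real images; mask: binary [N, 1, H, W]
    tensor that is 1 inside the clothing region and 0 elsewhere.
    """
    loss = 0.0
    for f_fake, f_real in zip(feats_fake, feats_real):
        # Resize the clothing mask to this feature level's spatial size.
        m = F.interpolate(mask, size=f_fake.shape[2:], mode="nearest")
        # L1 distance between features, counted only inside the mask.
        diff = paddle.abs(f_fake - f_real.detach()) * m
        loss = loss + diff.sum() / (m.sum() * f_fake.shape[1] + 1e-6)
    return loss
```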
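Point 4 is simply a swap from a grouped convolution to an ordinary one, because the 46 input channels cannot be split into 4 groups (46 % 4 != 0). A sketch with the PaddlePaddle API; the kernel size and padding here are assumed for illustration.

```python
import paddle.nn as nn

# A grouped convolution would fail here: 46 is not divisible by groups=4.
# conv = nn.Conv2D(46, 128, kernel_size=3, padding=1, groups=4)  # raises an error

# Ordinary convolution used instead in spade.py:
conv = nn.Conv2D(46, 128, kernel_size=3, padding=1)
```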

Figure 2. Semantically Multi-modal Image Synthesis model architecture
Loss visualization
Finally, I visualized the losses, as shown in Figure 3, where:
d_real_loss: the discriminator should judge real pictures as True;
d_fake_loss: the discriminator should judge pictures produced by the generator as False;
d_seg_ganloss: the discriminator should judge the semantic segmentation as False;
d_all_loss: d_real_loss + d_fake_loss + d_seg_ganloss;
g_ganloss: pictures produced by the generator should be judged as True;
g_featloss: aligns the discriminator features of generated pictures with those of real pictures;
g_vggloss: perceptual loss between the generated picture and the GT, computed with VGG;
g_styleloss: style loss between the generator output and the GT, computed with Gram matrices;
kldloss: KL divergence between the predicted normal distribution and the standard normal;
g_loss: g_ganloss + g_featloss + g_vggloss + g_styleloss + kldloss (a rough sketch of how these terms combine is given below).
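A rough sketch of how these terms combine, assuming a plain binary cross-entropy GAN loss and unit weights for every term; the real project may use a different adversarial loss and weighting, and the auxiliary terms are passed in as precomputed values here.

```python
import paddle
import paddle.nn.functional as F

def discriminator_loss(logits_real, logits_fake, logits_seg):
    """Discriminator side: real -> True, generated -> False, and the raw
    semantic segmentation -> False (the extra task described above)."""
    d_real_loss = F.binary_cross_entropy_with_logits(
        logits_real, paddle.ones_like(logits_real))
    d_fake_loss = F.binary_cross_entropy_with_logits(
        logits_fake, paddle.zeros_like(logits_fake))
    d_seg_ganloss = F.binary_cross_entropy_with_logits(
        logits_seg, paddle.zeros_like(logits_seg))
    return d_real_loss + d_fake_loss + d_seg_ganloss      # d_all_loss

def generator_loss(logits_fake, g_featloss, g_vggloss, g_styleloss, mu, logvar):
    """Generator side: fool the discriminator, plus the auxiliary terms."""
    g_ganloss = F.binary_cross_entropy_with_logits(
        logits_fake, paddle.ones_like(logits_fake))
    # KL divergence between N(mu, exp(logvar)) and the standard normal.
    kldloss = -0.5 * paddle.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return g_ganloss + g_featloss + g_vggloss + g_styleloss + kldloss  # g_loss
```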

Figure 3. Loss visualization
Results
It’s show time! From left to right: the model's generated result, the Ground Truth (which can be understood as the reference answer for the model), and the visualization of the semantic segmentation fed to the model.
Figure 4. Results
Review and reflections
There is still plenty worth improving in this project, for example optimizing the model framework to provide finer-grained feature control over the garments, or further improving the diversity of the generated results. I will keep working in the field of image generation and look forward to publishing better open projects; you are welcome to reach out and exchange ideas with me.
References
[1] Semantic Image Synthesis with Spatially-Adaptive Normalization
[2] Semantically Multi-modal Image Synthesis
Follow the 【飞桨 PaddlePaddle】 official account
for more technical content~