
AI clothing generation helps you complete the last step of clothing design

2022-06-25 02:30:00 PaddlePaddle

This article was first published on the PaddlePaddle (飞桨) official account under the title:
AI Clothing Generation Helps You Complete the Last Step of Clothing Design


How to use AI to empower the fashion design industry is a question that Hong Li, a PaddlePaddle developer technical expert, has been thinking about. After a designer conceives and sketches a garment, being able to generate the overall look of the finished clothes with one click would help them refine the design based on factors such as the cut and style of the finished product. Once the basic idea of the project was settled, Hong Li began putting the PaddlePaddle framework into practice on the AI Studio platform. The project can currently generate garments, and he looks forward to discussing further optimizations with more developers (for example, increasing the diversity of the generated designs). The following is Hong Li's sharing.

Project background

To establish the basic design goals of the clothing generation project, I needed to look into the relevant techniques. One difference between the clothing generation task and other generation tasks is that it must output a "clean" garment with no fancy background; that is, the generator has to focus on the clothes rather than generate a complete scene. So in the loss design I considered the clothing mask part: for the model, the input is semantic segmentation information, but the output is an image of the clothes. I happened to come across a demo of SPADE on a video site. That "Ma Liang's magic brush" style of presentation felt very approachable, and I realized that the SPADE architecture roughly met my needs for the model. However, when I trained with SPADE, the model proved hard to train. By chance, I then found a newer paper, Semantically Multi-modal Image Synthesis.

A detailed introduction to this paper is available here:

https://aistudio.baidu.com/aistudio/projectdetail/3454453

This paper builds on the SPADE framework, uses the DeepFashion dataset, and shows good results. In its design, the encoder's output preserves spatial structure information, which I believe is why the model is easier to train. Based on this, I could easily modify the first version of my project, and the prototype of the clothing generation model framework took shape.
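For context, below is a minimal sketch of the SPADE normalization block described in reference [1], written against the PaddlePaddle API. It only illustrates the core idea, namely that the one-hot segmentation map predicts a per-pixel scale and bias that modulate the normalized activations; the class and parameter names here are illustrative and this is not the project's actual code.

```python
import paddle.nn as nn
import paddle.nn.functional as F

class SPADE(nn.Layer):
    """Spatially-adaptive normalization: the segmentation map predicts a
    per-pixel scale (gamma) and bias (beta) for the normalized features."""

    def __init__(self, feature_channels, label_channels, hidden_channels=128):
        super().__init__()
        # The original paper uses a parameter-free normalization; a plain
        # InstanceNorm2D keeps this sketch short.
        self.norm = nn.InstanceNorm2D(feature_channels)
        self.shared = nn.Sequential(
            nn.Conv2D(label_channels, hidden_channels, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.to_gamma = nn.Conv2D(hidden_channels, feature_channels, kernel_size=3, padding=1)
        self.to_beta = nn.Conv2D(hidden_channels, feature_channels, kernel_size=3, padding=1)

    def forward(self, x, segmap):
        # Bring the one-hot segmentation map to the feature resolution.
        segmap = F.interpolate(segmap, size=x.shape[2:], mode='nearest')
        hidden = self.shared(segmap)
        gamma = self.to_gamma(hidden)
        beta = self.to_beta(hidden)
        # Normalize, then modulate with the segmentation-conditioned parameters.
        return self.norm(x) * (1 + gamma) + beta
```

With 46 segmentation classes, a layer such as SPADE(128, 46) would take the place of a plain normalization layer inside each generator block.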

Project practice

The full project is available here:

https://aistudio.baidu.com/aistudio/projectdetail/3405079

Problems and data tuning

In the early stage of model training I mainly ran into two problems:

  • The first was choosing a training set.

  • The second was that training was very slow: a single batch took tens of seconds.

To address these problems, I processed the data through the following steps:

1. Choose the FGVC6 dataset. This dataset provides accurate annotated regions for each part of a garment, with 46 classes in total, as shown in Figure 1.

Figure 1: FGVC dataset example [1]

2. At test time, the model input no longer requires the ground truth (GT); only the semantic segmentation information is needed. The details of this part are explained below.

3. The semantic segmentation tensor fed to the model has the shape [batch_size, class_num, H, W], concretely [4, 46, 256, 256]: 46 because there are 46 labels, and the batch_size during training is 4.

4. In addition, the clothing mask must be taken into account when computing the loss.

5. Because my input format is 256*256, both the images and the semantic segmentation maps need to be resized so that H and W are both 256. The images in the original dataset are very large, with H and W often in the thousands, so the resize operation takes a lot of time. A skeptical reader might ask: why not crop instead? There are a few reasons:

  • First, this is a clothing generation project, so the model should consider the whole garment and be given as little purely local information as possible; otherwise it only ever sees one spot of the leopard and never forms its own sense of the overall "pattern" of a garment.

  • Second, in a real photo the clothes occupy a relatively small and variable region, so a 256*256 patch cropped from a 2500*2500 image is very likely to be completely black and provide no information.

  • Finally, cropping makes some labels appear to the model only rarely; shoes, for example, occupy a very small proportion of the image, which easily leads to inaccurate generation for those classes.

6. To address the slow training, I tried a two-step tuning scheme. At first I resized online, and a single batch took tens of seconds. Thinking it over, since the images I feed in are already 256*256, the bottleneck was unlikely to be the forward-pass tensor computation, so the problem had to lie in data preprocessing. Therefore, under the guidance of a PaddlePaddle developer mentor, I first tried resizing offline and saving the results as npy files, which let training start smoothly. In addition, when saving the semantic segmentation offline, I originally stored npy arrays of shape [256, 256, class_num]. This is far too sparse and takes up a lot of memory, so only about 1,000 samples could be saved. In fact, each pixel has exactly one label, so I adjusted the storage to [256, 256, 1], which allows about 10,000 samples to be saved and greatly improves storage efficiency (see the preprocessing sketch after this list).
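To make steps 3, 5 and 6 concrete, here is a minimal preprocessing sketch. It assumes OpenCV and NumPy, and the file naming and function names are illustrative rather than the project's actual code: resize an image and its label map to 256*256, store the compact [256, 256, 1] label map as npy, and expand it back to the [batch_size, 46, H, W] one-hot tensor at load time.

```python
import cv2
import numpy as np

NUM_CLASSES, SIZE = 46, 256

def resize_and_save(image_path, label_map, out_prefix):
    """Offline step: resize once and save compact npy files."""
    img = cv2.imread(image_path)
    img = cv2.resize(img, (SIZE, SIZE), interpolation=cv2.INTER_LINEAR)
    # Nearest-neighbour keeps integer class labels intact.
    seg = cv2.resize(label_map, (SIZE, SIZE), interpolation=cv2.INTER_NEAREST)
    np.save(out_prefix + "_img.npy", img)
    # Store [256, 256, 1] instead of a sparse [256, 256, 46] one-hot array.
    np.save(out_prefix + "_seg.npy", seg[..., np.newaxis].astype(np.uint8))

def to_one_hot(seg_batch):
    """Load-time step: expand [N, 256, 256, 1] labels to [N, 46, 256, 256]."""
    seg = seg_batch.squeeze(-1)                            # [N, H, W]
    one_hot = np.eye(NUM_CLASSES, dtype=np.float32)[seg]   # [N, H, W, 46]
    return np.transpose(one_hot, (0, 3, 1, 2))             # [N, 46, H, W]
```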

Thinking about model training and loss

1. I used a GAN as the overall form of the model: the generator body follows the model architecture of Semantically Multi-modal Image Synthesis, and the discriminator is a Multihead Discriminator, which supports feature alignment in the discriminator.

2. The discriminator has three tasks: it must judge the Ground Truth as True, judge the image produced by the generator as False, and also judge the semantic segmentation itself as False (an improvement), which helps the generator produce more complex and realistic textures.

3. To keep the model focused on the clothing region, the generator's featloss only considers the part inside the binary (0/1) clothes mask (a sketch of this masked loss appears below Figure 2).

4. In spade.py I changed nn.conv2d(46, 128) to an ordinary convolution rather than a grouped convolution, because 46 is not divisible by group_num = 4.

Figure 2: Semantically Multi-modal Image Synthesis model architecture
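As noted in point 3 above, the feature-matching loss is restricted to the clothing region. Below is a minimal sketch of that idea using the PaddlePaddle API; the function name and the exact per-layer weighting are illustrative, since the project's precise implementation is not reproduced here.

```python
import paddle
import paddle.nn.functional as F

def masked_feat_loss(real_feats, fake_feats, clothes_mask):
    """Feature-matching loss restricted to the clothes mask.
    real_feats / fake_feats: lists of discriminator feature maps, one per layer.
    clothes_mask: binary (0/1) tensor of shape [N, 1, H, W]."""
    loss = paddle.zeros([1])
    for real_f, fake_f in zip(real_feats, fake_feats):
        # Downsample the mask to this feature map's resolution.
        mask = F.interpolate(clothes_mask, size=real_f.shape[2:], mode='nearest')
        # Only pixels inside the clothes region contribute to the L1 distance.
        loss += F.l1_loss(fake_f * mask, real_f.detach() * mask)
    return loss
```

The change in point 4 simply amounts to paddle.nn.Conv2D(46, 128, kernel_size=3, padding=1) with the default groups=1, since a grouped convolution with groups=4 would require the 46 input channels to be divisible by 4.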


Loss visualization

Finally, the losses are visualized, as shown in Figure 3 (a sketch of how the terms combine follows the figure), where:

  • d_real_loss: the discriminator judges the real image as True;

  • d_fake_loss: the discriminator judges the image produced by the generator as False;

  • d_seg_ganloss: the discriminator judges the semantic segmentation as False;

  • d_all_loss: d_real_loss + d_fake_loss + d_seg_ganloss;

  • g_ganloss: requires the image produced by the generator to be judged as True;

  • g_featloss: aligns the discriminator features of the generated image with those of the real image;

  • g_vggloss: perceptual loss between the generated image and the GT, computed with VGG;

  • g_styleloss: style loss between the generated image and the GT, computed with a Gram matrix;

  • kldloss: the KL divergence between a normal distribution and the standard normal distribution;

  • g_loss: g_ganloss + g_featloss + g_vggloss + g_styleloss + kldloss.

Figure 3: Loss visualization
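For reference, here is a minimal sketch of how these terms fit together, written with the PaddlePaddle API. It assumes the KL term takes the standard closed form for a diagonal Gaussian against N(0, I), with mu and logvar coming from the encoder, and includes a Gram matrix helper for g_styleloss; the function names are illustrative, not the project's exact code.

```python
import paddle

def gram_matrix(feat):
    """Channel-wise feature correlations used for the style loss."""
    n, c, h, w = feat.shape
    f = feat.reshape([n, c, h * w])
    return paddle.bmm(f, f.transpose([0, 2, 1])) / (c * h * w)

def kld_loss(mu, logvar):
    """KL divergence between N(mu, exp(logvar)) and the standard normal."""
    return -0.5 * paddle.sum(1 + logvar - mu.pow(2) - logvar.exp())

def combine_losses(d_real_loss, d_fake_loss, d_seg_ganloss,
                   g_ganloss, g_featloss, g_vggloss, g_styleloss, kldloss):
    """Aggregate the individual terms exactly as listed above."""
    d_all_loss = d_real_loss + d_fake_loss + d_seg_ganloss
    g_loss = g_ganloss + g_featloss + g_vggloss + g_styleloss + kldloss
    return d_all_loss, g_loss
```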

Results

It's show time! From left to right: the model's generated result, the Ground Truth (which can be understood as the reference answer for the model), and a visualization of the semantic segmentation fed to the model.

Figure 4: Results

Review and reflection

There are still many things worth improving in this project, for example optimizing the model framework, providing finer-grained feature control over the clothes, or further improving the diversity of the generated results. I will keep working in the field of image generation and look forward to releasing better public projects in the future. You are welcome to get in touch and discuss.

References

[1] Semantic Image Synthesis with Spatially-Adaptive Normalization

[2] Semantically Multi-modal Image Synthesis

Follow the 【飞桨PaddlePaddle】 official account for more technical content.
