Evaluating the Performance of Generative Adversarial Networks

2022-07-24 22:46:00 【Ashes little fish man Timo】

Scenario description

GANs are among the most popular image generation models today, and many papers show different GANs producing sharp, realistic images. However, judging image quality subjectively by eye cannot evaluate a model's performance scientifically. We need appropriate methods to measure a GAN's generative ability quantitatively: to describe the quality and diversity of the generated samples accurately, and to measure the difference between the generated distribution and the real distribution. At present, the quality and diversity of generated samples are commonly evaluated with IS (Inception Score) and FID (Fréchet Inception Distance).

Principles of IS and FID

IS is often used to evaluate the quality of generated images. The "Inception" in its name comes from InceptionNet, because computing IS requires an Inception-v3 classification network pre-trained on the ImageNet dataset. IS is in essence a KL divergence computation; the formula is

$\operatorname{IS}(G)=\exp \left(\mathbb{E}_{\boldsymbol{x} \sim p_{g}(x)} \mathrm{KL}(p(y \mid \boldsymbol{x}) \| p(y))\right)$

Here, p(y|x) is the category probability distribution that the pre-trained Inception-v3 classification network outputs for a given generated image x, and p(y) is the marginal distribution, that is, the expectation of those category probabilities over all generated images. If a generated image contains a meaningful, clearly identifiable object, the classification network should assign it to a specific category with high confidence, so p(y|x) should have low entropy. In addition, for the generated images to be diverse, p(y) should have high entropy. When the entropy of p(y) is high and the entropy of p(y|x) is low, the generated images cover many categories and each image belongs to a clear category with high confidence, so the KL divergence between p(y|x) and p(y) is large. Indeed, because p(y) is the average of p(y|x) over the generated images, the expected KL divergence equals the entropy of p(y) minus the average entropy of p(y|x). Note that IS never compares real samples with generated samples; it only quantifies the quality and diversity of the generated samples.
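The IS formula above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `inception_score` and the three toy probability tables are hypothetical, and in practice each row of `probs` would be the softmax output of the pre-trained Inception-v3 network for one generated image.

```python
import math

def inception_score(probs):
    """Compute exp(E_x[KL(p(y|x) || p(y))]) from per-image class probabilities.

    probs: list of rows; each row is p(y|x) for one generated image
    (hypothetical toy values here, Inception-v3 softmax outputs in practice).
    """
    n = len(probs)
    num_classes = len(probs[0])
    # Marginal p(y): the average of p(y|x) over all generated images.
    p_y = [sum(row[k] for row in probs) / n for k in range(num_classes)]
    # Average KL(p(y|x) || p(y)); terms with p = 0 contribute nothing.
    mean_kl = 0.0
    for row in probs:
        kl = sum(p * math.log(p / q) for p, q in zip(row, p_y) if p > 0)
        mean_kl += kl / n
    return math.exp(mean_kl)

# Confident predictions spread over all three classes: low H(y|x), high H(y).
sharp_diverse = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Confident predictions that all fall in one class: no diversity.
collapsed = [[1.0, 0.0, 0.0]] * 3
# Diverse on average, but every individual prediction is uncertain.
blurry = [[1 / 3, 1 / 3, 1 / 3]] * 3

for name, table in [("sharp_diverse", sharp_diverse),
                    ("collapsed", collapsed),
                    ("blurry", blurry)]:
    print(name, inception_score(table))
```

With three classes, the first table reaches the maximum score of about 3 (the number of classes), while the collapsed and blurry tables both score about 1, the minimum. A high IS therefore requires confident predictions and category diversity at the same time.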

FID makes up for this shortcoming of IS by adding a comparison between real samples and generated samples. It also feeds the samples into the classification network. The difference is that FID operates not on the output probabilities p(y|x) of the network's last layer but on the responses of its penultimate layer, that is, the feature maps. Specifically, FID is computed by comparing the mean and covariance of the feature maps of the real samples and the generated samples:

$\mathrm{FID}=\left\|\mu_{\text{data}}-\mu_{g}\right\|^{2}+\operatorname{Tr}\left(\Sigma_{\text{data}}+\Sigma_{g}-2\left(\Sigma_{\text{data}} \Sigma_{g}\right)^{\frac{1}{2}}\right)$

Here, $\mu_{\text{data}}$ and $\Sigma_{\text{data}}$ are the mean and covariance matrix of the real samples' features, $\mu_{g}$ and $\Sigma_{g}$ are the mean and covariance matrix of the generated samples' features, and $\operatorname{Tr}(\cdot)$ denotes the trace of a matrix. The lower the FID, the closer the statistics of the generated samples are to those of the real samples. However, FID approximates the feature maps as Gaussian distributions, and comparing only means and covariances is too coarse to evaluate image details.
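The FID formula above can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions: the function name `frechet_distance` and the random feature arrays are hypothetical stand-ins for the penultimate-layer Inception-v3 activations, and the feature dimension is assumed to be at least 2. The trace of the matrix square root is computed from the eigenvalues of the covariance product, which is one possible approach among several.

```python
import numpy as np

def frechet_distance(feats_real, feats_gen):
    """Frechet distance between two feature sets of shape (num_samples, dim)."""
    mu_r = feats_real.mean(axis=0)
    mu_g = feats_gen.mean(axis=0)
    sigma_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    sigma_g = np.atleast_2d(np.cov(feats_gen, rowvar=False))
    # Tr((Sigma_r Sigma_g)^(1/2)) is the sum of the square roots of the
    # eigenvalues of Sigma_r Sigma_g. For positive semi-definite covariances
    # these eigenvalues are real and non-negative, so we drop the imaginary
    # part and clip small negative values caused by rounding.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_g)
                 - 2.0 * tr_sqrt)

# Hypothetical 4-dimensional "features"; real use would take Inception-v3
# activations for real and generated images instead.
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))
shifted = real + 2.0  # same covariance, mean moved by 2 in every dimension

print(frechet_distance(real, real))     # approximately 0
print(frechet_distance(real, shifted))  # approximately 4 * 2**2 = 16
```

Because the shifted set has the same covariance as the original, the trace term vanishes and only the squared mean difference remains. This makes the coarseness of the metric concrete: two feature sets with equal means and covariances score 0 even if the underlying images differ in their details.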

Summary

IS and FID are currently the two most widely used evaluation methods for GANs in the image domain. Both provide a quantitative evaluation of a GAN's generative ability, but both describe overall performance: neither can measure an individual generated sample in isolation from the perspective of diversity or quality. In addition, both rely on a classification network pre-trained on ImageNet, so they are not well suited to other types of datasets (such as facial images or medical imaging data).

Besides IS and FID, there are other methods for evaluating a GAN's generative ability, such as the Mode Score, Maximum Mean Discrepancy (MMD), the nearest-neighbor two-sample test (C2ST), and the Sliced Wasserstein Distance (SWD).

Source: In depth learning

Copyright notice
This article was written by [Ashes little fish man Timo]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/205/202207242234406315.html