
Few-Shot Learning Datasets

2022-06-25 05:03:00 【MondayCat111】

Reprinted from: https://blog.csdn.net/qq_36104364/article/details/107508592

This article surveys the few-shot learning datasets commonly used in recent years, giving an introduction, a reference, and a download address for each. All the resources I have are uploaded to Baidu cloud disk; for the other datasets, the official download addresses are provided (some may only be reachable through a proxy). A brief summary table of the datasets is also provided.

1. Omniglot

The Omniglot dataset consists of 1,623 handwritten characters from 50 different alphabets, and each character has 20 different handwritten samples. This makes it a few-shot handwritten character dataset with a very large number of classes (1,623) but very few samples per class (20). In practice, 1,200 characters are usually chosen as the training set and the remaining 423 characters as the validation set, the dataset is expanded by rotating the images by 90°, 180° and 270°, and every image is resized to a uniform 28*28.
Reference: Lake B, Salakhutdinov R, Gross J, et al. One shot learning of simple visual concepts[C]//Proceedings of the annual meeting of the cognitive science society. 2011, 33(33).
Download: https://pan.baidu.com/s/19Y5aGfa-lNEZTDUeL1jP4g
Extraction code: 4y3z
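
The rotation-based expansion described above can be sketched in a few lines. This is only an illustration in plain Python, assuming an image is stored as a list of pixel rows; the function names `rotate90` and `augment_with_rotations` are hypothetical and do not come from any Omniglot loader.

```python
def rotate90(image):
    """Rotate a 2D image (a list of rows) by 90 degrees counter-clockwise."""
    # Transposing and then reversing the row order gives a counter-clockwise turn.
    return [list(row) for row in zip(*image)][::-1]


def augment_with_rotations(image):
    """Return the original image followed by its 90, 180 and 270 degree rotations."""
    rotations = [image]
    for _ in range(3):
        rotations.append(rotate90(rotations[-1]))
    return rotations


# Toy 28*28 "character": a single bright pixel in the top-left corner.
toy = [[0] * 28 for _ in range(28)]
toy[0][0] = 1

variants = augment_with_rotations(toy)
print(len(variants))  # one original plus three rotated copies
```

Each source image therefore yields four images. Whether the rotated copies are kept in the same class or treated as new classes is a design choice that varies between papers, so it is left open here.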

2. miniImageNet

The miniImageNet dataset consists of 60,000 images selected from the ImageNet dataset, covering 100 classes with 600 images each; every image is 84*84. In practice, 80 classes are usually used as the training set and the remaining 20 classes as the validation set. Some papers instead divide it into a base set (Base Class, 64 classes), a validation set (Validation Class, 16 classes) and a novel set (Novel Class, 20 classes).
Reference: Vinyals O, Blundell C, Lillicrap T, et al. Matching networks for one shot learning[C]//Advances in neural information processing systems. 2016: 3630-3638.
Download: https://pan.baidu.com/s/1nqBSA1w5mQuhlrQeCY4HgA
Extraction code: ajrz
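
The key property of the 64/16/20 division is that the three sets share no classes. A minimal sketch of such a class-level split follows; it assumes a seeded random shuffle and placeholder class names, whereas published miniImageNet splits use fixed class lists, so this is not a reproduction of any official split. `split_classes` is a hypothetical helper.

```python
import random


def split_classes(class_names, sizes, seed=0):
    """Partition class names into disjoint subsets of the requested sizes."""
    if sum(sizes.values()) != len(class_names):
        raise ValueError("split sizes must add up to the number of classes")
    shuffled = sorted(class_names)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    splits, start = {}, 0
    for split_name, size in sizes.items():
        splits[split_name] = shuffled[start:start + size]
        start += size
    return splits


# Placeholder names standing in for the 100 miniImageNet classes.
classes = [f"class_{i:03d}" for i in range(100)]
splits = split_classes(classes, {"base": 64, "validation": 16, "novel": 20})
print({name: len(members) for name, members in splits.items()})
```

Because every class lands in exactly one subset, a model evaluated on the novel set has never seen those classes during training.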

3. tieredImageNet

The tieredImageNet dataset is also selected from the ImageNet dataset. It contains 34 broad categories (Categories), each of which contains 10-30 fine-grained classes (Classes), and each class has a varying number of image samples. In total there are 608 classes and 779,165 images (about 1,281 images per class on average). The 34 categories are divided into a training set (20 categories), a validation set (6 categories) and a test set (8 categories); the dataset division is shown in the following figure.
[Figure: division of the tieredImageNet dataset]

Reference: Ren M, Triantafillou E, Ravi S, et al. Meta-learning for semi-supervised few-shot classification[J]. arXiv preprint arXiv:1803.00676, 2018.
Download:
https://drive.google.com/uc?export=download&confirm=_SLS&id=1g1aIDy2Ar_MViF2gDXFYDBTR-HYecV07

4. CUB-200

The full name of the CUB-200 dataset is Caltech-UCSD Birds-200-2011. It is a bird image database provided by the California Institute of Technology, containing 11,788 images of 200 bird species. In practice it is usually divided into a training set (100 classes), a validation set (50 classes) and a test set (50 classes), and the images are uniformly resized to 84*84.
Reference: Catherine Wah, Steve Branson, Peter Welinder, Pietro Perona, and Serge Belongie. The Caltech-UCSD Birds-200-2011 dataset. 2011.
Download: https://pan.baidu.com/s/1DEmLxePvDuJX1goSzM9r6Q
Extraction code: f1l5

5. CIFAR-FS

The full name of the CIFAR-FS dataset is CIFAR100 Few-Shots. It is derived from the CIFAR-100 dataset and contains 100 classes with 600 images each, 60,000 images in total. In practice it is usually divided into a training set (64 classes), a validation set (16 classes) and a test set (20 classes), and all images are 32*32.
Reference: Bertinetto L, Henriques J F, Torr P H S, et al. Meta-learning with differentiable closed-form solvers[J]. arXiv preprint arXiv:1805.08136, 2018.
Download: https://pan.baidu.com/s/1HqRUw3dmsMBInt_Fh3J_Uw
Extraction code: ub38

6. ImageNet-1K Challenge

The ImageNet-1K Challenge dataset is also drawn from the ImageNet dataset and contains 1,000 classes. In practice it is usually divided into a base set (389 classes) and a novel set (611 classes).
Reference: Hariharan B, Girshick R. Low-shot visual recognition by shrinking and hallucinating features[C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 3018-3027.
Download: http://www.image-net.org/

7. FC100

The full name of the FC100 dataset is Few-shot CIFAR100. Like the CIFAR-FS dataset above, it is derived from the CIFAR-100 dataset and contains 100 classes with 600 images each, 60,000 images in total. The difference is that FC100 is not divided by class (Class) but by superclass (Superclass). It contains 20 superclasses (100 classes) in total: the training set has 12 superclasses (60 classes), the validation set has 4 superclasses (20 classes), and the test set has 4 superclasses (20 classes).
Reference: Oreshkin B, López P R, Lacoste A. Tadam: Task dependent adaptive metric for improved few-shot learning[C]//Advances in Neural Information Processing Systems. 2018: 721-731.
Download: https://pan.baidu.com/s/1Wnlp1-obKsMLcHITYQ1CLg
Extraction code: kcd6
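
To make the superclass-level division concrete, the sketch below assigns whole superclasses to the training, validation and test sets and then collects their member classes. It assumes CIFAR-100's layout of 20 superclasses with 5 classes each; the superclass and class names are placeholders, the assignment is a seeded random one rather than the actual FC100 assignment, and `split_by_superclass` is a hypothetical helper.

```python
import random


def split_by_superclass(superclasses, counts, seed=0):
    """Assign whole superclasses to splits and return the classes in each split."""
    names = sorted(superclasses)
    if sum(counts.values()) != len(names):
        raise ValueError("split counts must add up to the number of superclasses")
    random.Random(seed).shuffle(names)
    splits, start = {}, 0
    for split_name, count in counts.items():
        chosen = names[start:start + count]
        # All classes of a chosen superclass go to the same split.
        splits[split_name] = [cls for sup in chosen for cls in superclasses[sup]]
        start += count
    return splits


# Placeholder layout: 20 superclasses, each holding 5 classes.
superclasses = {
    f"super_{s:02d}": [f"super_{s:02d}/class_{c}" for c in range(5)]
    for s in range(20)
}
splits = split_by_superclass(superclasses, {"train": 12, "validation": 4, "test": 4})
print({name: len(members) for name, members in splits.items()})
# prints {'train': 60, 'validation': 20, 'test': 20}
```

Splitting at the superclass level keeps semantically similar classes on the same side of the split, which makes the test classes less similar to the training classes than a plain class-level split would.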

Summary table of few-shot datasets

Dataset	Source	Number of classes	Number of images	Image size
Omniglot	-	1623	32,460	28*28
miniImageNet	ImageNet	100	60,000	84*84
tieredImageNet	ImageNet	608	779,165	84*84
ImageNet 1K	ImageNet	1000	-	-
CIFAR-FS	CIFAR 100	100	60,000	32*32
FC100	CIFAR 100	100	60,000	32*32
CUB-200	-	200	11,788	84*84
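
As a quick sanity check on the table, the average number of images per class can be derived from the two count columns. The snippet below simply re-enters the figures listed above (ImageNet 1K is left out because no image count is given for it); `images_per_class` is a hypothetical helper.

```python
# (number of classes, number of images), copied from the summary table.
DATASETS = {
    "Omniglot": (1623, 32460),
    "miniImageNet": (100, 60000),
    "tieredImageNet": (608, 779165),
    "CIFAR-FS": (100, 60000),
    "FC100": (100, 60000),
    "CUB-200": (200, 11788),
}


def images_per_class(datasets):
    """Return the average number of images per class for each dataset."""
    return {name: images / classes for name, (classes, images) in datasets.items()}


averages = images_per_class(DATASETS)
for name, average in averages.items():
    print(f"{name}: {average:.1f} images per class")
```

The results agree with the per-class figures quoted in the individual sections: 20 for Omniglot, 600 for miniImageNet, CIFAR-FS and FC100, and roughly 1,281 for tieredImageNet. CUB-200 works out to about 59 images per class.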

8. FewRel

FewRel is a relation extraction dataset released by Tsinghua University. It contains 100 relations and 44,800 instances (sentences), and it is a supervised dataset.

Download: https://thunlp.github.io/fewrel.html

GitHub: https://github.com/thunlp/FewRel