Free machine learning dataset website (6300+ dataset)
2022-06-26 13:45:00 【The star light blog in 2021 cloud computing top3】
Today I'd like to share a free website for finding machine learning datasets:
Machine Learning Datasets | Papers With Code
This is good news for students who have ideas but no data. The site is simple to use and collects a wide range of publicly available datasets, including image, review text, and point cloud datasets.

CIFAR-10
Introduced by Krizhevsky et al. in Learning Multiple Layers of Features from Tiny Images. The CIFAR-10 dataset (Canadian Institute for Advanced Research, 10 classes) is a subset of the Tiny Images dataset and consists of 60,000 32x32 color images. Each image is labeled with one of 10 mutually exclusive classes: airplane, automobile (but not truck or pickup truck), bird, cat, deer, dog, frog, horse, ship, and truck (but not pickup truck). There are 6,000 images per class, split into 5,000 training images and 1,000 test images.
The criteria for deciding whether an image belongs to a class were as follows:
- The class name should be high on the list of likely answers to the question "What is in this picture?"
- The image should be photo-realistic. Labelers were instructed to reject line drawings.
- The image should contain only one prominent instance of the object the class refers to. The object may be partially occluded or seen from an unusual viewpoint, as long as its identity is still clear to the labeler.
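The class and split counts quoted above can be checked with a few lines of Python. This is only a bookkeeping sketch: the constant and function names are illustrative and are not part of any CIFAR-10 loader.

```python
# The ten mutually exclusive CIFAR-10 classes described above.
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]

TRAIN_PER_CLASS = 5000  # training images per class
TEST_PER_CLASS = 1000   # test images per class


def split_sizes(num_classes=len(CIFAR10_CLASSES)):
    """Return (train, test, total) image counts for a class-balanced dataset."""
    train = num_classes * TRAIN_PER_CLASS
    test = num_classes * TEST_PER_CLASS
    return train, test, train + test


if __name__ == "__main__":
    train, test, total = split_sizes()
    print(f"train={train} test={test} total={total}")
```

The total works out to the 60,000 images stated in the description, with 50,000 for training and 10,000 for testing.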

Cityscapes
Introduced by Cordts et al. in The Cityscapes Dataset for Semantic Urban Scene Understanding. Cityscapes is a large-scale database focused on semantic understanding of urban street scenes. It provides semantic, instance-wise, and dense pixel annotations for 30 classes grouped into 8 categories (flat surfaces, humans, vehicles, constructions, objects, nature, sky, and void). The dataset consists of around 5,000 finely annotated images and 20,000 coarsely annotated ones. Data was captured in 50 cities over several months, during daytime and in good weather. It was originally recorded as video, so the frames were manually selected to have the following features: a large number of dynamic objects, varying scene layout, and varying background.
Source: A Review on Deep Learning Techniques Applied to Semantic Segmentation
Penn Treebank
Introduced by Mitchell P. Marcus et al. in Building a Large Annotated Corpus of English: The Penn Treebank. The English Penn Treebank (PTB) corpus, and in particular the section corresponding to Wall Street Journal (WSJ) articles, is one of the best-known and most widely used corpora for evaluating sequence labeling models. The task consists of annotating each word with its part-of-speech tag. In the most common split of this corpus, sections 0 to 18 are used for training (38,219 sentences, 912,344 tokens), sections 19 to 21 for validation (5,527 sentences, 131,768 tokens), and sections 22 to 24 for testing (5,462 sentences, 129,654 tokens). The corpus is also commonly used for character-level and word-level language modeling.
Source: Seq2Biseq: Bidirectional Output-wise Recurrent Neural Networks for Sequence Modelling
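The section-based split described above can be written down as a small lookup. This is a minimal sketch, assuming WSJ sections are identified by their integer number; `PTB_SPLITS` and `split_for_section` are hypothetical names, not part of any treebank toolkit.

```python
# Section ranges for the common WSJ split described above.
# range() excludes its upper bound, so range(0, 19) covers sections 0-18.
PTB_SPLITS = {
    "train": range(0, 19),        # sections 0-18
    "validation": range(19, 22),  # sections 19-21
    "test": range(22, 25),        # sections 22-24
}


def split_for_section(section):
    """Return the split name ('train', 'validation', 'test') for a WSJ section."""
    for name, sections in PTB_SPLITS.items():
        if section in sections:
            return name
    raise ValueError(f"section {section} is outside the 0-24 range")


if __name__ == "__main__":
    for section in (0, 18, 19, 21, 22, 24):
        print(section, split_for_section(section))
```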
IMDb Movie Reviews
Introduced by Andrew L. Maas et al. in Learning Word Vectors for Sentiment Analysis. The IMDb Movie Reviews dataset is a binary sentiment analysis dataset consisting of 50,000 reviews from the Internet Movie Database (IMDb), each labeled as positive or negative. The dataset contains an equal number of positive and negative reviews. Only highly polarized reviews are included: a negative review has a score of 4 or less out of 10, and a positive review has a score of 7 or more out of 10. No more than 30 reviews are included per movie. The dataset also contains additional unlabeled data.
Source: Sentiment analysis | NLP-progress
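The labeling rule and the per-movie cap described above can be sketched as follows. This is an illustration, not the dataset authors' actual pipeline: the `(movie_id, rating, text)` input format and the function names are assumptions, and so is the choice to apply the 30-review cap after discarding non-polarized reviews.

```python
from collections import defaultdict

MAX_REVIEWS_PER_MOVIE = 30  # cap stated in the dataset description


def label_from_rating(rating):
    """Map a 1-10 rating to a sentiment label, or None if it is not polarized."""
    if rating <= 4:
        return "negative"
    if rating >= 7:
        return "positive"
    return None  # ratings of 5 and 6 are excluded


def select_reviews(reviews):
    """Keep polarized reviews, at most MAX_REVIEWS_PER_MOVIE per movie.

    `reviews` is an iterable of (movie_id, rating, text) tuples
    (a hypothetical input format chosen for this sketch).
    """
    kept_per_movie = defaultdict(int)
    selected = []
    for movie_id, rating, text in reviews:
        label = label_from_rating(rating)
        if label is None:
            continue  # not polarized enough
        if kept_per_movie[movie_id] >= MAX_REVIEWS_PER_MOVIE:
            continue  # this movie already reached the cap
        kept_per_movie[movie_id] += 1
        selected.append((text, label))
    return selected
```

For example, `select_reviews([("m1", 9, "great"), ("m1", 5, "fine"), ("m2", 2, "dull")])` keeps the first and third reviews and drops the neutral one.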
ModelNet
Introduced by Wu et al. in 3D ShapeNets: A Deep Representation for Volumetric Shapes. The ModelNet40 dataset contains synthetic object point clouds. As the most widely used benchmark for point cloud analysis, ModelNet40 is popular because of its varied categories, clean shapes, and well-constructed data. The original ModelNet40 consists of meshes in 40 categories (such as airplane, car, plant, and lamp), of which 9,843 are used for training and the remaining 2,468 for testing. The corresponding point clouds are uniformly sampled from the mesh surfaces and then preprocessed by moving them to the origin and scaling them into a unit sphere.
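The preprocessing step mentioned above (move to the origin, scale into a unit sphere) can be sketched in plain Python. The description does not say which point is moved to the origin, so this sketch assumes the centroid, which is a common choice; `normalize_to_unit_sphere` is a hypothetical helper name.

```python
import math


def normalize_to_unit_sphere(points):
    """Center a point cloud on its centroid and scale it into the unit sphere.

    `points` is a non-empty list of (x, y, z) tuples. Centering on the
    centroid is an assumption made for this sketch.
    """
    n = len(points)
    centroid = tuple(sum(p[i] for p in points) / n for i in range(3))
    centered = [tuple(p[i] - centroid[i] for i in range(3)) for p in points]

    # The farthest point from the origin defines the scale factor.
    radius = max(math.sqrt(x * x + y * y + z * z) for x, y, z in centered)
    if radius == 0:
        return centered  # all points coincide; nothing to scale

    return [(x / radius, y / radius, z / radius) for x, y, z in centered]


if __name__ == "__main__":
    cloud = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (2.0, 1.0, 0.0)]
    for point in normalize_to_unit_sphere(cloud):
        print(point)
```

After this step the farthest point lies on the unit sphere, so clouds sampled from meshes of different sizes become comparable in scale.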
Source: Geometric Feedback Network for Point Cloud Classification
CARLA (Car Learning to Act)
Introduced by Dosovitskiy et al. in CARLA: An Open Urban Driving Simulator. CARLA (CAR Learning to Act) is an open simulator for urban driving, developed as an open-source layer over Unreal Engine 4. It provides sensors in the form of RGB cameras (with customizable positions), ground-truth depth maps, ground-truth semantic segmentation maps with 12 semantic classes designed for driving (road, lane marking, traffic sign, sidewalk, and so on), bounding boxes for dynamic objects in the environment, and measurements of the agent itself (vehicle location and orientation).
Source: Synthetic Data for Deep Learning
That was a brief introduction to several commonly used datasets. Please visit the website to find more.