Pyg tutorial (4): Customizing datasets
2022-06-21 06:44:00 【Si Xi is towering】
1. Preface
In PyG, besides directly using the built-in benchmark datasets, users can also define their own datasets. The approach is similar to PyTorch: you inherit from a dataset class. PyG provides two abstract dataset classes:
- torch_geometric.data.Dataset: for building large (out-of-memory) datasets;
- torch_geometric.data.InMemoryDataset: for building in-memory (small) datasets; it inherits from Dataset.
Both are introduced in detail below.
2. In-memory datasets
2.1 Creation instructions
To build your own in-memory dataset in PyG, inherit from the InMemoryDataset class and implement the following methods:
- raw_file_names(): returns the list of raw file names; if any file in this list is missing from self.raw_dir, download() is called to fetch it;
- processed_file_names(): returns the list of file names produced by process(); if any file in this list is missing from self.processed_dir, process() is called to generate it;
- download(): downloads the raw dataset into self.raw_dir;
- process(): processes the raw dataset and saves the result into self.processed_dir.
For the first two methods, if there is only a single file, you can return the file name directly as a string instead of a list.
In addition, self.raw_dir and self.processed_dir mentioned above are actually two properties. Their source code is:
# With @property, a method can be called like an attribute
@property
def raw_dir(self) -> str:
    return osp.join(self.root, 'raw')

@property
def processed_dir(self) -> str:
    return osp.join(self.root, 'processed')
As the source code shows, self.raw_dir and self.processed_dir are simply the raw-data folder and the processed-data folder under the given save path root.
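As a quick illustration of those two properties (a standalone sketch using only the standard library, with the root value "tmp" chosen just as an example), they do nothing more than join root with a fixed subfolder name:

```python
import os.path as osp

# Mirrors of the raw_dir / processed_dir properties above,
# written as plain functions for illustration
def raw_dir(root: str) -> str:
    return osp.join(root, 'raw')

def processed_dir(root: str) -> str:
    return osp.join(root, 'processed')

print(raw_dir("tmp"))        # tmp/raw on POSIX systems
print(processed_dir("tmp"))  # tmp/processed on POSIX systems
```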
2.2 A worked example
This article takes Facebook, a social network from the SNAP datasets, as an example to demonstrate how to create an InMemoryDataset subclass named FaceBook. The dataset contains 4039 nodes and 88234 edges. Visualizing the network with Gephi gives the figure below:

Following the description in Section 2.1, the source code of the custom FaceBook class is:
import os
import pandas as pd
import torch
from torch_geometric.data import Data
from torch_geometric.data import InMemoryDataset, download_url, extract_gz


class FaceBook(InMemoryDataset):
    url = "https://snap.stanford.edu/data/facebook_combined.txt.gz"

    def __init__(self,
                 root,
                 transform=None,
                 pre_transform=None,
                 pre_filter=None):
        super().__init__(root, transform, pre_transform, pre_filter)
        self.data, self.slices = torch.load(self.processed_paths[0])

    @property
    def raw_file_names(self):
        return ["facebook_combined.txt"]

    @property
    def processed_file_names(self):
        return "data.pt"

    def download(self):
        path = download_url(self.url, self.raw_dir)
        extract_gz(path, self.raw_dir)

    def process(self):
        # Load the raw data file: one "source target" pair per line
        path = os.path.join(self.raw_dir, "facebook_combined.txt")
        # read_csv yields an array of shape (num_edges, 2); transpose it
        # to the (2, num_edges) COO layout expected by edge_index
        edges = pd.read_csv(path, header=None, delimiter=" ").values.T
        # Construct the Data object
        edge_index = torch.from_numpy(edges).contiguous()
        g = Data(edge_index=edge_index, num_nodes=4039)
        data, slices = self.collate([g])
        torch.save((data, slices), self.processed_paths[0])


if __name__ == "__main__":
    dataset = FaceBook(root="tmp")
    data = dataset[0]
    print(data.num_edges, data.num_nodes)
    # 88234 4039
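The transposition step in process() deserves a closer look. The raw file stores one edge per line as a "source target" pair, i.e. an array of shape (num_edges, 2), while edge_index expects the COO layout (2, num_edges): row 0 holds all sources, row 1 all targets. A naive row-major reshape would interleave sources and targets and produce edges that never existed. A minimal sketch with a toy three-edge list (pure Python, no PyG required):

```python
# Toy edge list: three edges, one (src, dst) pair per row,
# mimicking the (num_edges, 2) shape read from facebook_combined.txt
edges = [(0, 1), (0, 2), (1, 2)]

# Transposing yields the (2, num_edges) COO layout
src, dst = zip(*edges)
edge_index = [list(src), list(dst)]
print(edge_index)  # [[0, 0, 1], [1, 2, 2]]

# A row-major reshape of the flattened pairs [0, 1, 0, 2, 1, 2] would
# instead give [[0, 1, 0], [2, 1, 2]] -- pairing unconnected nodes
```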
Two points are worth noting:
- download() and process() are only called the first time; afterwards, the processed dataset is loaded directly.
- Not all four methods above are required. For example, if the dataset already exists locally, there is no need to override download() to fetch the raw data.
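The "only called the first time" behavior boils down to an existence check on the processed files. A minimal stand-in sketch of that caching logic (pure Python; the file name and placeholder payload are made up for illustration, not PyG internals):

```python
import os
import pickle

def load_dataset(root: str):
    """Process raw data once, then reuse the cached result on later calls."""
    processed_path = os.path.join(root, "processed", "data.pkl")
    if os.path.exists(processed_path):
        # Later calls: the processed file exists, so load it directly
        with open(processed_path, "rb") as f:
            return pickle.load(f)
    # First call: "process" the data and cache it to disk
    data = {"num_nodes": 4039, "num_edges": 88234}  # placeholder payload
    os.makedirs(os.path.dirname(processed_path), exist_ok=True)
    with open(processed_path, "wb") as f:
        pickle.dump(data, f)
    return data
```

Calling load_dataset twice with the same root does the processing work only once; the second call hits the cache.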
3. Large datasets
For large graph datasets, inherit from the Dataset class. In addition to the four methods that InMemoryDataset requires, you also need to override the following:
- len(): returns the number of examples in the dataset;
- get(): implements the logic for loading a single graph.
Since creating a custom large dataset is similar to InMemoryDataset, the demonstration here is brief.
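To make the idea concrete, here is a rough stand-in sketch of the pattern behind Dataset (not real PyG code: Data objects are replaced by plain dicts and torch.save/torch.load by pickle). process() writes each graph to its own file, len() counts those files, and get() loads exactly one graph on demand, so the full dataset never has to fit in memory:

```python
import os
import pickle

class DiskGraphDataset:
    """Stand-in sketch of PyG's Dataset: one file per graph on disk."""

    def __init__(self, root: str, graphs=None):
        self.root = root
        os.makedirs(root, exist_ok=True)
        if graphs is not None:
            self.process(graphs)

    def process(self, graphs):
        # Save each graph to its own file instead of one in-memory blob
        for i, g in enumerate(graphs):
            with open(os.path.join(self.root, f"data_{i}.pkl"), "wb") as f:
                pickle.dump(g, f)

    def len(self):
        # Number of examples = number of processed graph files
        return len([n for n in os.listdir(self.root) if n.endswith(".pkl")])

    def get(self, idx):
        # Load a single graph only when it is requested
        with open(os.path.join(self.root, f"data_{idx}.pkl"), "rb") as f:
            return pickle.load(f)
```

The key design choice is that get() touches only one file per call, which is what keeps large datasets out of memory.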
4. Conclusion
Customizing datasets is an important skill, especially when you need to convert local data into PyG's standard graph-dataset format.