
Learning Notes: 2022 Survey | Automated Graph Machine Learning: Methods, Libraries and Directions

2022-06-25 12:18:00 Yetingyun


One. Abstract and Keywords

Graph machine learning has been extensively studied in both academia and industry. However, as graph learning research surges and a large number of new methods and techniques emerge, it is becoming increasingly difficult to manually design the optimal machine learning algorithm for each graph-related task. To meet this challenge, automated graph machine learning, which aims to discover the best hyperparameter configurations and neural architectures for different graph tasks and datasets, is receiving growing attention from the research community. The paper extensively discusses automated graph machine learning approaches, covering hyperparameter optimization (HPO) and neural architecture search (NAS) for graph machine learning. It briefly reviews existing libraries designed for graph machine learning and automated machine learning, and further introduces the authors' contribution, AutoGL, the world's first open-source library dedicated to automated graph machine learning. Finally, the authors share their views on future research directions for automated graph machine learning. The paper is the first systematic and comprehensive discussion of automated graph machine learning in terms of approaches, libraries, and directions.

Keywords: graph machine learning, graph neural networks, automated machine learning, neural architecture search, hyperparameter optimization

Two. Introduction

Graph data are everywhere in our daily lives. Graphs can be used to model the complex relationships and dependencies between entities, from small molecules in proteins and particles in physical simulations to large-scale power grids and global airline networks. Therefore, graph machine learning has long been an important research direction in both academia and industry [1]. In particular, network embedding [2], [3], [4], [5] and graph neural networks (GNNs) [6], [7], [8] have attracted increasing attention over the past decade. They have been successfully applied to recommender systems [9], [10], fraud detection [11], bioinformatics [12], [13], physical simulation [14], traffic forecasting [15], [16], knowledge representation [17], drug repurposing [18], [19], and Covid-19 pandemic prediction [20].

Despite the popularity of graph machine learning algorithms, existing research relies heavily on manual hyperparameter tuning and architecture design to achieve the best performance, which costs expensive human effort when there are a large number of models for various graph-related tasks. Taking GNNs as an example, in 2021 alone over a hundred new general-purpose architectures were published at top machine learning and data mining conferences, not to mention the interdisciplinary models designed for specific tasks. Sticking to the manual trial-and-error paradigm when designing the best algorithm for a target task will inevitably demand more and more human and material resources. On the other hand, automated machine learning (AutoML) has been extensively studied to reduce the human workload of developing and deploying machine learning models [21], [22]. A complete AutoML pipeline has the potential to automate every step of machine learning, including automatic data collection and cleaning, automatic feature engineering, and automatic model selection and optimization. Owing to the popularity of deep learning models, hyperparameter optimization (HPO) [23], [24], [25], [26] and neural architecture search (NAS) [27], [28] have been the most widely studied. In fields such as computer vision [32], [33], AutoML has reached human-level performance [29], [30], [31] with almost no manual guidance.

Automated graph machine learning, which combines the advantages of AutoML and graph machine learning, has naturally become a promising research direction for further improving graph learning models and has attracted increasing attention from the community. The paper systematically summarizes automated graph machine learning methods, introduces the related public libraries as well as AutoGL developed by the authors, and shares views on the challenges and future research directions of automated graph machine learning.

The paper focuses on two main topics: HPO and NAS for graph machine learning. For HPO, the focus is on how to develop scalable methods. For NAS, following previous surveys, different methods are compared in terms of search space, search strategy, and performance estimation strategy. Some recent automated graph learning works are also briefly discussed, each with its own characteristics in architecture pooling, structure learning, accelerators, and hardware/software co-design. Along the way, the different ways these works handle the challenges of AutoML on graphs are discussed as well. The paper then reviews libraries related to automated graph machine learning and discusses AutoGL, the first dedicated framework and open-source library for automated graph machine learning, emphasizing its design principles, which are tailored for AutoML on graphs, and briefly introducing its usage. Last but not least, the paper points out potential research directions for graph HPO and graph NAS, including but not limited to scalability, interpretability, out-of-distribution generalization, robustness, and hardware-aware design. The authors believe their paper will greatly promote the research and application of automated graph machine learning in academia and industry.

The rest of the paper is organized as follows. Section 2 introduces the basics and preliminaries of automated graph machine learning, with a brief introduction to the fundamentals and mathematical formulations of graph machine learning and AutoML. Section 3 comprehensively discusses HPO-based graph machine learning methods, and Section 4 comprehensively discusses NAS-based graph machine learning methods. Then, Section 5.1 summarizes the libraries related to graph machine learning and automated machine learning, and introduces AutoGL in depth, the open-source library from the authors' team and also the world's first open-source library tailored for automated graph machine learning. Finally, Section 6 outlines future research opportunities, and Section 7 concludes the paper.

Three. AutoGL Overview

AutoGL: a powerful open-source library for automated graph machine learning

To fill the gap in automated graph machine learning, the authors developed the AutoGL framework. The overall architecture of AutoGL is shown in the figure below.

The automated graph machine learning pipeline is divided into five modules:

  • Automatic feature engineering
  • Neural architecture search
  • Model training
  • Hyperparameter optimization
  • Automatic ensemble

For each module, a variety of state-of-the-art algorithms, standardized base classes, and high-level APIs are provided for easy and flexible customization. The AutoGL library is built on top of PyTorch Geometric (PyG) [94], a widely used library for graph machine learning. AutoGL has the following key characteristics:

  • Open source: the code and detailed documentation are available online;
  • Easy to use: AutoGL is designed to be user-friendly; a quick run of AutoGL takes fewer than ten lines of code;
  • Flexible to extend: the modular design, high-level base-class APIs, and rich documentation of AutoGL allow flexible and easy customized extensions.

In short, AutoGL is designed in a modular and object-oriented way to achieve a clear logic flow, easy usage, and flexible extension. All the APIs exposed to users are abstracted at a high level to avoid redundant re-implementation of models, algorithms, and training/evaluation protocols. The five main modules (automatic feature engineering, neural architecture search, model training, hyperparameter optimization, and automatic ensemble) all take the unique characteristics of graph machine learning into account. The detailed design of each module is elaborated next.

1. AutoGL Dataset

This subsection introduces dataset management. AutoGL Dataset is currently based on the Dataset class of PyTorch Geometric and supports common benchmarks for node and graph classification, including the recent Open Graph Benchmark [118]. The table below gives a complete list of datasets, and users can also easily customize datasets following the official documentation.

Specifically, AutoGL provides widely used node classification datasets, including Cora, CiteSeer, PubMed [119], Amazon Computers, Amazon Photo, Coauthor CS, Coauthor Physics [120], and Reddit [121], as well as graph classification datasets such as MUTAG [122], PROTEINS [123], IMDB-B, IMDB-M, and COLLAB [124]. Datasets from the Open Graph Benchmark [118] are also supported. The table above summarizes the statistics of the supported datasets.
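
As a minimal sketch of dataset management, the snippet below loads built-in benchmarks by name. It assumes the `build_dataset_from_name` helper described in the AutoGL documentation; exact module paths and dataset keys may differ between versions.

```python
# Minimal sketch, assuming the build_dataset_from_name helper from the
# AutoGL documentation; exact module paths and names may vary across versions.
from autogl.datasets import build_dataset_from_name

# Load the Cora citation graph (a node classification benchmark).
cora = build_dataset_from_name("cora")

# Graph classification benchmarks from the table above can be loaded the same way.
mutag = build_dataset_from_name("mutag")
```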

2. Auto Feature Engineering

The graph data are first processed by the automatic feature engineering module, where various node-level, edge-level, and graph-level features can be automatically added, compressed, or deleted to facilitate the subsequent graph learning process. Topological features of the graph can also be extracted to better exploit the graph structure.

Currently, AutoGL supports 24 feature engineering operations, abstracted into three categories: generators, selectors, and graph features. Generators aim to create new node and edge features based on the current node features and graph structure. Selectors automatically filter and compress features to keep them compact and informative. Graph features focus on generating graph-level features.

The supported generators are summarized in the table below, including Graphlets [125], EigenGNN [126], PageRank [127], local degree profile, normalization, one-hot degree, and one-hot node IDs. For selectors, GBDT [128] and FilterConstant are supported. DeepGL [129], an automated feature engineering method, is also supported and can serve as either a generator or a selector. For graph features, Netlsd [130] and a set of graph feature extractors implemented in NetworkX [109] are wrapped, such as NxTransitivity and NxAverageClustering.

Convenient wrappers are also provided to support the feature engineering operations in PyTorch Geometric [94] and NetworkX [109]. If the built-in methods do not suit the task at hand, users can easily customize feature engineering methods by inheriting the classes BaseGenerator, BaseSelector, BaseGraph, or BaseFeatureEngineer.
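
As an illustration of this extension mechanism, the sketch below outlines a custom generator that appends each node's degree as an extra feature. The base class name BaseGenerator comes from the text above, but the import path, the fit/transform hooks, and the data fields used here are assumptions for illustration; the actual interface should be taken from the official documentation.

```python
# Illustrative sketch of a custom feature generator. BaseGenerator is named
# in the AutoGL documentation, but the import path and the fit/transform hooks
# used here are assumptions; check the official docs for the real interface.
import torch
from autogl.module.feature import BaseGenerator  # assumed import path


class DegreeGenerator(BaseGenerator):
    """Append each node's degree as an additional node feature (sketch)."""

    def fit(self, data):
        return self  # nothing to learn for this generator

    def transform(self, data):
        num_nodes = data.x.size(0)
        # data.edge_index is assumed to follow PyG's 2 x num_edges convention.
        degree = torch.zeros(num_nodes, 1)
        degree.index_add_(
            0, data.edge_index[0], torch.ones(data.edge_index.size(1), 1)
        )
        data.x = torch.cat([data.x, degree], dim=1)
        return data
```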

3. Neural Architecture Search

In AutoGL, neural architecture search (NAS) aims to automatically construct graph neural networks: various NAS methods are used to search for the best GNN model that fits the current dataset. The Neural Architecture Search module is organized into the Algorithm, GNNSpace, and Estimator submodules. GNNSpace defines the whole search space, the Algorithm determines which architectures should be evaluated next, and the Estimator is used to derive the performance of a target architecture.

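To make the division of labor among the three submodules concrete, here is a schematic search loop. All class and method names below are hypothetical, written only to illustrate the space/algorithm/estimator split described above; they do not reflect AutoGL's internal API.

```python
# Schematic NAS loop illustrating the space / algorithm / estimator split
# described above. All names here are hypothetical, not AutoGL's real API.
def search(space, algorithm, estimator, dataset, num_trials=50):
    best_arch, best_score = None, float("-inf")
    for _ in range(num_trials):
        # The search algorithm proposes the next architecture from the space.
        arch = algorithm.propose(space)
        # The estimator evaluates (e.g., trains and validates) the candidate.
        score = estimator.evaluate(arch, dataset)
        # The algorithm updates its internal state (RL policy, population, ...).
        algorithm.update(arch, score)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch
```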

4. Model Training

This module handles the training and evaluation of graph machine learning tasks through two functional submodules, Model and Trainer. Model handles the construction of graph machine learning models (e.g., GNNs) by defining their learnable parameters and forward pass. Trainer controls the optimization process for a given model. Common optimization methods are packaged as high-level APIs to provide a concise interface. More advanced training controls and regularization methods for graph-related tasks, such as early stopping and weight decay, are also supported.

The model training module supports both node-level and graph-level tasks, such as node classification and graph classification. Common models such as GCN [43], GAT [131], GraphSAGE [121], and GIN [132] are supported, along with pooling methods such as Top-K Pooling [133]. Users can quickly implement their own graph models by inheriting the BaseModel class, and add customized tasks or optimization methods by inheriting BaseTrainer.
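
Since AutoGL is built on PyTorch Geometric, the following minimal sketch shows, in plain PyG rather than AutoGL's own Trainer API, the kind of node classification training loop the Trainer submodule encapsulates, with weight decay and early stopping as examples of the supported training controls.

```python
# Plain PyTorch Geometric sketch of the training loop the Trainer wraps;
# this is not AutoGL's internal API, only an illustration of what it automates.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv


class GCN(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, num_classes)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        x = F.dropout(x, p=0.5, training=self.training)
        return self.conv2(x, edge_index)


def train(model, data, epochs=200, patience=20):
    # Weight decay and early stopping mirror the regularization options above.
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
    best_val, wait = float("inf"), 0
    for _ in range(epochs):
        model.train()
        optimizer.zero_grad()
        out = model(data.x, data.edge_index)
        loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
        loss.backward()
        optimizer.step()

        model.eval()
        with torch.no_grad():
            val_loss = F.cross_entropy(
                model(data.x, data.edge_index)[data.val_mask],
                data.y[data.val_mask],
            )
        if val_loss < best_val:
            best_val, wait = val_loss, 0
        else:
            wait += 1
            if wait >= patience:
                break  # early stopping
    return model
```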

5. Hyper-Parameter Optimization

The hyperparameter optimization (HPO) module aims to automatically search for the best hyperparameters of a specified model and training process, including but not limited to architecture hyperparameters, such as the number of layers, the dimensionality of node representations, the dropout rate, and the activation function, as well as training hyperparameters, such as the learning rate, weight decay, and number of epochs. The hyperparameters, their types (e.g., integer, numerical, or categorical), and their feasible ranges can be easily set.

AutoGL supports a variety of HPO algorithms, including algorithms designed specifically for graph data, such as AutoNE [39] and AutoGR [40], as well as general-purpose algorithms, such as random search [23] and the Tree-structured Parzen Estimator [24]. Users can customize HPO algorithms by inheriting from the BaseHPOptimizer class.
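
As a concrete, library-agnostic illustration of the simplest of these strategies, the sketch below performs random search over a small search space built from the hyperparameters mentioned above. The `evaluate` callback is a placeholder that stands in for training a model with the sampled configuration and returning its validation score.

```python
# Library-agnostic random-search sketch over the hyperparameters listed above.
# `evaluate` is a placeholder: train with the config, return validation accuracy.
import random

search_space = {
    "num_layers": [2, 3, 4],           # integer choices
    "hidden_dim": [16, 64, 128, 256],  # integer choices
    "dropout": (0.0, 0.8),             # numerical range
    "lr": (1e-4, 1e-1),                # numerical range
    "weight_decay": (1e-6, 1e-2),      # numerical range
}


def sample(space):
    cfg = {}
    for name, spec in space.items():
        if isinstance(spec, list):
            cfg[name] = random.choice(spec)       # categorical / integer choice
        else:
            cfg[name] = random.uniform(*spec)     # numerical range
    return cfg


def random_search(evaluate, space, num_trials=30):
    best_cfg, best_score = None, float("-inf")
    for _ in range(num_trials):
        cfg = sample(space)
        score = evaluate(cfg)  # e.g., validation accuracy
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```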

6. Auto Ensemble

This module can automatically ensemble the optimized individual models into a stronger final model. Two ensemble methods are currently provided: voting and stacking. Voting is a simple yet powerful ensemble method that directly averages the outputs of the individual models. Stacking trains another meta-model to learn how to combine the model outputs; AutoGL supports general linear models (GLM) and gradient boosting machines (GBM) as meta-models.
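
The voting scheme described here can be illustrated with a few lines of generic code: the class probabilities predicted by each optimized base model are averaged, and the highest-scoring class wins. This is a plain illustration of the idea, not AutoGL's ensemble API.

```python
# Generic illustration of "voting" by averaging predicted class probabilities;
# not AutoGL's ensemble API.
import numpy as np


def voting_ensemble(prob_list):
    """prob_list: list of (num_samples, num_classes) probability arrays,
    one per optimized base model."""
    avg_probs = np.mean(np.stack(prob_list, axis=0), axis=0)
    return avg_probs.argmax(axis=1)  # final predicted class per sample
```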

7. AutoGL Solver

On top of the modules mentioned above, another high-level API, the Solver, is provided to control the whole automated graph machine learning pipeline. In the Solver, the five modules are integrated systematically to form the final model. The Solver receives a feature engineering module, a list of models, an HPO module, and an ensemble module as initialization arguments to build the automated graph learning pipeline. Given a dataset and a task, the Solver first performs automatic feature engineering to clean and augment the input data, then uses the model training and HPO modules to optimize all the given models. Finally, the optimized best models are combined by the Auto Ensemble module to form the final model.

The Solver also provides global control of the AutoGL pipeline. For example, a time budget can be explicitly set to limit the maximum time cost, and the training/evaluation protocol can be chosen from common dataset splits or cross-validation.
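
Putting the pieces together, the sketch below runs the whole pipeline on a node classification dataset. It follows the usage shown in the AutoGL documentation (build_dataset_from_name, AutoNodeClassifier, fit with a time limit), but module paths, argument names, and defaults may differ between AutoGL versions, so treat it as an outline rather than copy-paste code.

```python
# End-to-end sketch following the usage shown in the AutoGL documentation;
# exact module paths, argument names, and defaults may vary across versions.
from autogl.datasets import build_dataset_from_name
from autogl.solver import AutoNodeClassifier

dataset = build_dataset_from_name("cora")

solver = AutoNodeClassifier(
    feature_module="deepgl",        # automatic feature engineering
    graph_models=["gcn", "gat"],    # candidate models to train and tune
    hpo_module="anneal",            # hyperparameter optimization strategy
    ensemble_module="voting",       # combine the optimized models
)

# Fit the whole pipeline under an explicit time budget (in seconds).
solver.fit(dataset, time_limit=3600)

# Predicted class probabilities for the dataset's test split.
predicted_probabilities = solver.predict_proba()
```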

Four. AutoGL and Future Directions

It has been shown that AutoGL, a library for automated graph machine learning, is open source, easy to use, and flexible to extend. In the short term, the team plans to support the following features:

  • support for large-scale graphs;
  • handling more graph-related tasks, such as heterogeneous graphs and spatial-temporal graphs;
  • support for more graph library backends, such as Deep Graph Library [95].

Scalability: AutoML has been successfully applied to various graph scenarios; however, regarding scalability to large-scale graphs, there are still many directions worth further study. On the one hand, although HPO for large-scale graph machine learning has been preliminarily explored in the literature, the Bayesian optimization used in those models has limited efficiency. Therefore, it would be interesting and challenging to explore how to reduce the computational cost to achieve fast hyperparameter optimization. On the other hand, although applications involving large-scale graphs are very common in the real world, the scalability of NAS for graph machine learning has received little attention from researchers, which leaves plenty of room for further exploration.

Interpretability: existing automated graph machine learning methods are mainly based on black-box optimization. For example, it is unclear why some NAS models perform better than others, and the interpretability of NAS algorithms still lacks systematic study. There have been some preliminary studies on the interpretability of graph machine learning [135] and on explainable graph hyperparameter optimization through decorrelating hyperparameter importance [40]. Nevertheless, studying the interpretability of automated graph machine learning remains important.

Out-of-distribution generalization: when applied to new graph datasets and tasks, building task-specific graph HPO configurations and graph NAS frameworks, such as the search space and the algorithm, still requires substantial human effort. The generalization ability of current graph HPO configurations and NAS frameworks is limited, especially when training and testing data come from different distributions [136]. Studying graph HPO and graph NAS algorithms with out-of-distribution generalization ability that can handle continuously and rapidly changing tasks would be a promising direction.

Robustness: since many applications of AutoML on graphs are risk-sensitive, such as finance and healthcare, the robustness of models is essential for practical use. Although there are some preliminary studies on the robustness of graph machine learning [137], how to extend these techniques to automated graph machine learning has not been explored.

Graph models for AutoML: the current focus is mainly on how to extend AutoML methods to graphs. The other direction, using graphs to help AutoML, is also feasible and promising. For example, a neural network can be modeled as a directed acyclic graph (DAG) to analyze its structure [138], [93], or GNNs can be adopted to facilitate NAS [90], [139], [140], [141]. The hope is that graphs and AutoML will form closer connections and further promote each other.

Hardware-aware models: to further improve the scalability of automated graph machine learning, hardware-aware models may be a key step, especially in real industrial environments. Both hardware-aware graph models [142] and hardware-aware AutoML models [143], [144], [145] have been studied, but integrating these techniques is still at an early stage and faces major challenges.

Comprehensive evaluation protocols: currently, most AutoML methods on graphs are tested on small traditional benchmarks, such as the three citation graphs Cora, CiteSeer, and PubMed [119]. However, these benchmarks have been shown to be insufficient for comparing different graph machine learning models [146], not to mention AutoML on graphs. More comprehensive evaluation protocols are needed, for example on recently proposed graph machine learning benchmarks [37], [147], or on new dedicated benchmarks for AutoML on graphs similar to the NAS-bench series [148].

Five. Summary

The workflow diagram below shows the overall framework of AutoGL.

AutoGL has been tested on several benchmark datasets for node classification and graph classification (the experiments mainly demonstrate the usage of AutoGL and its main functional modules, rather than aiming to reach a new state-of-the-art on the benchmarks or to compare different algorithms).

The paper discusses automated machine learning (AutoML) and graph machine learning methods, and how graph hyperparameter optimization (HPO) and graph neural architecture search (NAS) can be developed to enable automated graph machine learning. It then introduces AutoGL, a dedicated framework and open-source library developed by the authors' team for automated graph machine learning. Last but not least, the challenges of automated graph machine learning are pointed out, along with promising directions worth further study.

For a more detailed introduction, it is recommended to study the paper and its references, the GitHub project, and the official documentation:

Original site

Copyright notice
This article was created by [Yetingyun]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/176/202206251146399910.html