
ESWC 2018 | R-GCN: Modeling Relational Data with Graph Convolutional Networks

2022-07-25 04:35:00 【Cyril_ KI】

Contents

Preface
1. Graph convolutional network
2. Regularization
3. Experiments
- 3.1 Node classification
- 3.2 Link prediction

Preface

Title: Modeling Relational Data with Graph Convolutional Networks
Venue: Extended Semantic Web Conference (ESWC), 2018
Paper link: Modeling Relational Data with Graph Convolutional Networks

This paper is a follow-up to GCN by Kipf, the author of GCN, and co-authors. GCN has two obvious limitations:

It can only handle undirected graphs.
It can only handle homogeneous graphs, that is, graphs in which all edges are of the same type.

As a follow-up to GCN, R-GCN's main contribution is extending GCN to multi-relational heterogeneous graphs. In other words, when R-GCN updates a node's features, it takes into account the neighbors reached through each different type of edge.

1. Graph convolutional network

Notation: a graph $G=(\mathcal{V},\mathcal{E}, \mathcal{R})$ with nodes $v_i \in \mathcal{V}$ and edges $(v_i,r,v_j) \in \mathcal{E}$, where $r \in \mathcal{R}$ denotes the type (relation) of the edge.

Both GCN and R-GCN use graph convolution to update node states, and the two can be expressed in one common framework:
$$h_i^{(l+1)} = \sigma\left(\sum_{m \in \mathcal{M}_i} g_m\left(h_i^{(l)}, h_j^{(l)}\right)\right)$$
Here $h_i^{(l)}$ is the hidden state of node $v_i$ at layer $l$, $g_m(.,.)$ can be understood as the message function discussed in an earlier article, $\mathcal{M}_i$ is the set of incoming messages for node $v_i$, and $\sigma$ is the activation function.

For GCN, $g_m(.,.)$ multiplies the features of a neighbor node by a normalized weight coefficient. GCN does not consider type information here, because all nodes and edges belong to a single type.
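As a point of comparison, here is a minimal NumPy sketch of a single GCN propagation step. It assumes the symmetric normalization with self-loops from the original GCN formulation, which this article does not spell out; the function name `gcn_layer` and the toy graph are illustrative only.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN step: relu(A_hat @ H @ W), with a single weight matrix W for all edges."""
    A_tilde = A + np.eye(A.shape[0])        # add self-loops
    deg = A_tilde.sum(axis=1)               # degrees are >= 1 because of the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalization: entry (i, j) is scaled by 1 / sqrt(deg_i * deg_j).
    A_hat = A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_hat @ H @ W, 0.0)   # ReLU activation

# Toy undirected path graph 0 - 1 - 2 with 2-dimensional node features.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.arange(6, dtype=float).reshape(3, 2)
W = np.eye(2)
H_next = gcn_layer(A, H, W)
```

Every edge shares the same `W`, which is exactly the restriction that R-GCN lifts.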

For R-GCN, the key question is how to account for neighbors connected through different relation types during convolution, that is, how information from multiple relations is combined. R-GCN handles the different relation types in the graph as follows:
$$h_i^{(l+1)} = \sigma\left(\sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)}\right)$$
where:

$\mathcal{N}_i^r$: the set of neighbors of node $v_i$ under relation $r$. For example, in a citation network, the relations between an author node and other nodes may include "author writes a paper", "author belongs to an organization", and so on.
$c_{i,r}$ is a normalization coefficient. It can be learned or set to a constant in advance, for example $|\mathcal{N}_i^r|$.
$W_r^{(l)}$ is a linear transformation. As the subscript $r$ indicates, each relation type has its own weight matrix, which transforms the features of the neighbors connected through that relation.

As the formula above shows, after aggregating neighbor features across the different relations, R-GCN also adds the node's own features (a self-connection), and finally applies an activation function to obtain the updated node features.

The main difference between R-GCN and GCN is that R-GCN introduces several linear transformations, one for each relation type, whereas GCN has only one relation type and therefore only one linear transformation.
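The update rule described above can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: it assumes $c_{i,r} = |\mathcal{N}_i^r|$, uses ReLU as the activation, and stores features as row vectors (so the code computes `h @ W` where the formula writes $W h$). The names `rgcn_layer`, `W_rel`, and `W_self` are hypothetical.

```python
import numpy as np

def rgcn_layer(H, edges, W_rel, W_self):
    """One R-GCN update with c_{i,r} = |N_i^r| (an assumed choice of constant).

    H      : (num_nodes, d_in) node features, one row per node
    edges  : list of (src, rel, dst) triples; a message flows from src to dst
    W_rel  : (num_rels, d_in, d_out), one weight matrix per relation type
    W_self : (d_in, d_out), weight of the self-connection
    """
    num_nodes = H.shape[0]
    num_rels, _, d_out = W_rel.shape
    agg = np.zeros((num_rels, num_nodes, d_out))   # summed messages per relation
    count = np.zeros((num_rels, num_nodes))        # |N_i^r| per relation and node
    for src, rel, dst in edges:
        agg[rel, dst] += H[src] @ W_rel[rel]       # relation-specific transform
        count[rel, dst] += 1
    norm = np.maximum(count, 1.0)[:, :, None]      # avoid division by zero
    out = (agg / norm).sum(axis=0) + H @ W_self    # sum over relations + self-loop
    return np.maximum(out, 0.0)                    # ReLU activation

# Toy graph: 3 nodes with one-hot features and 2 relation types.
H = np.eye(3)
W_rel = np.stack([np.eye(3), 2.0 * np.eye(3)])
W_self = np.eye(3)
edges = [(1, 0, 0), (2, 0, 0), (2, 1, 0)]          # all messages go to node 0
H_next = rgcn_layer(H, edges, W_rel, W_self)
# H_next[0] is [1.0, 0.5, 2.5]: its own features, the mean over relation 0,
# and the single relation-1 neighbor scaled by 2.
```

To treat edge direction as in the paper, the caller can add the reverse of each edge under a separate relation id, so that incoming and outgoing neighbors get different weight matrices.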

The computation graph for updating a single node in R-GCN is shown below:
[Figure: computation graph for a single-node update in R-GCN]
The red node is the node being updated, and the dark blue nodes are its neighbors. The neighbors are grouped by relation type, and within each group they are further split by edge direction into incoming (in) and outgoing (out) neighbors. The state of each dark blue node is transformed by the corresponding weight matrix into a green representation, and the green representations are then summed (a self-loop is added, so the red node's own features are included as well). Finally, the aggregated features pass through an activation function to give the updated state of the red node.

2. Regularization

R-GCN needs a separate weight matrix $W$ for each type of edge. If a graph has many relation types, the number of parameters in R-GCN grows rapidly, which causes heavy computational overhead and can lead to overfitting on rare relations.

To solve this problem, an intuitive idea is to let the weight matrices of different relation types share parameters. However, if the parameters are fully shared, R-GCN degenerates into GCN. The authors therefore propose basis decomposition:
$$W_r^{(l)} = \sum_{b=1}^{B} a_{rb}^{(l)} V_b^{(l)}$$
The decomposition shows that the weight matrix of each relation $r$ is a linear combination of basis matrices. In layer $l$, the bases $V_b^{(l)}$ are the same for all relations, and only the combination coefficients $a_{rb}^{(l)}$ depend on $r$. This greatly reduces the number of parameters.
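A short NumPy sketch of the basis decomposition makes the parameter saving concrete. The sizes below (50 relations, 5 bases, 16-dimensional features) are arbitrary values chosen for illustration, not settings from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
num_rels, num_bases, d_in, d_out = 50, 5, 16, 16   # illustrative sizes only

V = rng.normal(size=(num_bases, d_in, d_out))      # shared bases V_b
a = rng.normal(size=(num_rels, num_bases))         # per-relation coefficients a_{rb}

# W_r = sum_b a_{rb} V_b, computed for every relation r at once.
W = np.einsum("rb,bio->rio", a, V)

# Parameter counts: one full matrix per relation vs. shared bases plus coefficients.
full_params = num_rels * d_in * d_out
basis_params = num_bases * d_in * d_out + num_rels * num_bases
print(full_params, basis_params)                   # 12800 1530
```

With these sizes the per-relation cost drops from a full 16 x 16 matrix to just 5 coefficients, while the 5 bases are paid for once.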

The authors also propose block-diagonal decomposition. Specifically:
$$W_r^{(l)} = \bigoplus_{b=1}^{B} Q_{br}^{(l)}$$
That is, each weight matrix is defined as the direct sum of a set of low-dimensional matrices $Q_{br}^{(l)}$, namely:
$$W_r^{(l)} = \mathrm{diag}\left(Q_{1r}^{(l)}, \dots, Q_{Br}^{(l)}\right), \quad Q_{br}^{(l)} \in \mathbb{R}^{(d^{(l+1)}/B) \times (d^{(l)}/B)}$$

Basis decomposition can be seen as a form of effective weight sharing between different relation types, while block-diagonal decomposition can be seen as a sparsity constraint on the weight matrix of each relation type. The block-diagonal structure encodes the intuition that latent features can be grouped into sets of variables that are more tightly coupled within groups than across groups. Both decompositions reduce the number of parameters that must be learned for highly multi-relational data (such as realistic knowledge bases).
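The block-diagonal variant can be sketched as below. The helper names `block_diag_weight` and `block_diag_apply` are hypothetical, and features are again treated as row vectors. The second function shows that the product can be computed block by block without ever building the full matrix.

```python
import numpy as np

def block_diag_weight(Q):
    """Assemble the full W_r from blocks Q of shape (B, d_in / B, d_out / B)."""
    B, bi, bo = Q.shape
    W = np.zeros((B * bi, B * bo))
    for b in range(B):
        W[b * bi:(b + 1) * bi, b * bo:(b + 1) * bo] = Q[b]
    return W

def block_diag_apply(h, Q):
    """Compute h @ W_r block by block, without materializing W_r."""
    B, bi, bo = Q.shape
    # Each chunk of h only interacts with its own block.
    return np.einsum("bi,bio->bo", h.reshape(B, bi), Q).reshape(B * bo)

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 2, 3))      # B = 4 blocks, so d_in = 8 and d_out = 12
h = rng.normal(size=8)
dense = h @ block_diag_weight(Q)
sparse = block_diag_apply(h, Q)     # same result, 24 parameters instead of 96
```

The zero entries outside the blocks are what the text calls a sparsity constraint: feature group `b` of the input can only influence feature group `b` of the output.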

3. Experiments

3.1 Node classification

Node classification can be summarized as follows:
[Figure: R-GCN model for node classification]
Put simply, several R-GCN layers are stacked to obtain a vector representation of each node, a softmax function is applied to produce the output, and the model parameters are updated by minimizing the cross-entropy loss. This matches the GNN node classification model described earlier.

The cross-entropy loss is defined as:
$$\mathcal{L} = -\sum_{i \in \mathcal{Y}} \sum_{k=1}^{K} t_{ik} \ln h_{ik}^{(L)}$$
where $\mathcal{Y}$ is the set of indices of labeled nodes, $h_{ik}^{(L)}$ is the $k$-th entry of the network output for node $i$, and $t_{ik}$ is the corresponding ground-truth label.
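A minimal NumPy sketch of this loss is given below. It assumes one-hot targets, in which case the inner sum over classes reduces to picking the probability of the true class; the function names and the toy inputs are illustrative.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def labeled_cross_entropy(logits, labels, labeled_idx):
    """Cross-entropy summed over the labeled nodes only (the set Y in the text)."""
    probs = softmax(logits[labeled_idx])                       # h_{ik}^{(L)}
    picked = probs[np.arange(len(labeled_idx)), labels[labeled_idx]]
    return -np.log(picked).sum()

# Toy example: 4 nodes, 3 classes, only nodes 0 and 2 are labeled.
logits = np.zeros((4, 3))               # uniform predictions
labels = np.array([0, 1, 2, 0])
labeled_idx = np.array([0, 2])
loss = labeled_cross_entropy(logits, labels, labeled_idx)      # equals 2 * ln(3)
```

Unlabeled nodes contribute nothing to the loss, but they still take part in message passing, which is what makes the setup semi-supervised.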

The number of entities, relations, edges, labeled nodes, and classes in each dataset used in the experiments is as follows:
[Table: entities, relations, edges, labeled nodes, and classes per dataset]
Baselines: RDF2Vec, Weisfeiler-Lehman kernels (WL), and Feat (a hand-designed feature extractor).

Experimental results:
[Table: node classification results for R-GCN and the baselines]
R-GCN does not achieve the best results on MUTAG and BGS. The authors' explanation is that these two datasets contain many high-degree hub nodes. Specifically, the fixed choice of the normalization constant $1/c_{i, r}$ for aggregating messages from neighboring nodes is partly responsible for this behavior, and it can be particularly problematic for high-degree nodes. A potential way to overcome this limitation in future work is to introduce an attention mechanism, that is, to replace the normalization constant $1/c_{i, r}$ with data-dependent attention weights $a_{ij,r}$.

3.2 Link prediction

For background on link prediction, see my earlier article: Link Prediction with a GCN Built in PyG.

The link prediction model consists of an encoder and a decoder. The encoder is R-GCN, which encodes each node into a low-dimensional vector representation. The decoder is DistMult, a scoring function that assigns a score to each pair of node vectors under a given relation. The cross-entropy loss is then computed against the true samples, and finally the parameters are updated.
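The decoder side can be sketched as follows. DistMult scores a triple $(s, r, o)$ as $e_s^\top R_r e_o$ with a diagonal matrix $R_r$ per relation, so only the diagonal needs to be stored. In this sketch the embedding matrix `E` stands in for the R-GCN encoder output, the triples with target 0 play the role of negative (corrupted) samples, and all names and values are illustrative assumptions.

```python
import numpy as np

def distmult_score(E, R_diag, s, r, o):
    """f(s, r, o) = e_s^T diag(R_r) e_o, vectorized over a batch of triples."""
    return (E[s] * R_diag[r] * E[o]).sum(axis=-1)

def bce_loss(scores, targets):
    """Binary cross-entropy on sigmoid(scores), written in a numerically stable form."""
    # -log(sigmoid(x)) = log(1 + exp(-x)); -log(1 - sigmoid(x)) = log(1 + exp(x))
    pos = targets * np.logaddexp(0.0, -scores)
    neg = (1.0 - targets) * np.logaddexp(0.0, scores)
    return np.mean(pos + neg)

# Toy setup: 3 entities with 2-dimensional embeddings and a single relation.
E = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])              # stand-in for the encoder output
R_diag = np.array([[2.0, 3.0]])         # diagonal of R_r for relation 0
s = np.array([0, 1, 0])
r = np.array([0, 0, 0])
o = np.array([2, 2, 1])
targets = np.array([1.0, 1.0, 0.0])     # last triple is treated as a negative sample

scores = distmult_score(E, R_diag, s, r, o)   # [2.0, 3.0, 0.0]
loss = bce_loss(scores, targets)
```

Because the relation matrix is diagonal, the score is symmetric in subject and object, which is a known property of DistMult.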
Datasets:

Experimental results: