当前位置:网站首页>[paper reading | depth] role based network embedding via structural features reconstruction with degree regulated

[paper reading | depth] role based network embedding via structural features reconstruction with degree regulated

2022-06-25 10:20:00 Haihong Pro

Preface

Hello!
Thank you very much for reading Haihong's article , If there are mistakes in the text , You are welcome to point out ~
 
Self introduction. ଘ(੭ˊᵕˋ)੭
nickname : Sea boom
label : Program the ape |C++ player | Student
brief introduction : because C Language and programming , Then I turned to computer science , Won a National Scholarship , I was lucky to win some national awards in the competition 、 Provincial award … It has been insured .
Learning experience : Solid foundation + Take more notes + Knock more code + Think more + Learn English well !
 
Only hard work
 

Know what it is Know why !

 
This article only records what you are interested in

ABSTRACT

Role-based network embedding aims to map the network into a low dimensional node representation , While maintaining structural similarity .

Adjacency matrix contains local information of the network , It also contains the global information of the network , But it can not directly express the role of nodes .

therefore , Extract high-order structural features from adjacency matrix For role-based network embedding ( Structural equivalence ) crucial .


Generally speaking , In real networks , Features extracted in the same way Sensitive to noise , But it doesn't really represent the role of nodes .

therefore , We propose a deep learning framework RESD

  1. First , We propose to extract the high-order structural features of each node in the network

  2. then , We use variational self coding (VAE) Modeling the nonlinear relationship of features , Reduce noise , Improve the robustness of embedding

  3. Besides , In embedded space , We apply degree regularization constraints to guide representation learning , To retain the key structural information of the node ( Rudu ), This information may be due to VAE The principles of the framework are lost

  4. Last , We construct a unified objective function , By retaining the structural features and node degree, we can learn the role discovery of node embedding

1. Introduction

Community based network embedding ( Preserve node proximity )

  • DeepWalk [14] and node2vec [8] , By random walk
  • GCN[15] and GraphSAGE[16], Spread features through the Internet

However , Role is another important point in complex networks , It is fundamentally different from the concept of community .


The definition of a role is defined as Classes of equivalent nodes [17], Used to represent the structural similarity between nodes .

let me put it another way , A role is a node partition that satisfies the following conditions : Nodes are structurally more similar to nodes in the same partition than to nodes outside the partition

Intuitively speaking , The role represents the connection mode of the nodes in the neighborhood .


 Insert picture description here

Pictured 1( Upper figure ) Shown , Two red nodes share the same role , Because they can be classified as bridges of two subgraphs , The Yellow node has the same role , Because they are all centers , Even if they are far away from each other or not connected

therefore , Role based network embedding is designed to project the network into the embedding space , In this space , Nodes of the same character are more closely connected .

in summary , Although these two problems ( Based on community or role ) They are all devoted to the clustering of nodes in the network , But they have essential differences and their respective priorities [18].

Community based network embedding methods usually can not solve role-based tasks , Because of them Unable to capture and model the structural features of nodes


lately , Some methods have been proposed to Get structural similarity and Get role based Network embedding .

One of the most representative methods is based on random walk .

The first challenge of character discovery is : Nodes in the same character may not be directly connected or away from each other . To solve this problem , Some methods improve the ordinary random walk .

for example ,role2vec[12] use Random walk based on feature Instead of based on id To generate a sequence as a random walk SkipGram[19] The input of .
However , It is still unable to handle the structural identity extraction of remote nodes , Can not be applied to large-scale networks .

Some ways to try Using adjacency matrix Instead of random walks, it can alleviate the problem of non connectivity between nodes of the same role .

However , The adjacency matrix of the network can contain first-order and higher-order relations , Can not directly express structural similarity .

Therefore, adjacency matrix is not suitable for simply generating role-based embedding .

then , Some methods turn to Structural feature extraction based on adjacency matrix For role discovery . for example ,refex[20] Is a recursive aggregation of neighborhood features and is based on egonet An effective method for characterizing . It has good performance , A relatively compact representation is obtained .

But as the number of iterations increases , Each node will encounter a higher-order neighborhood , this May cause over fitting , As a result, the features can not truly reflect the structural information .

RolX[21] and GLRD[22] utilize REFEX Feature and decompose the feature matrix to obtain a compact node representation .

However , Based on matrix factorization There are ways to High computational complexity The problem of . what's more , The features they use depend on neighborhood information , This makes them useful for Noise is very sensitive , And the embeddedness they generate loses some statistical characteristics of nodes , Such as node degree .

Inspired by deep learning ,DRNE[23] Using the layer normalized by node degree LSTM To recursively aggregate neighbor representations to obtain rule equivalence . This mechanism can effectively capture the structural similarity between remote nodes , But the noise is still ignored .

most important of all , To learn role-based node embedding , We have to The challenges of non connectivity and noise are solved while maintaining the statistical characteristics of nodes .


So , We propose a deep learning framework for structural feature reconstruction based on degree regularity constraints , Called role-based network embedding .

The framework consists of feature extraction 、 Embedding generation and topology feature preservation Three parts .

  • First, the high-order structural features of each node are extracted . In this paper , We choose REFEX As a sample feature extraction method , To improve its efficiency and effectiveness .

  • then , Using the variational automatic encoder on the characteristic matrix (VAE) To reduce noise , Get a more compact representation , And it can capture the nonlinear relationship between input and node embedding , It has good robustness , Avoid over fitting .

    • However ,VAE The coding process may lead to the loss of some key structural features of nodes .

    • therefore , stay VAE To maintain the degree of nodes , Play an auxiliary role in guiding the optimization direction .


Our main contribution :

  • This paper proposes a new framework for role-based network embedding through structural feature reconstruction . This process allows for a more compact representation of nodes , And improve the performance of the elements
  • We use the variational auto coder to capture the nonlinear relationship between input and node embedding , While maintaining structural similarity . It can also reduce the noise in the original network , Increase the robustness of the representation .
  • The coding process lost its structural features ( For example, degree ), We confirm that node degree is important for role assignment . therefore , We are VAE A degree regularization constraint is added to the , To maintain structural consistency and improve performance .
  • We use our model RESD Several extensive experiments have been carried out on real networks , The results are compared with other advanced methods . The experimental results prove the superiority of the model , It is proved that the model can well adapt to the changes of network scale and dimension .

2. Related work

Traditional research is generally Define a character through a graphical representation . One of the most popular methods is the block model , It builds a framework , Nodes represent roles , Edges represent the interaction between characters

  • Concor[25] Is the first person to recursively calculate the adjacency matrix correlation .

  • Random block model (SBM)[26] It's a classic way , It introduces random equivalence into role identity , therefore , If the probability of linking to all other nodes is the same , The node is in the same role

  • Then on SBM Improved , To enable them to better discover roles . for example , Mixed member random block model (MMSB)[27] Using the idea of block model , And assume that a node can belong to multiple roles .


Other traditional methods Using the mechanism of feature engineering , Shift focus from graphical representation to node representation .

They tried different metrics to evaluate some specific structural consistency of nodes , Such as degree centricity 、 Centrality of eigenvectors [28]、PageRank Centrality [29] And structural void value [30], These measures may show good results in some special tasks .

REFEX[20] It is an effective method to obtain structural features . It recursively aggregates neighborhood features and is based on eGonet Characteristics of , To get global structure information . But they are still difficult to apply to discovering node roles and complex tasks , Because the saved structure information is too limited


lately , In order to capture comprehensive role information and apply it to complex machine learning tasks , Researchers are trying to take advantage of network embeddedness . Depending on the embedding mechanism , Most of these role-based network embedding methods can be divided into three categories :

  • Language model based approach
  • Matrix factorization method
  • Deep learning method

1) Language model based on random walk Skip-Gram[19]

Classical network embedding method based on random walk , Such as DeepWalk[14] and node2vec[8], Are bound to the community , Because they're based on id The random walk of , But they Not applicable to role discovery .

Some methods try to improve random walk , To make it suitable for role detection .

  • The most effective model for such role discovery is struc2vec[31]. The algorithm uses dynamic time warping (DTW) Distance calculates multiple feature-based similarity between all node pairs , Then construct a complete graph hierarchy , The edge weight is transformed by feature-based similarity and used to calculate the probability of random walk .
  • lately ,role2vec[12] Based on Motif Random roaming is applied to the feature of ,
  • node2bit[32] The same idea is applied to the user stitching task , Temporary roaming is also used CTDNE[33].
  • RiWalk[34] A new random walk method is designed to approximate the graph kernel on a subgraph .

2) Matrix decomposition mechanism

Most of the existing methods construct Characteristic matrix or similarity matrix , And then through Matrix factorization computes compact nodal representations

  • RolX[21] Is the first to use this mechanism to generate role-based node embedding . It USES REFEX[20] Extracted structural features , Low dimensional nodal representation is obtained by matrix decomposition . Practice has proved , This model has great potential in role discovery .
  • GLRD[22] Taking advantage of these features
  • Regal[35] Medium xNetMF An eigenmatrix is also constructed and decomposed
  • Hone[36] By decomposing a set of Motif Matrix to generate higher-order network embedding . They can effectively extract the structural similarity between nodes .
  • Some methods use classical methods to calculate pairwise similarity .Reaction[37] application RoleSim[38] Matrix to obtain the similarity between each pair of nodes , To get the node representation
  • SEGK[39] The similarity matrix of graph kernel computing is decomposed .
  • GraphWave[40] The embedding is also generated by applying the graph diffusion kernel and treating it as a probability distribution on the graph .
  • Another job [41] Four methods based on nonnegative matrix factorization are designed (NMF) Multi layer network embedding algorithm , While maintaining the structural characteristics . But this method based on matrix decomposition has high computational and spatial complexity , Not applicable except for RolX Large scale network beyond .

3) Deep learning framework based on neural network , To capture the highly nonlinear relationship between network topology and node representation [43].

  • DRNE[23] Application layer normalized LSTM[44] To learn regular equivalence . The algorithm samples the neighborhood of each node and sorts by node degree , Then the representation of neighborhood is regarded as the regularization of multi-layer perceptron LSTM The input of , Reconstruct the node degree
  • Demo-Net[45] A degree specific GCN, It uses different parameters according to the node degree , But it is aimed at semi supervised conditions
  • DMER[46] Make use of adjacency GCN Encoder and REFEX[20] Functional automatic encoder
  • GAS[47] It also makes use of the GCN To generate node embeddedness

However , These deep learning methods are still sensitive to noise .

3. Method

3.1. Notions

G = { V , E } G = \{V, E\} G={ V,E}

N ( u ) = { u ∈ V ∣ ( u , v ) ∈ E } N(u)=\{u \in V| (u, v) \in E\} N(u)={ uV(u,v)E} : node u u u The neighborhood of

d ( u ) d(u) d(u): node u u u Degree

the one-hop egonet of node v i v_i vi: G i = { V ( g i ) , E ( g i ) } G_i = \{V(g_i), E(g_i)\} Gi={ V(gi),E(gi)}

  • V ( g i ) = { v i } ∪ { v j ∈ V ∣ ( v i , v j ) ∈ E } V(g_i)=\{v_i\} \cup \{v_j \in V| (v_i,v_j)\in E\} V(gi)={ vi}{ vjV(vi,vj)E}, And the summit v i v_i vi Set of connected nodes
  • E ( g i ) = { ( u , v ) ∈ E ∣ u , v ∈ V ( g i ) } E(g_i)=\{(u,v)\in E| u,v \in V(g_i)\} E(gi)={ (u,v)Eu,vV(gi)}: node v i v_i vi The edge set of vertices between its neighborhood set ( That is to say V ( g i ) V(g_i) V(gi) The subgraphs that make up the original graph )

X = R n × m \boldsymbol X= \mathbb{R}^{n \times m} X=Rn×m: Represents the extracted feature space ,m For the dimension of characteristics

Low dimensional vector space ( The goal is ) R k \mathbb{R}^k Rk, among k < < ∣ V ∣ k << |V| k<<V

 Insert picture description here

3.2. Model

RESD Frame diagram
 Insert picture description here
RESD There are three parts

  1. The process of character feature extraction
  2. A variational automatic encoder for combining features into node representations
  3. Node degree based regularization for enhancing the representation ability of learning embedding .

3.2.1. Feature extraction

If two nodes u u u and v v v Have similar structural identity , Then they tend to share the same roles .

Although the adjacency matrix of a graph contains all the topological information , But because adjacency matrix is usually sparse , The nodes may be far apart , Even disconnected , It is difficult to compare the structural consistency of two nodes .

therefore , Structural similarity cannot be learned directly from adjacency matrix , It is necessary to extract compact high-order structural features from adjacency relations .


Early work has shown that , many High order feature extraction is effective in role discovery , Based on Motif Characteristics of counting [12]、 Multi hop neighbor degrees [31] and REFEX[20]. All of these methods can serve as a framework RESD Feature extraction in

ad locum , We used refex[20]

Because of its efficiency and effectiveness , It aggregates simple local and egonet Features to generate high-order structural features

This method extracts one local feature and five egonet Characteristics of .

For the node v i v_i vi, The local characteristic is nodal degree d ( v i ) d(v_i) d(vi), be based on G i G_i Gi Of egonet The characteristics are defined as follows :

 Insert picture description here

3.2.2. Variational auto-encoder

Because there is a lot of noise in the real network , The extracted high-order structural features are often very sensitive to noise , Can not meet the needs of role discovery .

therefore , We use a variational auto encoder (VAE)[24] The structural features are encoded as nodes

VAE Reference material :
https://zhuanlan.zhihu.com/p/418352070
https://zhuanlan.zhihu.com/p/260627899#:~:text=%E5%8F%AF%E5%8F%98%E8%87%AA%E5%8A%A8%E7%BC%96%E7%A0%81%E5%99%A8%E8%83%BD,%E7%90%86%E6%83%B3%E8%BE%93%E5%87%BA%E7%9A%84%E9%97%B4%E9%9A%99%E3%80%82

VAE It can reduce noise , Capture the nonlinear relationship between input features . And ordinary automatic encoder (AE) comparison ,VAE Effectively The over fitting problem is avoided , Improved robustness of output embedding

We use a multilayer perceptron model instead of a graphical neural network (GNN) As an encoder

because GNN Always Smooth embedding The fact that the extracted feature is negated .


In our proposal RESD Model VAE part , Given the extracted node v j v_j vj Eigenvector of X j X_j Xj, Encoder The definition is as follows :

 Insert picture description here

  • W l W^l Wl and b l b^l bl It's No l Layer weights and offsets bias
  • L: Number of hidden layers in the encoder
  • tanh(·) Is a hyperbolic tangent activation function
  • h j 0 = X j h^0_j=X_j hj0=Xj: The input of the encoder

Let's assume that nodes v j v_j vj It means Z j Z_j Zj Obey Gauss distribution P j ( Z j ) \mathbb{P_j}(Z_j) Pj(Zj), among Z j Z_j Zj All elements of are independent , also P j ( Z j ) = N j ( μ j , d i a g ( σ j 2 ) ) \mathbb{P_j}(Z_j)={N_j}(μ_j, diag(σ^2_j)) Pj(Zj)=Nj(μj,diag(σj2))

The distribution can be learned by the following formula P j ( Z j ) \mathbb{P_j}(Z_j) Pj(Zj) Parameters in μ j μ_j μj and σ j σ_j σj

 Insert picture description here

  • among W μ 、 b μ 、 W σ W^μ、b^μ、W^σ WμbμWσ and b σ b^σ bσ The weights and offsets of the last layer of the encoder , The previous parameters are shared

We are right. σ σ σ Take the logarithm , To ensure that the standard deviation is always positive .

then , We can use the reparameterization technique [24] Get the node representation Zj, This allows us to calculate μ j μ_j μj and σ j σ_j σj Gradient of

 Insert picture description here
among , ⊙ ⊙ Is an element level product , ε ε ε Obey standard Gaussian distribution


about RESD Decoder part of , We also use the multilayer perceptron model :

 Insert picture description here

  • W ^ l \hat W^l W^l and b ^ l \hat b^l b^l It is the... In the decoder l l l Weight and deviation of hidden layer
  • h ^ j 0 = Z j \hat h^0_j = Z_j h^j0=Zj Input to the decoder
  • X ^ \hat X X^ Is the reconstructed characteristic matrix , among X ^ j = h ^ j L \hat X_j =\hat h^L_j X^j=h^jL

Last , We will VAE Loss is defined as :

 Insert picture description here

3.2.3. Regularization

Despite the above VAE The model has the ability to generate representations , But when estimating the distribution of input characteristics , it Some key structural information may be lost

therefore , In embedded information Adding constraints To preserve critical structural information is critical .


From the structure equivalence of traditional graph based role discovery methods 、 Rules such as equivalence can be inferred , Nodes that play the same role have similar degrees .

therefore , Degree is the most important structural feature of a node , And through many methods such as Demo-Net[45] and DRNE[23] It is verified by experiments

Be similar to DRNE[23],RESD A hierarchy based regularizer is defined as follows :

 Insert picture description here

  • among MLP(·) Is a modified linear unit active ReLU(·) Multi layer perceptron model .

We train our deep learning framework by jointly minimizing the loss of feature reconstruction and degree regularization constraints RESD, As shown below :

 Insert picture description here

  • among β Is the weight of the degree based regularizer

RESD The algorithm pseudo code is as follows :

 Insert picture description here
Personal understanding :

  • First, according to the adjacency matrix of the graph Use refex Extract feature matrix
  • Then traverse the nodes in the graph in turn
  • When traversing to each node :
    • First use the equation (3) Gain due to Encoder The resulting vector Z j Z_j Zj
    • Then use the equation (5) Calculate due to VAE The loss gained
    • Then use the equation (7) Calculate the total loss after adding the regular constraint
    • Take advantage of this total loss Back propagation Update parameters
  • Apply the trained parameters , Get the node representation matrix Z( Is the last vector space we need )

Z Z Z It is obtained by the encoder Z j Z_j Zj Composed of , That is to say The result of the encoder is the embedding in the low dimensional space we need

3.3. Computational complexity analysis

Given network G, set up

  • n Represents the number of nodes ,
  • e Indicates the number of sides ,
  • r Express ReFeX The number of characteristic aggregations of ,
  • m Represents the extracted feature matrix X Dimension of
  • k Indicates that the node is embedded Z Dimension of

ReFeX The whole process of O ( r m ( e + n m ) ) O(rm(e + nm)) O(rm(e+nm))

Because the calculation amount of each feature aggregation is greater than that of the previous aggregation .

Each node uses VAE Express , The calculation complexity is O ( m 2 k ) O(m^2k) O(m2k), Regularization takes time O ( k 2 ) O(k^2) O(k2).

therefore k < m k < m k<m The embedded generation over the entire network is calculated as O ( n m 2 k ) O(nm^2k) O(nm2k).

because r It's always small , And r < k < m ≪ n r < k < m≪n r<k<mn, therefore RESD The whole calculation of O ( r e m + n m 2 k ) O(rem+nm^2k) O(rem+nm2k).

therefore , our RESD Framework for spending O ( n m 2 k ) O(nm2k) O(nm2k) The large-scale network of time has great potential .

4. Experiments

4.1. Baselines

  • node2vec [8]
  • role2vec [12]
  • struc2vec [31]
  • RolX [21]
  • DRNE [23]
  • struc2gauss [48]

4.2. Datasets

Data sets ( Undirected power map )

 Insert picture description here

4.3. Experiment settings

In the part of feature extraction , We use ReFeX, Set up

  • The characteristic aggregation times are 3 Time
  • The number of boxes is set to 5 Time

about VAE The encoder and decoder of , We will

  • The number of hidden layers is set to 2

For degree based regularizers , We used a with ReLU Activate the 2 layer MLP Model

We use a learning rate of 0.001 Of Adam SGD Optimizer [52], Set the weight to 0.001 Of L2 Regularization to avoid over fitting .

For most 50 individual epoch, The batch size is set to 32.

In our later experiments , If not specified , The embedded dimension is set to 128,β Set to 0.6.

4.4. Role classification

We randomly input different proportions of data to train an embedded linear logistic regression classifier , The rest is for testing .

We run all baseline methods and RESD 20 Time , And calculate their average on each network F1 score .

To verify the validity of degree preserving constraints , We also ran a Without a degree based regularizer RESD variant , be called RES

The results are as follows :

 Insert picture description here

4.5. Bot detection

In this section , Our goal is to explore our model Can some rare roles in the network be detected .

We chose two Wikipedia chat networks where robots make up only a small part of the total users .

In these networks Detecting robots is very complicated .

  1. First generate a representation of each user
  2. For every robot , We Calculate the Euclidean distance between it and other users
  3. Last , We choose the one closest to each robot K Users , Calculate the average accuracy of the robot

It means something like this :
First of all, for each user( Include bot robot )
And then use the low dimensional representation of bot Find the one closest to it k individual user
Look at this. k individual user in bot What is the proportion of
If this ratio is very high It indicates that the degree of embedding in rare roles is very high ( The better the model )

The results are as follows

 Insert picture description here

  • On robot detection tasks K Accuracy of selected users
  • OT Indicates that the method cannot be used in 48 Get results within hours .

4.6. Parameter sensitivity and computational complexity

The weight of regularization β And the parameter sensitivity of embedding dimension

 Insert picture description here
Baseline approach and RESD Running time of

 Insert picture description here

  • Where the full value represents 48 No results within hours
  • The output embedding dimension of all these methods is set to 128

4.7. Visualization

Data set used : Brazil

utilize t-SNE Reduce the embedded dimension to 2 For visualization

  • Each node is represented as a point
  • The color indicates its role

Ideally , Points of the same color should be very close to each other , And the dots of different colors should be far away .

Visualization results :

 Insert picture description here

5. Conclusion

Insufficient :

  • The structural features in the model are too simple
  • The encoder ignores the network structure
  • This model can only be applied to undirected graphs without authority

In the future work , We plan to use graph neural networks (GNN) To encode features , At the same time, the complex relationships between nodes are extracted .

We also hope to design a novel deep learning framework , Produce more expressive structural features than handmade works .

As far as we know , Most role-based network embedding methods only focus on static networks , The additional attributes on nodes or edges are ignored . In order to better understand the impact of roles , We will further explore dynamic networks , Introduce attributes to learn more efficient representations , This may be an attractive job in the future .

After reading the summary

With the increase of reading papers The speed of reading papers is also faster

Not as strange as before

I still have a little understanding of some of the things mentioned in the article


The main idea of this article is :

  • Through the adjacency matrix Use Refex The extracted features
  • Re pass VAE Reduce the impact of noise Improve the robustness of the model
  • And because of the use of VAE Some structural information will be lost
  • Then the regular constraint is introduced : degree
  • A combination of the two It is convenient to better extract the high-order features of nodes

Conclusion

The article is only used as a personal study note , Record from 0 To 1 A process of

Hope to help you a little bit , If you have any mistakes, you are welcome to correct them

 Insert picture description here

原网站

版权声明
本文为[Haihong Pro]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/176/202206250934084370.html