当前位置:网站首页>On node embedding
On node embedding
2022-07-24 13:11:00 【aelum】
One 、DeepWalk
The origin of the paper :https://arxiv.org/pdf/1403.6652.pdf
Algorithm :

DeepWalk The idea is : For each node in the graph , We should all start from it γ \gamma γ A random walk sequence with maximum length , In the end ∣ V ∣ ⋅ γ |V|\cdot \gamma ∣V∣⋅γ A sequence of . Think of each sequence as a sentence , Regard the nodes as words , So we can apply to these sequences Word2Vec(Skip-Gram + Hierachical Softmax) Get the embedded representation of the node .
In the diagram above , w w w Is the window size , That is to say Skip-Gram In the model , Predict with the central word 2 w 2w 2w A context word ; d d d Is the dimension of the node vector ; γ \gamma γ For each node in the graph , How many random walk sequences should we generate from it ; t t t Is the maximum length of the random walk sequence .
1.1 Simple implementation
Take an undirected graph as an example , Suppose that all nodes are identified by integer numbers , The corresponding implementation is as follows
import random
from gensim.models import Word2Vec
def random_walk(graph, root, walk_length):
# graph by networkx Undirected graph type in ,root Is an integer
# This function generates only one random walk sequence at a time
seq = [root]
for _ in range(walk_length - 1):
seq.append(random.choice(list(graph.neighbors(seq[-1]))))
return seq
def deep_walk(graph, windows_size, embed_size, gamma, walk_length):
seqs = [random_walk(graph, node, walk_length) for node in graph.nodes for _ in range(gamma)]
model = Word2Vec(seqs, vector_size=embed_size, window=windows_size, sg=1, hs=1)
return model.wv
Two 、Node2Vec
The origin of the paper :https://arxiv.org/pdf/1607.00653.pdf
DeepWalk In fact, it means Word2Vec Used on the map ,Node2Vec stay DeepWalk On the basis of that, we improved , Walk completely random ( Suppose a node has N N N Neighbor nodes , Then the probability of next step to each neighbor node is 1 / N 1/N 1/N) Changed to biased random walk ( The probability of next step to each neighbor node is different ).
Here's the picture ,Node2Vec Second order random walk is used , That is, where to go next depends not only on the current node , It also depends on the previous node :

Parameters p p p Determines the probability of going back , Parameters q q q Decided to prefer BFS still DFS.
In particular , Set the node of the previous time step as t t t, The node of the current time step is v v v, The node of the next time step is x x x, be
P ( x ∣ v ) = { π v x / Z , ( v , x ) ∈ E 0 , o t h e r s P(x|v)=\begin{cases} \pi_{vx}/Z,&(v,x)\in E \\ 0,&others\\ \end{cases} P(x∣v)={ πvx/Z,0,(v,x)∈Eothers
among π v x \pi_{vx} πvx yes v → x v\to x v→x Preprocessing transition probability ( It's not probability ), Z Z Z It's the normalization factor ( Is used to π v x \pi_{vx} πvx Become probability ), E E E It's an edge set . set up w v x w_{vx} wvx Represents the edge ( v , x ) (v,x) (v,x) The weight of ( There are w v x = 1 w_{vx}=1 wvx=1), be π v x \pi_{vx} πvx Defined as π v x = w v x ⋅ α p q ( t , x ) \pi_{vx}=w_{vx}\cdot\alpha_{pq}(t,x) πvx=wvx⋅αpq(t,x), among
α p q ( t , x ) = { 1 / p , d t x = 0 1 , d t x = 1 1 / q , d t x = 2 \alpha_{pq}(t,x)= \begin{cases} 1/p,&d_{tx}=0\\ 1,&d_{tx}=1 \\ 1/q,&d_{tx}=2 \\ \end{cases} αpq(t,x)=⎩⎨⎧1/p,1,1/q,dtx=0dtx=1dtx=2
It can be seen that , When π v x = 1 \pi_{vx}=1 πvx=1 when ,Node2Vec It degenerates into DeepWalk.
Algorithm :

You can see from the pseudo code that , This algorithm is similar to DeepWalk The main difference is the sampling strategy of nodes . because DeepWalk It's a completely random walk , So each time you just need to randomly select one of the neighbor nodes OK 了 , and Node2Vec Is biased random walk , So you need to use Alias Sampling technology , Its time complexity is O ( 1 ) O(1) O(1)( Space for time ).
of Alias Sampling May refer to :https://www.keithschwarz.com/darts-dice-coins/.
2.1 Simple implementation
At present, there are ready-made third-party libraries ( be based on NetworkX and Gensim), Project address :https://github.com/eliorc/node2vec.
When in use, it can be imported directly :
from node2vec import Node2Vec
An example :
G = nx.les_miserables_graph()
model = Node2Vec(G, dimensions=32, walk_length=3, num_walks=600, p=2, q=0.5, workers=4).fit(window=3,
min_count=1,
batch_words=4)
边栏推荐
- Introduction to encryption technology
- Take chef and ansible as examples to get started with server configuration
- How to mount NFS shares using autofs
- Redis (13) -- on master-slave replication of redis
- 如何画 贝赛尔曲线 以及 样条曲线?
- Modification of EAS login interface
- Prototype inheritance
- SSM在线租房售房平台多城市版本
- Teach you how to use power Bi to realize four kinds of visual charts
- [C language] detailed knowledge of document operation
猜你喜欢

Redis (13) -- on master-slave replication of redis
![[C language] detailed knowledge of document operation](/img/2a/f2976d80212d9a38ea0457916a1db9.png)
[C language] detailed knowledge of document operation

26. Reverse linked list II
EfficientFormer:轻量化ViT Backbone

【C语言】详细的文件操作相关知识

vscode配置用户代码片段(包括删除方法)

English语法_不定代词 - 概述

leetcode第 302 场周赛复盘
This is how the permission system is designed, yyds

Nearly 65billion pieces of personal information were illegally handled in seven years, and the investigation of didi network security review case was announced
随机推荐
EAS approval process related table
SSM online rental and sales platform multi city version
【论文阅读】TEMPORAL ENSEMBLING FOR SEMI-SUPERVISED LEARNING
Go deadlock problem
Generator and async solve asynchronous programming
class
English语法_不定代词 - 概述
关于如何提升TTL(UART)通信抗干扰——心得
深圳地铁12号线第二批工程验收通过 预计7月28日试运行
How can flinksql run in perjob mode on yarn? I submit tasks on SqlClient
35.8. string conversion integer (ATOI)
2022.07.15 暑假集训 个人排位赛(十)
Activity start (launchactivity/startactivity)_ (1)_ WMS of flow chart
Sort method -- bubble sort (use an array to sort a string of numbers from large to small or from small to large)
How to draw Bezier curve and spline curve?
I 用c I 实现 大顶堆
Teach you how to use power Bi to realize four kinds of visual charts
Learn the calculation method of quantile value in n minutes
SSM online campus album management platform
ESP32ADC