当前位置:网站首页>On node embedding
On node embedding
2022-07-24 13:11:00 【aelum】
One 、DeepWalk
The origin of the paper :https://arxiv.org/pdf/1403.6652.pdf
Algorithm :

DeepWalk The idea is : For each node in the graph , We should all start from it γ \gamma γ A random walk sequence with maximum length , In the end ∣ V ∣ ⋅ γ |V|\cdot \gamma ∣V∣⋅γ A sequence of . Think of each sequence as a sentence , Regard the nodes as words , So we can apply to these sequences Word2Vec(Skip-Gram + Hierachical Softmax) Get the embedded representation of the node .
In the diagram above , w w w Is the window size , That is to say Skip-Gram In the model , Predict with the central word 2 w 2w 2w A context word ; d d d Is the dimension of the node vector ; γ \gamma γ For each node in the graph , How many random walk sequences should we generate from it ; t t t Is the maximum length of the random walk sequence .
1.1 Simple implementation
Take an undirected graph as an example , Suppose that all nodes are identified by integer numbers , The corresponding implementation is as follows
import random
from gensim.models import Word2Vec
def random_walk(graph, root, walk_length):
# graph by networkx Undirected graph type in ,root Is an integer
# This function generates only one random walk sequence at a time
seq = [root]
for _ in range(walk_length - 1):
seq.append(random.choice(list(graph.neighbors(seq[-1]))))
return seq
def deep_walk(graph, windows_size, embed_size, gamma, walk_length):
seqs = [random_walk(graph, node, walk_length) for node in graph.nodes for _ in range(gamma)]
model = Word2Vec(seqs, vector_size=embed_size, window=windows_size, sg=1, hs=1)
return model.wv
Two 、Node2Vec
The origin of the paper :https://arxiv.org/pdf/1607.00653.pdf
DeepWalk In fact, it means Word2Vec Used on the map ,Node2Vec stay DeepWalk On the basis of that, we improved , Walk completely random ( Suppose a node has N N N Neighbor nodes , Then the probability of next step to each neighbor node is 1 / N 1/N 1/N) Changed to biased random walk ( The probability of next step to each neighbor node is different ).
Here's the picture ,Node2Vec Second order random walk is used , That is, where to go next depends not only on the current node , It also depends on the previous node :

Parameters p p p Determines the probability of going back , Parameters q q q Decided to prefer BFS still DFS.
In particular , Set the node of the previous time step as t t t, The node of the current time step is v v v, The node of the next time step is x x x, be
P ( x ∣ v ) = { π v x / Z , ( v , x ) ∈ E 0 , o t h e r s P(x|v)=\begin{cases} \pi_{vx}/Z,&(v,x)\in E \\ 0,&others\\ \end{cases} P(x∣v)={ πvx/Z,0,(v,x)∈Eothers
among π v x \pi_{vx} πvx yes v → x v\to x v→x Preprocessing transition probability ( It's not probability ), Z Z Z It's the normalization factor ( Is used to π v x \pi_{vx} πvx Become probability ), E E E It's an edge set . set up w v x w_{vx} wvx Represents the edge ( v , x ) (v,x) (v,x) The weight of ( There are w v x = 1 w_{vx}=1 wvx=1), be π v x \pi_{vx} πvx Defined as π v x = w v x ⋅ α p q ( t , x ) \pi_{vx}=w_{vx}\cdot\alpha_{pq}(t,x) πvx=wvx⋅αpq(t,x), among
α p q ( t , x ) = { 1 / p , d t x = 0 1 , d t x = 1 1 / q , d t x = 2 \alpha_{pq}(t,x)= \begin{cases} 1/p,&d_{tx}=0\\ 1,&d_{tx}=1 \\ 1/q,&d_{tx}=2 \\ \end{cases} αpq(t,x)=⎩⎨⎧1/p,1,1/q,dtx=0dtx=1dtx=2
It can be seen that , When π v x = 1 \pi_{vx}=1 πvx=1 when ,Node2Vec It degenerates into DeepWalk.
Algorithm :

You can see from the pseudo code that , This algorithm is similar to DeepWalk The main difference is the sampling strategy of nodes . because DeepWalk It's a completely random walk , So each time you just need to randomly select one of the neighbor nodes OK 了 , and Node2Vec Is biased random walk , So you need to use Alias Sampling technology , Its time complexity is O ( 1 ) O(1) O(1)( Space for time ).
of Alias Sampling May refer to :https://www.keithschwarz.com/darts-dice-coins/.
2.1 Simple implementation
At present, there are ready-made third-party libraries ( be based on NetworkX and Gensim), Project address :https://github.com/eliorc/node2vec.
When in use, it can be imported directly :
from node2vec import Node2Vec
An example :
G = nx.les_miserables_graph()
model = Node2Vec(G, dimensions=32, walk_length=3, num_walks=600, p=2, q=0.5, workers=4).fit(window=3,
min_count=1,
batch_words=4)
边栏推荐
- Vscode configuration user code snippet (including deletion method)
- Wang Ping, co-founder of Denglin Technology: Innovation + self research "dual core" drive, gpu+ enabling AI takes root | quantum bit · viewpoint sharing review
- Localstorage
- 权限系统就该这么设计,yyds
- 基于matlab的语音处理
- This is how the permission system is designed, yyds
- C代码规范
- 登临科技联合创始人王平:创新+自研“双核”驱动,GPU+赋能AI落地生根|量子位·视点分享回顾...
- Deep and shallow copies of objects, extends
- setAttribute、getAttribute、removeAttribute
猜你喜欢

Leetcode's 302 weekly rematch

LeadTools 22 kit LeadTools super set

Finclip's "applet export app" function has been updated again by the company

Redis (13) -- on master-slave replication of redis

I realize large top stack with C I

Usage of swipemenurecyclerview

Search engine based on boost library
This is how the permission system is designed, yyds

Copy of array and array address value

How to draw Bezier curve and spline curve?
随机推荐
基于matlab的语音处理
Inversion of array (output in reverse order) (define an array and assign a value to output the array in reverse order)
Get the current month and year and the previous 11 months
SSM在线考试系统含文档
FinClip 「小程序导出 App 」功能又双叒叕更新了
leetcode第 302 场周赛复盘
Redis (13) -- on master-slave replication of redis
Use of PageHelper
Constraintlayout learn from 0 to 0.n
Set up CI server with Jenkins
About packaging objects
EfficientFormer:轻量化ViT Backbone
How to draw Bezier curve and spline curve?
28. Rainwater connection
LEADTOOLS 22 套件 LEADTOOLS 超级套
【论文阅读】Mean teachers are better role models
Deep and shallow copies of objects, extends
[datasheet] PHY ksz9031 gigabit network chip interpretation
SSM在线租房售房平台多城市版本
Windivert:可抓包,修改包