[Paper Reading] GETNext: Trajectory Flow Map Enhanced Transformer for Next POI Recommendation
2022-07-23 18:56:00 【EmoryHuang】
Preface
Next POI recommendation predicts a user's near-future movements based on his or her current status and historical information, bringing great value to both users and service providers.
This post covers a SIGIR 2022 paper: GETNext: Trajectory Flow Map Enhanced Transformer for Next POI Recommendation.
Problem description
Given a set of $M$ users $U=\{u_1, u_2, \cdots, u_M\}$ and a set of $N$ POIs $P=\{p_1, p_2, \cdots, p_N\}$, where each $p=\langle latitude, longitude, category, frequency \rangle$ records the latitude, longitude, category, and visit frequency of a POI.
(Check-in) A check-in can be expressed as $q=\langle u,p,t \rangle \in U\times P\times T$, meaning that user $u$ visits POI $p$ at time $t$.
For a user $u$, his or her trajectory is $S_u=(q_1,q_2,\cdots,q_m)$. The task is to predict the user's next visited location, i.e., next POI (Point-of-Interest) recommendation.
Overview
This paper proposes a new model, the Graph Enhanced Transformer model (GETNext). The overall framework is still a Transformer. In addition, a global trajectory flow map is built and a GCN is used to compute POI embeddings, which are then fused with the user embedding, category embedding, and time embedding (Time2Vec) to form the final input.
Main contributions:
- It proposes a global trajectory flow map to represent POI visit-sequence information and uses a graph convolutional network (GCN) to learn POI embeddings.
- It proposes a time-aware category embedding to better represent temporal information.
- It proposes a new Transformer-based framework that integrates global transition patterns, user general tastes, the user's short-term trajectory, and spatio-temporal context for POI recommendation.
GETNext
The model architecture is shown in the figure below:
Trajectory Flow Map
The output of the trajectory flow map is used in two parts:
- a graph convolutional network (GCN) is used to compute POI embeddings;
- an attention module is used to generate a transition attention map (transition probability matrix).
The paper also visualizes the Trajectory Flow Map, where several dense areas can be clearly observed.
POI Embedding
The paper points out that different users may share similar trajectory segments, and the same user may repeat a trajectory many times. That is, collective information from other users can be used to model consecutive trajectories. For example, the two users in the figure below visited the same restaurant and cinema, in the same order.
These trajectory flows provide key information about users' general movement patterns, helping to address the problems of short trajectories and inactive users.
(Trajectory Flow Map) Given a set of historical trajectories $\mathcal{S}=\{S_u^i\}_{i\in \mathbb{N},u\in U}$, the Trajectory Flow Map $\mathcal{G} = (V,E,l,w)$ is a directed weighted graph, where:
- the node set is $V=P$, where $P$ is the POI set;
- each $p=\langle latitude,longitude,category,frequency \rangle\in P$ records the latitude, longitude, category, and visit frequency of a POI;
- if $p_1, p_2$ are visited consecutively, an edge $(p_1,p_2)$ is added;
- the weight of edge $(p_1,p_2)$ is the frequency of that transition.
To summarize the Trajectory Flow Map: it is a directed weighted graph whose nodes are POIs, connected according to users' visit trajectories; the weight of an edge is the number of times that trajectory segment appears, and each node (POI) records its latitude, longitude, category, and visit frequency.
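Purely as a reading aid, here is a minimal sketch of how such a trajectory flow map could be built from check-in sequences using networkx; the data layout (`trajectories` as lists of POI ids, `poi_info` as a dict) is my assumption, not the authors' code.

```python
# Minimal sketch (not the authors' code): building a trajectory flow map
# from check-in sequences; field names and data layout are assumptions.
import networkx as nx

def build_trajectory_flow_map(trajectories, poi_info):
    """trajectories: list of POI-id sequences, one per user trajectory.
    poi_info: dict mapping poi_id -> (latitude, longitude, category)."""
    G = nx.DiGraph()
    visit_count = {}                      # overall check-in frequency per POI
    for traj in trajectories:
        for poi in traj:
            visit_count[poi] = visit_count.get(poi, 0) + 1
        # consecutive visits (p1, p2) add (or reinforce) a directed edge
        for p1, p2 in zip(traj, traj[1:]):
            if G.has_edge(p1, p2):
                G[p1][p2]["weight"] += 1
            else:
                G.add_edge(p1, p2, weight=1)
    # attach the node attributes mentioned above: lat, lon, category, frequency
    for poi, (lat, lon, cat) in poi_info.items():
        G.add_node(poi, latitude=lat, longitude=lon,
                   category=cat, frequency=visit_count.get(poi, 0))
    return G
```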
Next, a graph convolutional network (GCN) is used to compute the POI embeddings. The details of GCN are not covered here; if you are unfamiliar with GCN, you can refer to another article of mine: A simple introduction to graph neural networks (GNN).
Compute the Laplacian matrix and the hidden-layer update equation:
$$\widetilde{\mathbf{L}}=(\mathbf{D}+\mathbf{I}_N)^{-1}(\mathbf{A}+\mathbf{I}_N)$$
$$\mathbf{H}^{(l)}=\sigma \left( \widetilde{\mathbf{L}}\mathbf{H}^{(l-1)}\mathbf{W}^{(l)}+b^{(l)} \right)$$
where $\mathbf{D}$ and $\mathbf{A}$ denote the degree matrix and the adjacency matrix, respectively.
Personally, I think there is an issue here: in principle this should be the symmetric normalization, i.e.:
$$\widetilde{\mathbf{L}}=(\mathbf{D}+\mathbf{I}_N)^{-1/2}(\mathbf{A}+\mathbf{I}_N)(\mathbf{D}+\mathbf{I}_N)^{-1/2}$$
In each iteration, a GCN layer updates a node's embedding by aggregating the embeddings of its neighbors and of the node itself.
After $l^{*}$ layers, the output of the module can be expressed as:
$$\mathbf{e}_p=\widetilde{\mathbf{L}}\mathbf{H}^{(l^{*})}\mathbf{W}^{(l^{*}+1)}+b^{(l^{*}+1)} \in \mathbb{R}^{N\times \Omega}$$
After the GCN we obtain vector representations of the POIs. Notably, even when the current trajectory is short, the POI embeddings still provide rich information for the prediction model.
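A minimal sketch of one GCN layer with the random-walk normalization $\widetilde{\mathbf{L}}=(\mathbf{D}+\mathbf{I}_N)^{-1}(\mathbf{A}+\mathbf{I}_N)$ used in the equations above; the dense adjacency, feature layout, and ReLU nonlinearity are my assumptions.

```python
# Minimal sketch (assumptions: dense adjacency matrix A, ReLU as sigma,
# X is the N x 4 node feature matrix: lat, lon, category id, frequency).
import torch
import torch.nn as nn

def normalized_adjacency(A):
    # \tilde{L} = (D + I)^{-1} (A + I)
    A_hat = A + torch.eye(A.size(0))
    deg = A_hat.sum(dim=1)
    return torch.diag(1.0 / deg) @ A_hat

class GCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, L_tilde, H):
        # H^{(l)} = sigma(\tilde{L} H^{(l-1)} W^{(l)} + b^{(l)})
        return torch.relu(self.linear(L_tilde @ H))

# usage sketch: L_tilde = normalized_adjacency(A); e_p = GCNLayer(4, 128)(L_tilde, X)
```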
Transition Attention Map
The POI embeddings learned from the graph $\mathcal{G}$ only capture general behavior patterns. To further amplify the influence of collective signals, the paper introduces a transition attention map $\mathbf{\Phi}$ that explicitly models the probability of transitioning from one POI to another. Specifically:
$$\mathbf{\Phi}_1=(\mathbf{X} \times \mathbf{W}_1) \times \mathbf{a}_1 \in \mathbb{R}^{N\times 1}$$
$$\mathbf{\Phi}_2=(\mathbf{X} \times \mathbf{W}_2) \times \mathbf{a}_2 \in \mathbb{R}^{N\times 1}$$
$$\mathbf{\Phi}=(\mathbf{\Phi}_1 \times \mathbf{1}^T + \mathbf{1} \times \mathbf{\Phi}_2^T) \odot (\widetilde{\mathbf{L}}+J_N) \in \mathbb{R}^{N\times N}$$
where $\mathbf{X}$ contains the node attributes of the graph (latitude, longitude, category, and visit frequency), and $\mathbf{W}_1,\mathbf{W}_2,\mathbf{a}_1,\mathbf{a}_2$ are trainable parameters.
I do not fully understand this formula.
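As a reading aid, here is a minimal sketch of one way the formula can be read: two attribute-based score vectors are broadcast into an $N\times N$ matrix and weighted element-wise by $\widetilde{\mathbf{L}}+J_N$. The shapes ($\mathbf{X}$ as $N\times 4$, $\mathbf{W}$ as $4\times d$, $\mathbf{a}$ as $d\times 1$) are my assumptions, not the paper's code.

```python
# Minimal sketch of the transition attention map (assumed shapes:
# X is N x 4 node attributes, W1/W2 are 4 x d, a1/a2 are d x 1).
import torch

def transition_attention_map(X, L_tilde, W1, W2, a1, a2):
    phi1 = (X @ W1) @ a1                 # N x 1, "outgoing" score per POI
    phi2 = (X @ W2) @ a2                 # N x 1, "incoming" score per POI
    ones = torch.ones_like(phi1)
    # broadcast sum: entry (i, j) = phi1[i] + phi2[j]
    Phi = phi1 @ ones.T + ones @ phi2.T  # N x N
    # element-wise product with \tilde{L} + J_N (J_N: all-ones matrix),
    # which gives extra weight to transitions observed in the flow map
    return Phi * (L_tilde + torch.ones_like(L_tilde))
```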
Contextual Embedding Module
Besides the POI embedding, the paper also introduces spatio-temporal features and user preferences.
POI-User Embeddings Fusion
In the paper, the user embedding and the POI embedding are concatenated to represent a check-in activity.
$$\mathbf{e}_u=f_{embed}(u)\in \mathbb{R}^{\Omega}$$
$$\mathbf{e}_{p,u}=\sigma(\mathbf{w}_{p,u}[\mathbf{e}_p;\mathbf{e}_u]+b_{p,u})\in \mathbb{R}^{\Omega \times 2}$$
where $\mathbf{e}_u$ and $\mathbf{e}_p$ are the user embedding and the POI embedding, respectively.
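A minimal sketch of this fusion step (concatenation followed by one fully connected layer with a nonlinearity); the sigmoid activation and dimensions are my assumptions. The same module shape also fits the time-category fusion in the next subsection.

```python
# Minimal sketch: fuse two embeddings by concatenation + one linear layer,
# following e_{p,u} = sigma(w [e_p ; e_u] + b).
import torch
import torch.nn as nn

class EmbeddingFusion(nn.Module):
    def __init__(self, dim_a, dim_b, out_dim):
        super().__init__()
        self.fc = nn.Linear(dim_a + dim_b, out_dim)

    def forward(self, e_a, e_b):
        return torch.sigmoid(self.fc(torch.cat([e_a, e_b], dim=-1)))

# usage sketch: the fused POI-user vector e_{p,u} and the fused
# time-category vector e_{c,t} are later concatenated to form e_q.
```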
Time-Category Embeddings Fusion
For the time embedding, the paper uses the Time2Vec method; if you are unfamiliar with Time2Vec, you can refer to another article of mine. In particular, a day is divided into 48 time slots of 30 minutes each, and the embedding has length $k+1$. Specifically:
$$\mathbf{e}_t[i]=\begin{cases} \omega_i t+\varphi_i, &\text{if } i=0 \\ \sin(\omega_i t+\varphi_i), &\text{if } 1\leq i\leq k \end{cases}$$
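A minimal sketch of the Time2Vec embedding above: one linear component plus $k$ periodic components; the parameter initialization is my assumption.

```python
# Minimal sketch of Time2Vec: e_t[0] is linear in t, e_t[1..k] are periodic.
import torch
import torch.nn as nn

class Time2Vec(nn.Module):
    def __init__(self, k):
        super().__init__()
        self.omega = nn.Parameter(torch.randn(k + 1))
        self.phi = nn.Parameter(torch.randn(k + 1))

    def forward(self, t):
        # t: scalar tensor, e.g. the normalized time of day
        v = self.omega * t + self.phi                 # shape (k + 1,)
        return torch.cat([v[:1], torch.sin(v[1:])])   # linear head + sine tail
```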
On the other hand, due to data sparsity and noise, the paper concatenates the category embedding with the time embedding to explore the temporal patterns of POI categories rather than of individual POIs.
$$\mathbf{e}_c=f_{embed}(c)\in \mathbb{R}^{\Psi}$$
$$\mathbf{e}_{c,t}=\sigma(\mathbf{w}_{c,t}[\mathbf{e}_t;\mathbf{e}_c]+b_{c,t})\in \mathbb{R}^{\Psi \times 2}$$
After this series of steps, each check-in $q=\langle p,u,t \rangle$ is turned into a vector $\mathbf{e}_q=[\mathbf{e}_{p,u};\mathbf{e}_{c,t}]$, which serves as the input to the Transformer.
Transformer Encoder and MLP Decoders
Transformer Encoder
The backbone network is still a Transformer, so I will not introduce it in detail. For an input sequence $S_u=(q_u^1,q_u^2,\cdots,q_u^k)$, we need to predict the next activity $q_u^{k+1}$. After the check-in embedding described above, stacking the embeddings of the $q_u^i$ gives $\mathcal{X}^{[0]}\in \mathbb{R}^{k \times d}$ as the Transformer input, where $d$ is the embedding dimension.
Then come the familiar attention operations:
$$S=\mathcal{X}^{[l]}\mathbf{W}_q(\mathcal{X}^{[l]}\mathbf{W}_k)^T\in \mathbb{R}^{k\times k}$$
$$S_{i,j}'=\frac{\exp(S_{i,j})}{\sum_{j=1}^d\exp(S_{i,j})}$$
$$\text{head}_1=S'\mathcal{X}^{[l]}\mathbf{W}_v\in \mathbb{R}^{k\times d/h}$$
$$\text{Multihead}(\mathcal{X}^{[l]})=[\text{head}_1;\cdots;\text{head}_h]\times \mathbf{W}_o\in\mathbb{R}^{k\times d}$$
Then come the residual connection, LayerNorm, and FFN:
$$\mathcal{X}_{\text{attn}}^{[l]}=\text{LayerNorm}\left(\mathcal{X}^{[l]}+\text{Multihead}(\mathcal{X}^{[l]}) \right)$$
$$\mathcal{X}_{FC}^{[l]}=\text{ReLU}(\mathbf{W}_1\mathcal{X}_{\text{attn}}^{[l]}+b_1)\mathbf{W}_2+b_2\in\mathbb{R}^{k\times d}$$
$$\mathcal{X}^{[l+1]}=\text{LayerNorm}(\mathcal{X}_{\text{attn}}^{[l]}+\mathcal{X}_{FC}^{[l]})\in\mathbb{R}^{k\times d}$$
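Since this is a standard Transformer encoder, a minimal sketch using PyTorch's built-in layer is enough to convey the structure; the hyperparameters below are placeholders, not the paper's settings.

```python
# Minimal sketch: a stack of standard Transformer encoder layers over the
# sequence of check-in embeddings e_q (d_model, heads, depth are placeholders).
import torch
import torch.nn as nn

d_model, n_heads, n_layers = 128, 2, 2
encoder_layer = nn.TransformerEncoderLayer(
    d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model,
    batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)

x = torch.randn(1, 20, d_model)   # (batch, trajectory length k, d)
out = encoder(x)                  # (1, 20, d_model), fed to the MLP heads below
```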
MLP Decoders
The Transformer encoder produces the output $\mathcal{X}^{[l^*]}$, which is then mapped by multi-layer perceptrons (MLPs) into three heads:
$$\hat{\mathbf{Y}}_{\text{poi}}=\mathcal{X}^{[l^*]}\mathbf{W}_{\text{poi}}+b_{\text{poi}}$$
$$\hat{\mathbf{Y}}_{\text{time}}=\mathcal{X}^{[l^*]}\mathbf{W}_{\text{time}}+b_{\text{time}}$$
$$\hat{\mathbf{Y}}_{\text{cat}}=\mathcal{X}^{[l^*]}\mathbf{W}_{\text{cat}}+b_{\text{cat}}$$
where $\mathbf{W}_{\text{poi}}\in\mathbb{R}^{d\times N}$, $\mathbf{W}_{\text{time}}\in\mathbb{R}^{d\times 1}$, and $\mathbf{W}_{\text{cat}}\in\mathbb{R}^{d\times \Gamma}$ are learnable weights.
In particular, for $\hat{\mathbf{Y}}_{\text{poi}}$ the transition attention map obtained above is combined with the prediction:
$$\hat{\mathbf{y}}_{\text{poi}}=\hat{\mathbf{Y}}_{\text{poi}}^{(k\cdot)}+\Phi^{p_k\cdot}\in\mathbb{R}^{1\times N}$$
The paper argues that the time interval between two check-ins may fluctuate widely and the corresponding POI categories may also differ, so a user should receive different suggestions for the next 1 hour versus the next 5 hours. Therefore, besides predicting the next POI, the model also predicts the time and category of the next check-in, which is why three MLP heads are used.
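A minimal sketch of the three decoder heads, including adding the transition attention row of the last visited POI to the last step's POI logits; all dimensions and the single-trajectory layout are my assumptions.

```python
# Minimal sketch: three linear heads on top of the encoder output; the
# next-POI scores are the last step's logits plus the transition row of p_k.
import torch
import torch.nn as nn

class MLPDecoders(nn.Module):
    def __init__(self, d, num_pois, num_cats):
        super().__init__()
        self.poi_head = nn.Linear(d, num_pois)
        self.time_head = nn.Linear(d, 1)
        self.cat_head = nn.Linear(d, num_cats)

    def forward(self, enc_out, Phi, last_poi):
        # enc_out: (k, d) encoder output for one trajectory
        y_poi = self.poi_head(enc_out)       # (k, N)
        y_time = self.time_head(enc_out)     # (k, 1)
        y_cat = self.cat_head(enc_out)       # (k, Gamma)
        # next-POI scores: last step's logits + transition attention row of p_k
        y_next = y_poi[-1] + Phi[last_poi]   # (N,)
        return y_next, y_poi, y_time, y_cat
```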
Loss
Since the model produces three prediction results, the final loss is a weighted sum:
$$\mathcal{L}_{\text{final}}=\mathcal{L}_{\text{poi}}+10\times \mathcal{L}_{\text{time}}+\mathcal{L}_{\text{cat}}$$
where $\mathcal{L}_{\text{poi}}$ and $\mathcal{L}_{\text{cat}}$ are computed with the cross-entropy loss, and $\mathcal{L}_{\text{time}}$ with the mean squared error (MSE) loss. Since the normalized time lies in $[0,1]$, the time loss is multiplied by 10 in the final sum.
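A minimal sketch of the combined loss (cross-entropy for POI and category, MSE for time with a ×10 weight); the target and logit layouts are my assumptions.

```python
# Minimal sketch of the weighted loss; target/logit layouts are assumptions.
import torch.nn as nn

ce = nn.CrossEntropyLoss()
mse = nn.MSELoss()

def final_loss(poi_logits, poi_target, time_pred, time_target,
               cat_logits, cat_target):
    l_poi = ce(poi_logits, poi_target)
    l_time = mse(time_pred, time_target)   # time normalized to [0, 1]
    l_cat = ce(cat_logits, cat_target)
    return l_poi + 10.0 * l_time + l_cat
```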
Experiments
Datasets:
- Foursquare: NYC, TKY
- Gowalla: CA
Evaluation metrics:
$$\text{Acc}@k=\frac{1}{m}\sum_{i=1}^m\mathbb{1}(rank \leq k)$$
$$\text{MRR}=\frac{1}{m}\sum_{i=1}^m\frac{1}{rank}$$
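A minimal sketch of these two metrics, assuming `ranks` is the list of 1-based ranks of the ground-truth POI over the $m$ test check-ins.

```python
# Minimal sketch: Acc@k and MRR from the 1-based rank of the true POI.
def acc_at_k(ranks, k):
    return sum(1 for r in ranks if r <= k) / len(ranks)

def mrr(ranks):
    return sum(1.0 / r for r in ranks) / len(ranks)

# example: ranks = [1, 3, 7] -> acc_at_k(ranks, 5) == 2/3, mrr(ranks) ≈ 0.49
```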
Results
Inactive users and active users
The paper sorts users by their number of check-ins and analyzes how users of different activity levels affect the model:
Short trajectories and long trajectories
On the other hand, the paper also experiments with the challenge posed by short trajectories:
Below are the experimental results after removing the trajectory flow map:
Ablation Experiment
Summary
To conclude: the backbone of this paper is still the Transformer. The biggest change is building a POI transition weight graph (trajectory flow map) and using a GCN for POI embedding; finally, the model predicts the POI, time, and category simultaneously, strengthening the loss function.