[RS sampling] A Gain-Tuning Dynamic Negative Sampler for Recommendation (WWW 2022)
2022-07-25 12:00:00 【chad_lee】
《Simplify and Robustify Negative Sampling》 NIPS 2020
This paper observes experimentally that, although both false negatives and hard negatives receive high scores, false negatives have a lower prediction variance. It therefore proposes a simplified and robust negative sampling method: at training epoch $t$, based on the training records of the previous 5 epochs, samples with a high prediction score and a large variance are taken as hard negatives.

A Gain-Tuning Dynamic Negative Sampler for Recommendation (WWW 2022)
Existing methods for mining hard negative samples in recommender systems (RS) only look for samples that contribute large gradients during training, that is, samples with a large gap between prediction and label. In the RS setting this easily selects false negatives (missing data), which leads to overfitting the training set.
This paper proposes a sampler based on expected gain. During training, it dynamically guides negative sampling according to the expected change in the gap between positive and negative samples, which makes it possible to identify false negatives.

Gain-aware negative sampler
A measure of whether item $j$ is a true negative sample for user $u$:
$$\mathcal{H}^{t}(u, j)=\mathbb{E}_{i \sim \Delta_{u}} \sigma\left(r_{u, j}-r_{u, i}\right)$$
The formula computes an expectation, where $t$ is the training epoch, $\Delta_{u}$ is the set of items the user has interacted with, $\sigma$ is the sigmoid function, and the term in parentheses is the score of the negative sample minus the score of the positive sample.
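As a rough illustration of this measure, the sketch below computes $\mathcal{H}(u, j)$ for one user in plain Python. It assumes the expectation is approximated by a simple mean over the user's positive items; the function names and the scores are hypothetical and not taken from the paper.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def hardness(neg_score: float, pos_scores: Sequence[float]) -> float:
    """H(u, j): mean of sigmoid(r_uj - r_ui) over the user's positive items i.

    Assumption: the expectation over Delta_u is taken as a uniform average.
    """
    if not pos_scores:
        raise ValueError("the user needs at least one positive item")
    return sum(sigmoid(neg_score - r_ui) for r_ui in pos_scores) / len(pos_scores)


# Hypothetical scores for one user: two interacted items, two candidates.
pos_scores = [2.0, 1.0]
h_close = hardness(1.5, pos_scores)   # candidate scored like the positives
h_far = hardness(-3.0, pos_scores)    # candidate scored far below them
print(h_close, h_far)
```

A candidate whose score sits among the positive scores gets a value near 0.5 or above, while a candidate scored far below the positives gets a value near 0.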
A negative sample selected this way is close to the positive samples, so it provides a relatively large gradient, and therefore more information, for training. That is the ideal, but experiments show that such true hard negatives are rare and that this criterion is likely to pick false negatives instead. The experiments also show that $\mathcal{H}^{t}(u, j)$ changes more for true negatives than for false negatives, so a gain-aware measure is further proposed to track the samples that change the most:
$$\mathcal{G}_{u, j}^{t}=\alpha \cdot \mathcal{G}_{u, j}^{t-1}+(1-\alpha) \cdot \sigma\left(\frac{\mathcal{H}_{u, j}^{t-1}-\mathcal{H}_{u, j}^{t}}{\mathcal{H}_{u, j}^{t}+\epsilon}\right)$$
This indicator measures how much $\mathcal{H}^{t}(u, j)$ has dropped, where $\mathcal{H}_{u, j}^{t}$ is shorthand for $\mathcal{H}^{t}(u, j)$. The authors argue that the expected gain between two epochs is a more sensitive signal for telling negative samples apart from positive ones. Here $\alpha$ is the smoothing coefficient and $\epsilon$ prevents the denominator from being 0.
Put simply: whichever sample's $\mathcal{H}^{t}(u, j)$ dropped the most over the last epoch is chosen as the negative sample.
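The update rule can be sketched as follows. This is a minimal illustration and not the authors' implementation: the initial gain of 0.5, the value of `alpha`, the candidate names, and the argmax selection (following the informal reading above) are all assumptions made for the example.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def update_gain(prev_gain: float, prev_h: float, curr_h: float,
                alpha: float = 0.5, eps: float = 1e-8) -> float:
    """One step of G^t = alpha * G^{t-1} + (1 - alpha) * sigmoid(relative drop of H)."""
    relative_drop = (prev_h - curr_h) / (curr_h + eps)
    return alpha * prev_gain + (1.0 - alpha) * sigmoid(relative_drop)


# Hypothetical (H^{t-1}, H^t) pairs for three candidate items of one user.
h_history = {
    "a": (0.6, 0.3),  # H dropped sharply
    "b": (0.6, 0.6),  # H unchanged
    "c": (0.5, 0.6),  # H increased
}

# Assumed initial gain of 0.5 for every candidate.
gains = {item: update_gain(0.5, h_prev, h_curr)
         for item, (h_prev, h_curr) in h_history.items()}

# Pick the candidate with the largest gain as the negative sample.
chosen = max(gains, key=gains.get)
print(gains, chosen)
```

With these numbers the item whose $\mathcal{H}$ fell the most ends up with the largest gain, the unchanged item stays at 0.5, and the item whose $\mathcal{H}$ rose falls below 0.5.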
Grouping optimizer
The paper proposes a loss similar to those of MCL and CPR:
$$\mathcal{L}\left(u, \Delta_{u}, \Delta_{u}^{\prime}\right)=\sum_{i \in \Delta_{u}} \sum_{j \in \Delta_{u}^{\prime}}\left|r_{u, j}-r_{u, i}+\gamma\right|_{+}$$
$\Delta_{u}$ and $\Delta_{u}^{\prime}$ are the positive and negative sample sets of user $u$. Each positive sample is paired with every negative sample when computing the loss, so all positive samples share the negative-sample information instead of being optimized one-to-one, which is more efficient and more informative. The idea is very close to that of CPR and MCL.
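A small sketch of this group-wise loss for a single user is shown here. It assumes that $|x|_{+}$ means $\max(x, 0)$ and that $\gamma$ is a margin; the function name, the margin value, and the scores are hypothetical.

```python
from typing import Sequence


def group_hinge_loss(pos_scores: Sequence[float],
                     neg_scores: Sequence[float],
                     margin: float = 1.0) -> float:
    """Sum of max(0, r_uj - r_ui + margin) over every (positive i, negative j) pair."""
    return sum(
        max(0.0, r_uj - r_ui + margin)
        for r_ui in pos_scores
        for r_uj in neg_scores
    )


# Hypothetical scores: two positives and two sampled negatives of one user.
pos_scores = [2.0, 0.5]
neg_scores = [1.0, -1.0]
loss = group_hinge_loss(pos_scores, neg_scores, margin=1.0)
print(loss)
```

Only the pair (positive 0.5, negative 1.0) violates the margin here, so it is the only pair that contributes to the loss; every other pair is already separated by at least the margin.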
Experimental results
The base model is GMF: $r_{u, i}=W^{\top}\left(P_{u} \odot Q_{i}\right)=\sum_{k=1}^{d} w_{k} \cdot p_{u, k} \cdot q_{i, k}$
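For concreteness, the GMF score can be computed as a weighted element-wise product of the user and item embeddings. The sketch below uses hypothetical 3-dimensional vectors; the function name is an assumption for illustration.

```python
from typing import Sequence


def gmf_score(w: Sequence[float], p_u: Sequence[float], q_i: Sequence[float]) -> float:
    """r_ui = sum_k w_k * p_uk * q_ik, i.e. W^T (P_u ⊙ Q_i)."""
    if not (len(w) == len(p_u) == len(q_i)):
        raise ValueError("w, p_u and q_i must have the same dimension d")
    return sum(w_k * p_k * q_k for w_k, p_k, q_k in zip(w, p_u, q_i))


# Hypothetical embeddings with d = 3.
w = [1.0, 0.5, 2.0]
p_u = [1.0, 2.0, 0.0]
q_i = [3.0, 1.0, 4.0]
score = gmf_score(w, p_u, q_i)
print(score)  # 1*1*3 + 0.5*2*1 + 2*0*4 = 4.0
```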
The performance gains mainly come from the grouping loss.

The core idea of the paper mainly comes from this experimental figure:

The figure analyzes the distributions of H and G for true and false negative samples. During training, the samples whose H keeps rising are the false negatives, while the true negatives are the ones whose G keeps rising.