[Fundamentals of Machine Learning 04] Matrix Factorization
2022-06-22 06:54:00 【chad_lee】
Having finished the basics of machine learning, the author also shares a few matrix-factorization-based CTR models for further reading:

- Matrix factorization, advanced: FM, FFM
- Matrix factorization and deep learning: DeepFM, xDeepFM
- Matrix factorization and feature crossing: Wide & Deep, Deep & Cross Network
Matrix Factorization

For a dataset $\mathcal{D}$, the squared-error measure of the hypothesis (with $\mathcal{D}_m$ denoting the ratings observed for movie $m$) is:
$$E_{\text{in}}\left(\{\mathbf{w}_{m}\},\{\mathbf{v}_{n}\}\right)=\frac{1}{\sum_{m=1}^{M}\left|\mathcal{D}_{m}\right|} \sum_{\text{user } n \text{ rated movie } m}\left(r_{nm}-\mathbf{w}_{m}^{T} \mathbf{v}_{n}\right)^{2}$$
We now use the dataset $\mathcal{D}$ to learn the $\mathbf{v}_{n}$ and $\mathbf{w}_{m}$ that minimize this error:
$$\begin{aligned} \min_{\mathrm{W}, \mathrm{V}} E_{\text{in}}\left(\{\mathbf{w}_{m}\},\{\mathbf{v}_{n}\}\right) &\propto \sum_{\text{user } n \text{ rated movie } m}\left(r_{nm}-\mathbf{w}_{m}^{T} \mathbf{v}_{n}\right)^{2} \\ &=\sum_{m=1}^{M}\left(\sum_{\left(\mathbf{x}_{n}, r_{nm}\right) \in \mathcal{D}_{m}}\left(r_{nm}-\mathbf{w}_{m}^{T} \mathbf{v}_{n}\right)^{2}\right) \\ &=\sum_{n=1}^{N}\left(\sum_{\left(\mathbf{x}_{n}, r_{nm}\right) \in \mathcal{D}_{m}}\left(r_{nm}-\mathbf{v}_{n}^{T} \mathbf{w}_{m}\right)^{2}\right) \end{aligned}$$
Because there are two sets of variables, $\mathbf{v}_{n}$ and $\mathbf{w}_{m}$, optimizing both at the same time is hard, so the basic idea is alternating minimization:
- Fix $\mathbf{v}_{n}$, i.e., the user feature vectors, then find each $\mathbf{w}_{m}$ that minimizes $E_{\text{in}}$ within $\mathcal{D}_{m}$.
- Fix $\mathbf{w}_{m}$, i.e., the movie feature vectors, then find each $\mathbf{v}_{n}$ that minimizes $E_{\text{in}}$ over the ratings of user $n$.
This process is called the alternating least squares algorithm. Concretely:
initialize $\tilde{d}$-dimensional vectors $\{\mathbf{w}_{m}\}, \{\mathbf{v}_{n}\}$
alternating optimization of $E_{\text{in}}$: repeatedly

- optimize $\mathbf{w}_{1}, \mathbf{w}_{2}, \ldots, \mathbf{w}_{M}$: update $\mathbf{w}_{m}$ by $m$-th-movie linear regression on $\{(\mathbf{v}_{n}, r_{nm})\}$
- optimize $\mathbf{v}_{1}, \mathbf{v}_{2}, \ldots, \mathbf{v}_{N}$: update $\mathbf{v}_{n}$ by $n$-th-user linear regression on $\{(\mathbf{w}_{m}, r_{nm})\}$

until converged
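To make the loop above concrete, here is a minimal NumPy sketch of alternating least squares. The function name `als_mf`, the data layout (a list of `(n, m, r)` triples), and the small L2 term `reg` added for numerical stability are assumptions for illustration, not part of the original formulation; set `reg=0` to recover the plain least-squares steps.

```python
import numpy as np

def als_mf(ratings, N, M, d=10, n_iters=20, reg=0.1, seed=0):
    """Alternating least squares on a list of (n, m, r) rating triples."""
    rng = np.random.default_rng(seed)
    V = rng.normal(scale=0.1, size=(N, d))  # user vectors v_n
    W = rng.normal(scale=0.1, size=(M, d))  # movie vectors w_m

    # Group the observed ratings by movie and by user once, up front.
    by_movie = [[] for _ in range(M)]
    by_user = [[] for _ in range(N)]
    for n, m, r in ratings:
        by_movie[m].append((n, r))
        by_user[n].append((m, r))

    ridge = reg * np.eye(d)  # small L2 term for stability (an assumption)
    for _ in range(n_iters):
        # Fix V: each w_m is a linear regression of movie m's ratings on {v_n}.
        for m in range(M):
            if by_movie[m]:
                X = np.array([V[n] for n, _ in by_movie[m]])
                y = np.array([r for _, r in by_movie[m]])
                W[m] = np.linalg.solve(X.T @ X + ridge, X.T @ y)
        # Fix W: each v_n is a linear regression of user n's ratings on {w_m}.
        for n in range(N):
            if by_user[n]:
                X = np.array([W[m] for m, _ in by_user[n]])
                y = np.array([r for _, r in by_user[n]])
                V[n] = np.linalg.solve(X.T @ X + ridge, X.T @ y)
    return V, W
```

Since each half-step exactly solves a least-squares problem over one block of variables, the training objective never increases across iterations, which is what makes the "until converged" loop sensible.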
SGD for Matrix Factorization
Recall that our error function is:
$$E_{\text{in}}\left(\{\mathbf{w}_{m}\},\{\mathbf{v}_{n}\}\right)=\frac{1}{\sum_{m=1}^{M}\left|\mathcal{D}_{m}\right|} \sum_{\text{user } n \text{ rated movie } m}\left(r_{nm}-\mathbf{w}_{m}^{T} \mathbf{v}_{n}\right)^{2}$$
Since SGD takes out only one example at a time for optimization, first look at the error measure on a single example:
$$\text{err}\left(\text{user } n, \text{ movie } m, \text{ rating } r_{nm}\right)=\left(r_{nm}-\mathbf{w}_{m}^{T} \mathbf{v}_{n}\right)^{2}$$
Taking partial derivatives:
$$\begin{aligned} \nabla_{\mathbf{v}_{n}} \operatorname{err}\left(\text{user } n, \text{ movie } m, \text{ rating } r_{nm}\right) &= -2\left(r_{nm}-\mathbf{w}_{m}^{T} \mathbf{v}_{n}\right) \mathbf{w}_{m} \\ \nabla_{\mathbf{w}_{m}} \operatorname{err}\left(\text{user } n, \text{ movie } m, \text{ rating } r_{nm}\right) &= -2\left(r_{nm}-\mathbf{w}_{m}^{T} \mathbf{v}_{n}\right) \mathbf{v}_{n} \end{aligned}$$
That is, the current example only affects $\mathbf{v}_{n}$ and $\mathbf{w}_{m}$; the partial derivatives with respect to all other parameters are zero. In summary:
$$\text{per-example gradient} \propto -(\text{residual})(\text{the other feature vector})$$
The practical steps of solving matrix factorization with stochastic gradient descent then follow directly from this per-example gradient:

initialize $\tilde{d}$-dimensional vectors $\{\mathbf{w}_{m}\}, \{\mathbf{v}_{n}\}$ randomly
for a number of steps, randomly pick an observed rating $(n, m)$, compute the residual $\tilde{r}_{nm} = r_{nm} - \mathbf{w}_{m}^{T}\mathbf{v}_{n}$, and update

$$\mathbf{v}_{n} \leftarrow \mathbf{v}_{n} + \eta\,\tilde{r}_{nm}\,\mathbf{w}_{m}, \qquad \mathbf{w}_{m} \leftarrow \mathbf{w}_{m} + \eta\,\tilde{r}_{nm}\,\mathbf{v}_{n}$$
Note: if the recommendation task involves time (for example, the test set contains data from after the training-set period), you can make the "S" in SGD time-aware, i.e., pick $(n, m)$ roughly in chronological order so that later updates emphasize more recent ratings.
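For completeness, a minimal NumPy sketch of the same SGD loop; the function name `sgd_mf`, the learning rate `eta`, and the step count are illustrative assumptions. Both updates in each step are computed from the pre-update vectors, matching the gradients above.

```python
import numpy as np

def sgd_mf(ratings, N, M, d=10, eta=0.01, n_steps=100_000, seed=0):
    """SGD for matrix factorization on a list of (n, m, r) rating triples."""
    rng = np.random.default_rng(seed)
    V = rng.normal(scale=0.1, size=(N, d))  # user vectors v_n
    W = rng.normal(scale=0.1, size=(M, d))  # movie vectors w_m
    for _ in range(n_steps):
        n, m, r = ratings[rng.integers(len(ratings))]  # randomly pick (n, m)
        vn, wm = V[n].copy(), W[m].copy()
        resid = r - wm @ vn           # residual r_nm - w_m^T v_n
        V[n] = vn + eta * resid * wm  # v_n <- v_n + eta * residual * w_m
        W[m] = wm + eta * resid * vn  # w_m <- w_m + eta * residual * v_n
    return V, W

# Toy usage: 3 users, 2 movies, 4 observed ratings (hypothetical data).
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (2, 1, 1.0)]
V, W = sgd_mf(ratings, N=3, M=2, d=4)
print(V @ W.T)  # predicted rating matrix
```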