
Paper notes: multi label learning dm2l

2022-06-24 08:21:00 Minfan

Abstract: This post shares my understanding of the paper. Original: Ma, Z.-C., & Chen, S.-C. (2021). Expand globally, shrink locally: Discriminant multi-label learning with missing labels. Pattern Recognition, 111, 107675.

1. Contributions of the paper

  • Optimizes the label structure both globally and locally;
  • Supports nonlinear transformations via kernel functions;
  • Provides a thorough theoretical analysis.

2. Notation

| Symbol | Meaning | Notes |
| --- | --- | --- |
| $\mathbf{X} \in \mathbb{R}^{n \times d}$ | Attribute matrix | |
| $\mathbf{X}_k \in \mathbb{R}^{n_k \times d}$ | Attribute submatrix of the instances carrying label $k$ | |
| $\mathbf{Y} \in \{-1, +1\}^{n \times c}$ | Label matrix | |
| $\tilde{\mathbf{Y}} \in \{-1, +1\}^{n \times l}$ | Observed label matrix | |
| $\mathbf{\Omega} \subseteq \{1, \dots, n\} \times \{1, \dots, c\}$ | Set of observed label positions | |
| $\mathbf{W} \in \mathbb{R}^{m \times l}$ | Coefficient matrix | Still a linear model |
| $\mathbf{w}_i \in \mathbb{R}^m$ | Coefficient vector of one label | |
| $\mathbf{C} \in \mathbb{R}^{l \times l}$ | Label correlation matrix | Pairwise correlations; not necessarily symmetric |
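As a toy illustration (my own example, not code from the paper), the observation operator $R_\Omega$ keeps the entries at observed positions and zeroes out the rest:

```python
import numpy as np

# Hypothetical label matrix and observed-position mask.
Y = np.array([[ 1, -1,  1],
              [-1,  1, -1]])
observed = np.array([[True, False, True],
                     [True, True,  False]])  # positions in Omega

# R_Omega(Y): keep observed entries, zero out the missing ones.
R_Omega_Y = np.where(observed, Y, 0)
print(R_Omega_Y.tolist())  # [[1, 0, 1], [-1, 1, 0]]
```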

3. Algorithm

Basic optimization objective:
$$\min_{\mathbf{W}} \frac{1}{2} \|R_{\Omega}(\mathbf{X}\mathbf{W}) - \tilde{\mathbf{Y}}\|_F^2 + \lambda_d \|\mathbf{X}\mathbf{W}\|_* \tag{1}$$
where:

  • The loss term is computed only over the observed entries (via $R_\Omega$), so missing labels are ignored; this is standard practice.
  • The nuclear-norm regularizer is applied to the prediction matrix $\mathbf{X}\mathbf{W}$ rather than to $\mathbf{W}$ alone, which is a somewhat unusual choice.
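Under these definitions, objective (1) can be evaluated numerically. A minimal sketch (my own illustration; the sizes, data, and the mask-based reading of $R_\Omega$ are assumptions, not code from the paper):

```python
import numpy as np

def objective_eq1(X, W, Y_tilde, observed, lambda_d):
    """Masked squared loss over observed entries plus nuclear norm of XW."""
    P = X @ W                                   # prediction matrix
    resid = np.where(observed, P - Y_tilde, 0)  # R_Omega: ignore missing labels
    loss = 0.5 * np.sum(resid ** 2)
    nuc = np.linalg.norm(P, ord='nuc')          # sum of singular values of XW
    return loss + lambda_d * nuc

# Hypothetical toy data.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))
W = rng.standard_normal((4, 3))
Y_tilde = rng.choice([-1.0, 1.0], size=(6, 3))
observed = rng.random((6, 3)) < 0.7
print(objective_eq1(X, W, Y_tilde, observed, 0.1))
```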

The optimization objective after incorporating the label structure:
$$\min_{\mathbf{W}} \frac{1}{2} \|R_{\Omega}(\mathbf{X}\mathbf{W}) - \tilde{\mathbf{Y}}\|_F^2 + \lambda_d \left(\sum_{k=1}^{c} \|\mathbf{X}_k\mathbf{W}\|_* - \|\mathbf{X}\mathbf{W}\|_*\right), \tag{2}$$
where:

  • $\|\mathbf{X}_k\mathbf{W}\|_*$ expresses the local label structure; the smaller the better ("shrink locally": the predictions for instances sharing label $k$ should be compact, i.e., low-rank);
  • $\|\mathbf{X}\mathbf{W}\|_*$ expresses the global label structure; the larger the better ("expand globally": more separable, hence more informative);
  • These two terms are the source of the paper's title.
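The "shrink locally, expand globally" regularizer in (2) can be sketched as follows (hypothetical toy data; here $\mathbf{X}_k$ is taken as the rows whose $k$-th label is $+1$):

```python
import numpy as np

def dm2l_regularizer(X, W, Y):
    """sum_k ||X_k W||_*  (shrink locally)  minus  ||X W||_*  (expand globally)."""
    local = 0.0
    for k in range(Y.shape[1]):
        Xk = X[Y[:, k] == 1]            # instances that carry label k
        if Xk.shape[0] > 0:
            local += np.linalg.norm(Xk @ W, ord='nuc')
    return local - np.linalg.norm(X @ W, ord='nuc')

# Hypothetical toy data.
rng = np.random.default_rng(1)
X = rng.standard_normal((8, 5))
W = rng.standard_normal((5, 3))
Y = rng.choice([-1, 1], size=(8, 3))
print(dm2l_regularizer(X, W, Y))
```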

The optimization objective with the nonlinear transformation added:
$$\min_{\mathbf{W}} \frac{1}{2} \|R_{\Omega}(f(\mathbf{X})\mathbf{W}) - \tilde{\mathbf{Y}}\|_F^2 + \lambda_d \left(\sum_{k=1}^{c} \|f(\mathbf{X}_k)\mathbf{W}\|_* - \|f(\mathbf{X})\mathbf{W}\|_*\right), \tag{5}$$
where $f(\cdot)$ is the nonlinear transformation induced by the kernel function.
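One concrete way to realize $f(\cdot)$ is an explicit RBF feature map built on the training points (a sketch under my own assumptions; the paper works through the kernel trick rather than this explicit map):

```python
import numpy as np

def rbf_features(X, centers, gamma=1.0):
    """f(X)[i, j] = exp(-gamma * ||x_i - c_j||^2): a finite-dimensional
    stand-in for the kernel-induced nonlinear map f(.)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(2)
X = rng.standard_normal((8, 4))
Y = rng.choice([-1, 1], size=(8, 3))
F = rbf_features(X, X)           # f(X): training points as centers, shape (8, 8)
W = rng.standard_normal((8, 3))  # coefficients now act on kernel features

# Regularizer of (5): local nuclear norms minus the global one.
reg = sum(np.linalg.norm(F[Y[:, k] == 1] @ W, ord='nuc')
          for k in range(Y.shape[1]) if (Y[:, k] == 1).any()) \
      - np.linalg.norm(F @ W, ord='nuc')
print(reg)
```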

4. Summary

  • The paper additionally supplies a sizable batch of theoretical proofs.

Copyright notice
This article was written by [Minfan]. When reposting, please include a link to the original:
https://yzsam.com/2022/175/202206240444230088.html