
Paper notes: multi label learning dm2l

2022-06-24 08:21:00 Minfan

Abstract: This post shares my understanding of the paper. Original: Ma, Z.-C., & Chen, S.-C. (2021). Expand globally, shrink locally: Discriminant multi-label learning with missing labels. Pattern Recognition, 111, 107675.

1. Contributions of the paper

  • Optimizes the label structure both globally and locally;
  • Supports nonlinear transformations via kernel functions;
  • Provides a thorough theoretical analysis.

2. Notation

| Symbol | Meaning | Notes |
| --- | --- | --- |
| $\mathbf{X} \in \mathbb{R}^{n \times d}$ | Attribute matrix | |
| $\mathbf{X}_k \in \mathbb{R}^{n_k \times d}$ | Attribute submatrix of the instances carrying label $k$ | |
| $\mathbf{Y} \in \{-1, +1\}^{n \times c}$ | Label matrix | |
| $\tilde{\mathbf{Y}} \in \{-1, +1\}^{n \times l}$ | Observed label matrix | |
| $\mathbf{\Omega} \subseteq \{1, \dots, n\} \times \{1, \dots, c\}$ | Set of observed label positions | |
| $\mathbf{W} \in \mathbb{R}^{m \times l}$ | Coefficient matrix | Still a linear model |
| $\mathbf{w}_i \in \mathbb{R}^m$ | Coefficient vector of one label | |
| $\mathbf{C} \in \mathbb{R}^{l \times l}$ | Label correlation matrix | Pairwise correlations; not necessarily symmetric |
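As a toy illustration (my own example, not code from the paper), the observation operator $R_\Omega$ keeps the entries at observed positions and zeroes out the rest:

```python
import numpy as np

# Hypothetical label matrix and observed-position mask.
Y = np.array([[ 1, -1,  1],
              [-1,  1, -1]])
observed = np.array([[True, False, True],
                     [True, True,  False]])  # positions in Omega

# R_Omega(Y): keep observed entries, zero out the missing ones.
R_Omega_Y = np.where(observed, Y, 0)
print(R_Omega_Y.tolist())  # [[1, 0, 1], [-1, 1, 0]]
```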

3. Algorithm

Basic optimization objective:
$$\min_{\mathbf{W}} \frac{1}{2} \|R_{\Omega}(\mathbf{X}\mathbf{W}) - \tilde{\mathbf{Y}}\|_F^2 + \lambda_d \|\mathbf{X}\mathbf{W}\|_* \tag{1}$$
where:

  • The loss term is computed only over the observed entries (via $R_\Omega$), so missing labels are ignored; this is standard practice.
  • The nuclear-norm regularizer is applied to the prediction matrix $\mathbf{X}\mathbf{W}$ rather than to $\mathbf{W}$ alone, which is a somewhat unusual choice.
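Under these definitions, objective (1) can be evaluated numerically. A minimal sketch (my own illustration; the sizes, data, and the mask-based reading of $R_\Omega$ are assumptions, not code from the paper):

```python
import numpy as np

def objective_eq1(X, W, Y_tilde, observed, lambda_d):
    """Masked squared loss over observed entries plus nuclear norm of XW."""
    P = X @ W                                   # prediction matrix
    resid = np.where(observed, P - Y_tilde, 0)  # R_Omega: ignore missing labels
    loss = 0.5 * np.sum(resid ** 2)
    nuc = np.linalg.norm(P, ord='nuc')          # sum of singular values of XW
    return loss + lambda_d * nuc

# Hypothetical toy data.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))
W = rng.standard_normal((4, 3))
Y_tilde = rng.choice([-1.0, 1.0], size=(6, 3))
observed = rng.random((6, 3)) < 0.7
print(objective_eq1(X, W, Y_tilde, observed, 0.1))
```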

The optimization objective after incorporating the label structure:
$$\min_{\mathbf{W}} \frac{1}{2} \|R_{\Omega}(\mathbf{X}\mathbf{W}) - \tilde{\mathbf{Y}}\|_F^2 + \lambda_d \left(\sum_{k=1}^{c} \|\mathbf{X}_k\mathbf{W}\|_* - \|\mathbf{X}\mathbf{W}\|_*\right), \tag{2}$$
where:

  • $\|\mathbf{X}_k\mathbf{W}\|_*$ expresses the local label structure; the smaller the better ("shrink locally": the predictions for instances sharing label $k$ should be compact, i.e., low-rank);
  • $\|\mathbf{X}\mathbf{W}\|_*$ expresses the global label structure; the larger the better ("expand globally": more separable, hence more informative);
  • These two terms are the source of the paper's title.
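The "shrink locally, expand globally" regularizer in (2) can be sketched as follows (hypothetical toy data; here $\mathbf{X}_k$ is taken as the rows whose $k$-th label is $+1$):

```python
import numpy as np

def dm2l_regularizer(X, W, Y):
    """sum_k ||X_k W||_*  (shrink locally)  minus  ||X W||_*  (expand globally)."""
    local = 0.0
    for k in range(Y.shape[1]):
        Xk = X[Y[:, k] == 1]            # instances that carry label k
        if Xk.shape[0] > 0:
            local += np.linalg.norm(Xk @ W, ord='nuc')
    return local - np.linalg.norm(X @ W, ord='nuc')

# Hypothetical toy data.
rng = np.random.default_rng(1)
X = rng.standard_normal((8, 5))
W = rng.standard_normal((5, 3))
Y = rng.choice([-1, 1], size=(8, 3))
print(dm2l_regularizer(X, W, Y))
```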

The optimization objective with the nonlinear transformation added:
$$\min_{\mathbf{W}} \frac{1}{2} \|R_{\Omega}(f(\mathbf{X})\mathbf{W}) - \tilde{\mathbf{Y}}\|_F^2 + \lambda_d \left(\sum_{k=1}^{c} \|f(\mathbf{X}_k)\mathbf{W}\|_* - \|f(\mathbf{X})\mathbf{W}\|_*\right), \tag{5}$$
where $f(\cdot)$ is the nonlinear transformation induced by the kernel function.
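One concrete way to realize $f(\cdot)$ is an explicit RBF feature map built on the training points (a sketch under my own assumptions; the paper works through the kernel trick rather than this explicit map):

```python
import numpy as np

def rbf_features(X, centers, gamma=1.0):
    """f(X)[i, j] = exp(-gamma * ||x_i - c_j||^2): a finite-dimensional
    stand-in for the kernel-induced nonlinear map f(.)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(2)
X = rng.standard_normal((8, 4))
Y = rng.choice([-1, 1], size=(8, 3))
F = rbf_features(X, X)           # f(X): training points as centers, shape (8, 8)
W = rng.standard_normal((8, 3))  # coefficients now act on kernel features

# Regularizer of (5): local nuclear norms minus the global one.
reg = sum(np.linalg.norm(F[Y[:, k] == 1] @ W, ord='nuc')
          for k in range(Y.shape[1]) if (Y[:, k] == 1).any()) \
      - np.linalg.norm(F @ W, ord='nuc')
print(reg)
```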

4. Summary

  • The paper additionally supplies a sizable batch of theoretical proofs.

Copyright notice
This article was written by [Minfan]. When reposting, please include a link to the original:
https://yzsam.com/2022/175/202206240444230088.html