Machine learning 07: Interpretation of PCA and its sklearn source code
2022-06-26 05:46:00 【SP FA】
Overview
Principal Component Analysis (PCA) maps $n$-dimensional features onto $k$ dimensions ($k<n$). These $k$ dimensions are a new set of orthogonal features, called the principal components; they are $k$-dimensional features reconstructed from the original $n$-dimensional ones. PCA's job is to find a set of mutually orthogonal coordinate axes in the original space: the first new axis is the direction of largest variance in the original data; the second is the direction of largest variance within the plane orthogonal to the first axis; the third is the direction of largest variance orthogonal to the first two; and so on, until $n$ such axes are obtained. With axes chosen this way, most of the variance is contained in the first $k$ axes, and the variance along the remaining axes is almost 0. We can therefore keep only the first $k$ dimensions and thus reduce the dimensionality of the data's features.
There are two ways to compute PCA:
- Decompose the covariance matrix. The implementation consists of three steps:
  - Compute the covariance matrix of the data matrix.
  - Compute the eigenvalues and eigenvectors of the covariance matrix.
  - Take the eigenvectors corresponding to the $k$ largest eigenvalues (i.e. the $k$ largest variances) and assemble them into a matrix.
- Decompose the inner product matrix.
Decomposing the covariance matrix
Covariance matrix
Covariance describes the correlation between two random variables. A quick comparison of variance and covariance:
Mean: $\bar x=\frac{1}{n}\sum\limits^n_{i=1}\vec x_i$
Sample variance: $S^2=\frac{1}{n-1}\sum\limits^n_{i=1}(\vec x_i-\bar x)^2$
Covariance of samples $X$ and $Y$: $Cov(X,Y)=\frac{1}{n-1}\sum\limits^n_{i=1}(\vec x_i-\bar x)(\vec y_i-\bar y)$
We can see that variance is computed for a single one-dimensional feature, while covariance requires at least two dimensions; variance is a special case of covariance. A positive covariance means $X$ and $Y$ are positively correlated, a negative one means they are negatively correlated, and 0 means the two variables are uncorrelated. With multiple variables, the pairwise covariances form the covariance matrix.
The covariance matrix is closely related to the scatter matrix. From the definition of the scatter matrix, $S=\sum\limits^n_{i=1}(\vec x_i-\bar x)(\vec x_i-\bar x)^T$, we find that for a data set $X$: $S=(n-1)Cov(X)$, so the two matrices have the same eigenvectors, and their eigenvalues differ only by the constant factor $n-1$.
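A quick numerical check of the relation $S=(n-1)Cov(X)$, using NumPy (the data here is made up and only for illustration):

import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(50, 3)                     # 50 samples, 3 features

cov = np.cov(X, rowvar=False)            # unbiased covariance, divides by n - 1
Xc = X - X.mean(axis=0)
scatter = Xc.T @ Xc                      # scatter matrix: sum of (x_i - mean)(x_i - mean)^T

print(np.allclose(scatter, (X.shape[0] - 1) * cov))  # True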
Eigenvalue decomposition and SVD
For the detailed process, please refer to another article of mine: Eigendecomposition and Singular Value Decomposition.
After computing the $n$ eigenvectors, we sort them by eigenvalue and keep the top $k$.
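To make this route concrete, here is a minimal NumPy sketch of the covariance-matrix method (this is not the sklearn implementation; the function name pca_via_covariance and the arguments X, k are only illustrative):

import numpy as np

def pca_via_covariance(X, k):
    # X: (n_samples, n_features); center the data first
    X_centered = X - X.mean(axis=0)
    # unbiased covariance matrix, shape (n_features, n_features)
    cov = np.cov(X_centered, rowvar=False)
    # eigh, since the covariance matrix is symmetric; eigenvalues come back ascending
    eigvals, eigvecs = np.linalg.eigh(cov)
    # sort by variance in descending order and keep the top k eigenvectors
    order = np.argsort(eigvals)[::-1][:k]
    components = eigvecs[:, order]          # (n_features, k)
    return X_centered @ components          # projected data, (n_samples, k)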
Decomposing the inner product matrix
The PCA implementation in the sklearn library uses this method.
Suppose the matrix $X$ is the sample set and has already been centered (PCA requires centering the data). Then the covariance matrix is $\frac{1}{n}XX^T$.
Performing SVD on $X$ gives $X=U\Sigma V^T$, so $X^T=V\Sigma U^T$.
Therefore: $\frac{1}{n}XX^T=\frac{1}{n}U\Sigma^2U^T$.
So we can simply run SVD on $X$: the columns of $U$ are the eigenvectors of the covariance matrix, and the squared singular values in $\Sigma^2$ (divided by $n$) give the corresponding variances, with no need to form the covariance matrix explicitly.
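As a sketch of the SVD route (again not sklearn's actual code; note that here the rows of X are samples, the convention sklearn uses, so the principal axes come from $V^T$ rather than from $U$ as in the $XX^T$ derivation above):

import numpy as np

def pca_via_svd(X, k):
    # X: (n_samples, n_features); rows are samples
    X_centered = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:k]                              # principal axes, (k, n_features)
    explained_variance = (S ** 2) / (X.shape[0] - 1)
    return X_centered @ components.T, explained_variance[:k]

Up to the sign of each component, the projection should agree with the covariance-matrix sketch above.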
About the source code
I spent an evening reading the official sklearn code and took a rough set of notes. The source is long, so only the key parts are recorded here.
Computing PCA
Through the svd_solver parameter we can choose different ways of solving, but the basic idea is the same. Below I only take svd_solver == 'full', i.e. the case where the full SVD of all the data is used for the decomposition, as an example.
Implementation of the fit() method:
In the code, explained_variance_ holds the variance of each component, and explained_variance_ratio_ holds each component's contribution ratio, i.e. its proportion of the total variance; this attribute can be used to decide the specific value of $n$.
Note that the n_components parameter can take several kinds of values:
- “mle”: determine the value of $n$ automatically.
- $0<\text{n\_components}<1$: the proportion of information you want to keep. PCA will automatically select the number of components whose cumulative ratio of retained information is $\ge\text{n\_components}$.
- $\text{n\_components}\ge1$ and an integer: keep the first $n$ components.
def _fit_full(self, X, n_components):
    n_samples, n_features = X.shape
    ...  # validate the incoming data, omitted here

    # Center data
    self.mean_ = np.mean(X, axis=0)
    X -= self.mean_

    # Find the eigenvectors by decomposing the inner product matrix (SVD of X)
    U, S, Vt = linalg.svd(X, full_matrices=False)
    # flip the eigenvectors' signs to enforce deterministic output,
    # since the SVD may otherwise return negated components
    U, Vt = svd_flip(U, Vt)
    components_ = Vt

    explained_variance_ = (S**2) / (n_samples - 1)
    total_var = explained_variance_.sum()
    explained_variance_ratio_ = explained_variance_ / total_var
    singular_values_ = S.copy()  # Store the singular values.

    # Decide how many components to keep, i.e. the value of n,
    # according to n_components.
    if n_components == "mle":
        n_components = _infer_dimension(explained_variance_, n_samples)
    elif 0 < n_components < 1.0:
        ratio_cumsum = stable_cumsum(explained_variance_ratio_)
        n_components = np.searchsorted(ratio_cumsum, n_components, side="right") + 1

    # Compute noise covariance using Probabilistic PCA model
    # The sigma2 maximum likelihood (cf. eq. 12.46)
    if n_components < min(n_features, n_samples):
        self.noise_variance_ = explained_variance_[n_components:].mean()
    else:
        self.noise_variance_ = 0.0

    ...  # assign the results to the estimator's attributes, omitted here
    return U, S, Vt
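For reference, a brief usage sketch of the three kinds of n_components values discussed above (the data is random and purely illustrative):

import numpy as np
from sklearn.decomposition import PCA

X = np.random.RandomState(0).rand(100, 10)

pca_int = PCA(n_components=3).fit(X)         # keep the first 3 components
pca_ratio = PCA(n_components=0.95).fit(X)    # keep >= 95% of the variance
pca_mle = PCA(n_components="mle").fit(X)     # choose n automatically

print(pca_int.explained_variance_ratio_)
print(pca_ratio.n_components_, pca_mle.n_components_)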
Implementation of the transform() method:
Whitening is applied here as an option: it decorrelates the data to reduce redundancy in the input. Whitened data has two properties: the correlations between features are removed, and every feature has variance 1.
def transform(self, X):
    ...  # validate the estimator and the data, omitted here
    if self.mean_ is not None:
        X = X - self.mean_
    X_transformed = np.dot(X, self.components_.T)
    # whitening
    if self.whiten:
        X_transformed /= np.sqrt(self.explained_variance_)
    return X_transformed
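As a quick sanity check of the whitening claim (a sketch with made-up random data), the covariance of the whitened output should be close to the identity matrix:

import numpy as np
from sklearn.decomposition import PCA

X = np.random.RandomState(0).randn(500, 5)
Z = PCA(n_components=5, whiten=True).fit_transform(X)
print(np.round(np.cov(Z, rowvar=False), 2))  # approximately the identity matrix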
Implementation of the fit_transform() method:
The relevant formulas are written out carefully in the author's comments.
def fit_transform(self, X, y=None):
    U, S, Vt = self._fit(X)
    U = U[:, : self.n_components_]

    if self.whiten:
        # X_new = X * V / S * sqrt(n_samples) = U * sqrt(n_samples)
        U *= sqrt(X.shape[0] - 1)
    else:
        # X_new = X * V = U * S * Vt * V = U * S
        U *= S[: self.n_components_]

    return U
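For intuition, fit_transform(X) should give the same result as fitting and then transforming the same data, up to floating-point error; a small check with made-up data:

import numpy as np
from sklearn.decomposition import PCA

X = np.random.RandomState(0).randn(200, 6)
Z1 = PCA(n_components=3).fit_transform(X)
Z2 = PCA(n_components=3).fit(X).transform(X)
print(np.allclose(Z1, Z2))  # True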
Automatically choosing the value of n
How many dimensions to keep is decided by computing the log-likelihood of each candidate rank and taking the maximum. The relevant paper can be found here: link
def _assess_dimension(spectrum, rank, n_samples):
    """Compute the log-likelihood of a rank ``rank`` dataset.

    The dataset is assumed to be embedded in gaussian noise of shape(n, dimf)
    having spectrum ``spectrum``.

    Parameters
    ----------
    spectrum : ndarray of shape (n_features,)
        Data spectrum.
    rank : int
        Tested rank value. It should be strictly lower than n_features,
        otherwise the method isn't specified (division by zero in equation
        (31) from the paper).
    n_samples : int
        Number of samples.

    Returns
    -------
    ll : float
        The log-likelihood.
    """
    n_features = spectrum.shape[0]
    if not 1 <= rank < n_features:
        raise ValueError("the tested rank should be in [1, n_features - 1]")

    eps = 1e-15

    if spectrum[rank - 1] < eps:
        # When the tested rank is associated with a small eigenvalue, there's
        # no point in computing the log-likelihood: it's going to be very
        # small and won't be the max anyway. Also, it can lead to numerical
        # issues below when computing pa, in particular in log((spectrum[i] -
        # spectrum[j]) because this will take the log of something very small.
        return -np.inf

    pu = -rank * log(2.0)
    for i in range(1, rank + 1):
        pu += (
            gammaln((n_features - i + 1) / 2.0)
            - log(np.pi) * (n_features - i + 1) / 2.0
        )

    pl = np.sum(np.log(spectrum[:rank]))
    pl = -pl * n_samples / 2.0

    v = max(eps, np.sum(spectrum[rank:]) / (n_features - rank))
    pv = -np.log(v) * n_samples * (n_features - rank) / 2.0

    m = n_features * rank - rank * (rank + 1.0) / 2.0
    pp = log(2.0 * np.pi) * (m + rank) / 2.0

    pa = 0.0
    spectrum_ = spectrum.copy()
    spectrum_[rank:n_features] = v
    for i in range(rank):
        for j in range(i + 1, len(spectrum)):
            pa += log(
                (spectrum[i] - spectrum[j]) * (1.0 / spectrum_[j] - 1.0 / spectrum_[i])
            ) + log(n_samples)

    ll = pu + pl + pv + pp - pa / 2.0 - rank * log(n_samples) / 2.0

    return ll
def _infer_dimension(spectrum, n_samples):
    """Infers the dimension of a dataset with a given spectrum.

    The returned value will be in [1, n_features - 1].
    """
    ll = np.empty_like(spectrum)
    ll[0] = -np.inf  # we don't want to return n_components = 0
    for rank in range(1, spectrum.shape[0]):
        ll[rank] = _assess_dimension(spectrum, rank, n_samples)
    return ll.argmax()
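To see what this inference does in practice, here is a sketch on synthetic data with a known intrinsic dimension (3 signal directions embedded in 10 dimensions plus a little noise); the MLE criterion typically recovers that dimension, though the exact result depends on the noise level:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.randn(1000, 3) @ rng.randn(3, 10) + 0.01 * rng.randn(1000, 10)

pca = PCA(n_components="mle").fit(X)
print(pca.n_components_)  # typically 3, the inferred intrinsic dimension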
Sample scores
Model fit is measured by computing the log-likelihood of each sample under the probabilistic PCA model. The relevant paper can be found here: link
def get_covariance(self):
    """Compute data covariance with the generative model.

    ``cov = components_.T * S**2 * components_ + sigma2 * eye(n_features)``
    where S**2 contains the explained variances, and sigma2 contains the
    noise variances.

    Returns
    -------
    cov : array of shape=(n_features, n_features)
        Estimated covariance of data.
    """
    components_ = self.components_
    exp_var = self.explained_variance_
    if self.whiten:
        components_ = components_ * np.sqrt(exp_var[:, np.newaxis])
    exp_var_diff = np.maximum(exp_var - self.noise_variance_, 0.0)
    cov = np.dot(components_.T * exp_var_diff, components_)
    cov.flat[:: len(cov) + 1] += self.noise_variance_  # modify diag inplace
    return cov
def get_precision(self):
    """Compute data precision matrix with the generative model.

    Equals the inverse of the covariance but computed with the matrix
    inversion lemma for efficiency.

    Returns
    -------
    precision : array, shape=(n_features, n_features)
        Estimated precision of data.
    """
    n_features = self.components_.shape[1]

    # handle corner cases first
    if self.n_components_ == 0:
        return np.eye(n_features) / self.noise_variance_
    if np.isclose(self.noise_variance_, 0.0, atol=0.0):
        return linalg.inv(self.get_covariance())

    # Get precision using matrix inversion lemma
    components_ = self.components_
    exp_var = self.explained_variance_
    if self.whiten:
        components_ = components_ * np.sqrt(exp_var[:, np.newaxis])
    exp_var_diff = np.maximum(exp_var - self.noise_variance_, 0.0)
    precision = np.dot(components_, components_.T) / self.noise_variance_
    precision.flat[:: len(precision) + 1] += 1.0 / exp_var_diff
    precision = np.dot(components_.T, np.dot(linalg.inv(precision), components_))
    precision /= -(self.noise_variance_**2)
    precision.flat[:: len(precision) + 1] += 1.0 / self.noise_variance_
    return precision
def score_samples(self, X):
    """Return the log-likelihood of each sample.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        The data.
    """
    ...  # validate the estimator and the data, omitted here
    Xr = X - self.mean_
    n_features = X.shape[1]
    precision = self.get_precision()
    log_like = -0.5 * (Xr * (np.dot(Xr, precision))).sum(axis=1)
    log_like -= 0.5 * (n_features * log(2.0 * np.pi) - fast_logdet(precision))
    return log_like
def score(self, X, y=None):
    """Return the average log-likelihood of all samples."""
    return np.mean(self.score_samples(X))
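A brief usage sketch (with made-up data) showing how these scores are accessed after fitting:

import numpy as np
from sklearn.decomposition import PCA

X = np.random.RandomState(0).randn(300, 8)
pca = PCA(n_components=4).fit(X)

per_sample_ll = pca.score_samples(X)  # log-likelihood of each sample
print(per_sample_ll.shape)            # (300,)
print(pca.score(X))                   # average log-likelihood over all samples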
Also updated at: SP-FA's blog