Calculation steps of principal component analysis
2022-07-24 06:09:00 【A little cute C】
Recently I watched many videos and blog posts about principal component analysis. Most of them focus on the derivation, which is a bit obscure and does not show how to apply the method, so I decided to start with how to apply and calculate it.
Here principal component analysis is used for feature selection. The data come from 《Theory and Practice of Big Data Analysis》, a book by Wang Hongzhi.
Principal component analysis is mostly applied to features that are correlated with the class label but contain noise and redundancy. PCA reduces the number of features, which suppresses noise and redundancy and lowers the risk of overfitting.
What is principal component analysis
Principal component analysis reduces the dimensionality of the data: it finds a few composite variables to replace the original variables, such that they represent as much of the information in the original variables as possible while being mutually independent.
The idea of principal component analysis
Map the n-dimensional features onto k dimensions (k < n); these k dimensions are new, mutually orthogonal features, called the principal components. Note that the k-dimensional features are reconstructed from the data, not obtained by simply keeping k of the original n features and discarding the other n-k.
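As a quick preview of where the manual calculation below ends up, the whole procedure is available off the shelf. Here is a minimal sketch using scikit-learn's PCA (assuming scikit-learn and numpy are installed; the array contents are arbitrary toy values, not the data used below):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))    # 100 samples with n = 5 features

pca = PCA(n_components=2)        # map n = 5 dimensions down to k = 2
Z = pca.fit_transform(X)         # the k new orthogonal features (principal components)

print(Z.shape)                   # (100, 2)
print(pca.components_.shape)     # (2, 5): each row is one principal direction
```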
Dimensionality reduction by principal component analysis
Here is a set of data (x and y can be seen as two features):
| x | y |
| --- | --- |
| 18.8 | 5.3 |
| 65.8 | 4.7 |
| 3.6 | 2.0 |
| 14.5 | 5.1 |
| 52.5 | 4.8 |
| 41.2 | 8.8 |
| 77.9 | 6.8 |
| 11.7 | 3.4 |
| 2.0 | 27.5 |
First step: compute the mean of x and of y, then subtract the corresponding mean from every sample. Here the mean of x is 32.0 and the mean of y is 7.6 (a code sketch of this step follows the table). Subtracting the means from each sample gives
| x | y |
| --- | --- |
| -13.2 | -2.3 |
| 33.8 | -2.9 |
| -28.4 | -5.6 |
| -17.5 | -2.5 |
| 20.5 | -2.8 |
| 9.2 | 1.2 |
| 45.9 | -0.8 |
| -20.3 | -4.2 |
| -30.0 | 19.9 |
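A minimal numpy sketch of this first (centering) step, using the data table above (the variable name DataAdjust follows the notation used later in this post):

```python
import numpy as np

# The 9 samples from the original table, one row per sample, columns x and y.
X = np.array([[18.8,  5.3], [65.8,  4.7], [ 3.6,  2.0],
              [14.5,  5.1], [52.5,  4.8], [41.2,  8.8],
              [77.9,  6.8], [11.7,  3.4], [ 2.0, 27.5]])

means = X.mean(axis=0)      # array([32. ,  7.6])
DataAdjust = X - means      # subtract each column's mean from every sample
print(DataAdjust)
```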
The second step: compute the covariance matrix of the features. If the data were 3-dimensional, the covariance matrix would be

$$C = \begin{pmatrix} \operatorname{cov}(x,x) & \operatorname{cov}(x,y) & \operatorname{cov}(x,z) \\ \operatorname{cov}(y,x) & \operatorname{cov}(y,y) & \operatorname{cov}(y,z) \\ \operatorname{cov}(z,x) & \operatorname{cov}(z,y) & \operatorname{cov}(z,z) \end{pmatrix}$$
In this example there are only x and y, so the covariance matrix is 2×2:

$$C = \begin{pmatrix} \operatorname{cov}(x,x) & \operatorname{cov}(x,y) \\ \operatorname{cov}(y,x) & \operatorname{cov}(y,y) \end{pmatrix}$$
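Continuing the sketch, numpy computes this covariance matrix directly. Note that np.cov divides by m−1 by default; a calculation that divides by m instead gives proportionally smaller entries, which is one reason hand-computed numbers (such as the eigenvalues quoted below) can differ slightly from this output:

```python
# rowvar=False tells np.cov that rows are samples and columns are features.
C = np.cov(DataAdjust, rowvar=False)   # 2x2 covariance matrix
print(C)
```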
The third step: find the eigenvalues and eigenvectors of the covariance matrix.
The two eigenvalues are 52.9003 and 783.6058, and each has a corresponding eigenvector. The eigenvectors here are normalized to unit length.
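Continuing the sketch: for a symmetric matrix such as a covariance matrix, np.linalg.eigh returns the eigenvalues in ascending order together with unit-length eigenvectors:

```python
eigvals, eigvecs = np.linalg.eigh(C)   # eigh is for symmetric matrices
# eigvecs[:, i] is the unit eigenvector belonging to eigvals[i];
# its overall sign is arbitrary, so it may differ in sign from the book's.
print(eigvals)
print(eigvecs)
```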
Step four: sort the eigenvalues and choose the k largest, then form the matrix whose columns are the corresponding k eigenvectors.
In this example there are only two eigenvalues, so we choose just one, namely 783.6058, together with its corresponding unit eigenvector.
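Continuing the sketch, selecting the top-k eigenvectors:

```python
k = 1
order = np.argsort(eigvals)[::-1]      # eigenvalue indices, largest first
EigenVectors = eigvecs[:, order[:k]]   # n x k matrix whose columns are the top-k eigenvectors
```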
Step five: project the sample points onto the selected eigenvectors (this final matrix multiplication projects the original sample points onto the axes given by the eigenvectors).
Suppose the number of samples is m and the number of features is n. The mean-subtracted sample matrix is DataAdjust (m×n), the covariance matrix is n×n, and the matrix formed by the selected k eigenvectors is EigenVectors (n×k). The projected data are then FinalData (m×k) = DataAdjust (m×n) × EigenVectors (n×k).
In this example, FinalData (9×1) = DataAdjust (9×2) × EigenVectors (2×1).
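And the projection itself, completing the sketch (because the eigenvector's sign is arbitrary and the covariance normalization may differ, the printed values can differ in sign and slightly in magnitude from the table below):

```python
FinalData = DataAdjust @ EigenVectors  # (9, 2) @ (2, 1) -> (9, 1) projected data
print(FinalData.ravel())
```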
And what you get is
| projection |
| --- |
| 11.0627 |
| -36.2729 |
| 23.2126 |
| -19.6830 |
| -22.9410 |
| -8.0747 |
| -46.4212 |
| 16.4139 |
| 47.8576 |
In this way we reduce the features of the original data from 2 dimensions to 1; this 1 dimension is the projection of the original features onto it, and it largely represents the original 2 features.