Calculation steps of principal component analysis
2022-07-24 06:09:00 【A little cute C】
Recently I watched many videos and blog posts about principal component analysis. Most of them focus on the derivation, which is a bit obscure and does not show how to apply the method, so I decided to start with how to apply and calculate it.
Here principal component analysis is used for feature selection. The data come from 《Theory and Practice of Big Data Analysis》, a book by Wang Hongzhi.
Principal component analysis is mostly applied to features that are correlated with the class label but contain noise and redundancy. PCA reduces the number of features, which suppresses noise and redundancy and lowers the risk of overfitting.
What is principal component analysis
Principal component analysis reduces the dimensionality of the data: it finds a few composite variables to replace the original variables, such that they represent as much of the information in the original variables as possible while being mutually independent.
The idea of principal component analysis
Map the n-dimensional features onto k dimensions (k < n); these k dimensions are new, mutually orthogonal features, called the principal components. Note that the k-dimensional features are reconstructed from the data, not obtained by simply keeping k of the original n features and discarding the other n-k.
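As a quick preview of where the manual calculation below ends up, the whole procedure is available off the shelf. Here is a minimal sketch using scikit-learn's PCA (assuming scikit-learn and numpy are installed; the array contents are arbitrary toy values, not the data used below):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))    # 100 samples with n = 5 features

pca = PCA(n_components=2)        # map n = 5 dimensions down to k = 2
Z = pca.fit_transform(X)         # the k new orthogonal features (principal components)

print(Z.shape)                   # (100, 2)
print(pca.components_.shape)     # (2, 5): each row is one principal direction
```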
Dimensionality reduction by principal component analysis
Here is a set of data (x and y can be seen as two features):
| x | y |
| --- | --- |
| 18.8 | 5.3 |
| 65.8 | 4.7 |
| 3.6 | 2.0 |
| 14.5 | 5.1 |
| 52.5 | 4.8 |
| 41.2 | 8.8 |
| 77.9 | 6.8 |
| 11.7 | 3.4 |
| 2.0 | 27.5 |
First step: compute the mean of x and of y, then subtract the corresponding mean from every sample. Here the mean of x is 32.0 and the mean of y is 7.6 (a code sketch of this step follows the table). Subtracting the means from each sample gives
| x | y |
| --- | --- |
| -13.2 | -2.3 |
| 33.8 | -2.9 |
| -28.4 | -5.6 |
| -17.5 | -2.5 |
| 20.5 | -2.8 |
| 9.2 | 1.2 |
| 45.9 | -0.8 |
| -20.3 | -4.2 |
| -30.0 | 19.9 |
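A minimal numpy sketch of this first (centering) step, using the data table above (the variable name DataAdjust follows the notation used later in this post):

```python
import numpy as np

# The 9 samples from the original table, one row per sample, columns x and y.
X = np.array([[18.8,  5.3], [65.8,  4.7], [ 3.6,  2.0],
              [14.5,  5.1], [52.5,  4.8], [41.2,  8.8],
              [77.9,  6.8], [11.7,  3.4], [ 2.0, 27.5]])

means = X.mean(axis=0)      # array([32. ,  7.6])
DataAdjust = X - means      # subtract each column's mean from every sample
print(DataAdjust)
```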
The second step: compute the covariance matrix of the features. If the data were 3-dimensional, the covariance matrix would be

$$C = \begin{pmatrix} \operatorname{cov}(x,x) & \operatorname{cov}(x,y) & \operatorname{cov}(x,z) \\ \operatorname{cov}(y,x) & \operatorname{cov}(y,y) & \operatorname{cov}(y,z) \\ \operatorname{cov}(z,x) & \operatorname{cov}(z,y) & \operatorname{cov}(z,z) \end{pmatrix}$$
In this example there are only x and y, so the covariance matrix is 2×2:

$$C = \begin{pmatrix} \operatorname{cov}(x,x) & \operatorname{cov}(x,y) \\ \operatorname{cov}(y,x) & \operatorname{cov}(y,y) \end{pmatrix}$$
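Continuing the sketch, numpy computes this covariance matrix directly. Note that np.cov divides by m−1 by default; a calculation that divides by m instead gives proportionally smaller entries, which is one reason hand-computed numbers (such as the eigenvalues quoted below) can differ slightly from this output:

```python
# rowvar=False tells np.cov that rows are samples and columns are features.
C = np.cov(DataAdjust, rowvar=False)   # 2x2 covariance matrix
print(C)
```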
The third step: find the eigenvalues and eigenvectors of the covariance matrix.
The two eigenvalues are 52.9003 and 783.6058, and each has a corresponding eigenvector. The eigenvectors here are normalized to unit length.
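Continuing the sketch: for a symmetric matrix such as a covariance matrix, np.linalg.eigh returns the eigenvalues in ascending order together with unit-length eigenvectors:

```python
eigvals, eigvecs = np.linalg.eigh(C)   # eigh is for symmetric matrices
# eigvecs[:, i] is the unit eigenvector belonging to eigvals[i];
# its overall sign is arbitrary, so it may differ in sign from the book's.
print(eigvals)
print(eigvecs)
```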
Step four: sort the eigenvalues and choose the k largest, then form the matrix whose columns are the corresponding k eigenvectors.
In this example there are only two eigenvalues, so we choose just one, namely 783.6058, together with its corresponding unit eigenvector.
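Continuing the sketch, selecting the top-k eigenvectors:

```python
k = 1
order = np.argsort(eigvals)[::-1]      # eigenvalue indices, largest first
EigenVectors = eigvecs[:, order[:k]]   # n x k matrix whose columns are the top-k eigenvectors
```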
Step five: project the sample points onto the selected eigenvectors (this final matrix multiplication projects the original sample points onto the axes given by the eigenvectors).
Suppose the number of samples is m and the number of features is n. The mean-subtracted sample matrix is DataAdjust (m×n), the covariance matrix is n×n, and the matrix formed by the selected k eigenvectors is EigenVectors (n×k). The projected data are then FinalData (m×k) = DataAdjust (m×n) × EigenVectors (n×k).
In this example, FinalData (9×1) = DataAdjust (9×2) × EigenVectors (2×1).
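And the projection itself, completing the sketch (because the eigenvector's sign is arbitrary and the covariance normalization may differ, the printed values can differ in sign and slightly in magnitude from the table below):

```python
FinalData = DataAdjust @ EigenVectors  # (9, 2) @ (2, 1) -> (9, 1) projected data
print(FinalData.ravel())
```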
And what you get is
| projection |
| --- |
| 11.0627 |
| -36.2729 |
| 23.2126 |
| -19.6830 |
| -22.9410 |
| -8.0747 |
| -46.4212 |
| 16.4139 |
| 47.8576 |
In this way we reduce the features of the original data from 2 dimensions to 1; this 1 dimension is the projection of the original features onto it, and it largely represents the original 2 features.