
Machine learning - principal component analysis (PCA)

2022-06-24 10:10:00 Cpsu

# Start by creating a random data set with a linear dependency between the two variables
import numpy as np
from matplotlib import pyplot as plt
from numpy import linalg

np.random.seed(2)
# Construct data set 
x1=[i for i in np.arange(1,10,0.1)]
x2=[np.random.uniform(2,4)*i+np.random.randn() for i in x1]
plt.scatter(x1,x2)

# np.zeros creates an all-zero array of the given shape; its columns are filled in below
# Assemble the data set into matrix form
x=np.zeros((90,2))
x[:,0]=np.array(x1)
x[:,1]=np.array(x2)
x.shape

#(90, 2)
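
As an aside (not in the original post), the same (90, 2) matrix can be built in a single call with np.column_stack:

# Alternative: stack the two lists as the columns of one array
x = np.column_stack((x1, x2))
x.shape
#(90, 2)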


(Figure: scatter plot of the generated data, x1 vs. x2)
The first step is to center the data by subtracting the column means.

# axis selects whether the mean is taken along rows or columns; axis=0 gives the per-column mean
data_array=x
mean_array=np.mean(data_array,axis=0)
center_array=data_array-mean_array


# Equivalently, using np.subtract
center_array=np.subtract(data_array,np.mean(data_array,axis=0))
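
A quick sanity check (my addition, using the variables defined above): after centering, the per-column means should be numerically zero.

# Column means of the centered data should be ~0
print(np.allclose(center_array.mean(axis=0), 0))
# expected: True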

The second step is to compute the covariance matrix and its eigenvalues and eigenvectors.

# rowvar=False tells np.cov that each column is a variable and each row an observation
cov_array=np.cov(center_array,rowvar=False)

eig_vals, eig_vects = linalg.eig(cov_array)

""" # The eigenvalue  (array([ 1.23589914, 80.95385223]), # Eigenvector  array([[-0.96430755, -0.26478471], [ 0.26478471, -0.96430755]]))  Wherein, characteristic value  1.23589914 The corresponding eigenvector is array([-0.96430755,0.26478471]) """
# Normally only the eigenvectors of the top k eigenvalues are kept as principal components
# To keep the algorithm easy to follow, all eigenvalues are kept here
# Indices that sort the eigenvalues in ascending order
val_index=np.argsort(eig_vals)
# Reverse to get descending order
val_index=val_index[::-1]
# Reorder the eigenvectors to match
eig_vect=eig_vects[:,val_index]
# Project the centered data onto the first principal component
np.dot(center_array, eig_vect)[:,0]

(Output: the data projected onto the first principal component)
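
To make the link between the covariance matrix and its eigenpairs concrete, here is a small sketch (an addition to the original, reusing the variables above): it checks that cov_array @ v equals the eigenvalue times v for each pair, and computes the fraction of total variance carried by each component from the sorted eigenvalues.

# Each eigenpair should satisfy cov_array @ v = lambda * v
for val, vec in zip(eig_vals, eig_vects.T):
    print(np.allclose(np.dot(cov_array, vec), val * vec))
# expected: True for both pairs

# Fraction of total variance explained by each component (largest first)
sorted_vals = eig_vals[val_index]
print(sorted_vals / sorted_vals.sum())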
Call the sklearn module to verify the result.

from sklearn.decomposition import PCA
data_mat = x
pca = PCA(n_components=1)
x_p = pca.fit_transform(data_mat)
x_p

# The results match the manual computation (up to a possible sign flip of the component)
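
As a final check (my addition, assuming the variables from the manual computation are still in scope), the hand-rolled projection and sklearn's output should agree element-wise, up to a possible sign flip of the component:

# Compare the manual projection with sklearn's result, allowing a sign flip
manual_pc1 = np.dot(center_array, eig_vect)[:, 0]
sklearn_pc1 = x_p[:, 0]
print(np.allclose(manual_pc1, sklearn_pc1) or np.allclose(manual_pc1, -sklearn_pc1))
# expected: True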

Copyright notice
This article was written by [Cpsu]; please include a link to the original when reposting:
https://yzsam.com/2022/175/202206240912594285.html