【sklearn】PCA
2022-07-24 07:34:00 【rejudge】
PCA
sklearn.decomposition.PCA
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
iris = load_iris()
X = iris.data
Y = iris.target
# 2-D array: a feature matrix with four features per sample
X.shape
(150, 4)
Using PCA()
# If n_components is not given, it defaults to min(X.shape), i.e. all components are kept
pca = PCA(n_components=2) # reduce to 2 dimensions
pca = pca.fit(X)
X_dr = pca.transform(X)
# Equivalent one-liner: X_dr = PCA(2).fit_transform(X)
X_dr.shape
(150, 2)
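To make the `fit` / `transform` step less of a black box, here is a minimal sketch of what `transform` computes for a default `PCA` (no whitening): the data is centered with the fitted `mean_` and projected onto the principal axes stored in `components_`. The name `X_manual` is introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
pca = PCA(n_components=2).fit(X)
X_dr = pca.transform(X)

# Manual projection: subtract the per-feature mean learned during fit,
# then project onto the principal axes (rows of components_, shape 2 x 4).
X_manual = (X - pca.mean_) @ pca.components_.T

print(pca.components_.shape)
print(np.allclose(X_dr, X_manual))
```

If the two arrays agree, `transform` is just a centering followed by a linear projection.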
# After reducing to 2 dimensions, the sample distribution is easy to plot and inspect
plt.figure()
plt.scatter(X_dr[Y==0, 0], X_dr[Y==0, 1], c='red', label=iris.target_names[0])
plt.scatter(X_dr[Y==1, 0], X_dr[Y==1, 1], c='black', label=iris.target_names[1])
plt.scatter(X_dr[Y==2, 0], X_dr[Y==2, 1], c='orange', label=iris.target_names[2])
plt.title('PCA of IRIS dataset')
plt.legend()
plt.show()
# Points that lie closer together are more similar, so this layout suits distance-based models such as KNN (a classifier, not a clustering model)
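As a rough check of the remark about KNN, the sketch below runs a K-nearest-neighbors classifier on the 2-D PCA features with 5-fold cross-validation. The choices `n_neighbors=5` and `cv=5` are assumptions made for illustration, not settings from the original post.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)

# PCA sits inside the pipeline so it is refit on each training fold only,
# which avoids leaking information from the validation fold.
model = make_pipeline(PCA(n_components=2), KNeighborsClassifier(n_neighbors=5))
scores = cross_val_score(model, X, y, cv=5)

print(scores)
print(scores.mean())
```

Wrapping both steps in a pipeline keeps the evaluation honest; fitting PCA on the full data first would slightly overstate the score.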

# Amount of information carried by each new feature (explained variance)
print(pca.explained_variance_)
# Share of the original data's information carried by each new feature (explained variance ratio)
print(pca.explained_variance_ratio_)
# Total proportion of information retained after dimensionality reduction
print(pca.explained_variance_ratio_.sum())
'''
[4.22824171 0.24267075]
[0.92461872 0.05306648]
0.9776852063187949
'''
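The two attributes above can be reproduced by hand, which may help clarify what "information" means here. A minimal sketch, assuming the default solver and no whitening: `explained_variance_` is the sample variance (with `ddof=1`) of each projected column, and `explained_variance_ratio_` divides that by the total variance of the original features. The names `var_projected` and `total_var` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
pca = PCA(n_components=2).fit(X)
X_dr = pca.transform(X)

# Sample variance (ddof=1) of each projected column
var_projected = X_dr.var(axis=0, ddof=1)
# Total variance of the original data: sum of the per-feature sample variances
total_var = X.var(axis=0, ddof=1).sum()

print(np.allclose(var_projected, pca.explained_variance_))
print(np.allclose(var_projected / total_var, pca.explained_variance_ratio_))
```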
Cumulative explained variance ratio curve
pca_line = PCA().fit(X)
print(pca_line.explained_variance_ratio_)
[0.92461872 0.05306648 0.01710261 0.00521218]
'''
Reducing to 1 dimension retains 0.92461872;
reducing to 2 dimensions retains 0.92461872 + 0.05306648;
keeping all 4 dimensions retains 1.
array([0.92461872, 0.05306648, 0.01710261, 0.00521218])
'''
import numpy as np
# np.cumsum(pca_line.explained_variance_ratio_)  # cumulative sum
plt.plot([1,2,3,4], np.cumsum(pca_line.explained_variance_ratio_))
plt.xticks([1,2,3,4])  # show only integer ticks on the x-axis
plt.xlabel('number of components after dimension reduction')
plt.ylabel('cumulative explained variance ratio')
plt.show()
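The curve can also be read off programmatically. The sketch below picks the smallest number of components whose cumulative explained variance ratio reaches a target; the 0.97 threshold and the names `cumulative`, `threshold`, and `k` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
cumulative = np.cumsum(PCA().fit(X).explained_variance_ratio_)

threshold = 0.97  # illustrative target, not a recommended default
# argmax returns the index of the first True; +1 turns that index into a count
k = int(np.argmax(cumulative >= threshold)) + 1

print(cumulative)
print(k)
```

With the ratios printed above, two components are the first to pass 0.97, since 0.92461872 + 0.05306648 is roughly 0.9777.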

Automatic selection with n_components='mle'
# Maximum likelihood estimation (MLE) chooses the hyperparameter automatically
# Computationally expensive
pca_mle = PCA(n_components='mle')
pca_mle = pca_mle.fit(X)
X_mle = pca_mle.transform(X)
print(pca_mle.explained_variance_ratio_)
[0.92461872 0.05306648 0.01710261]
# MLE judges that reducing to 3 dimensions is best
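Rather than counting the entries of `explained_variance_ratio_`, the chosen dimensionality can be read from the fitted `n_components_` attribute. A minimal sketch follows; the exact value may depend on the scikit-learn version, and the output quoted above suggests 3 for the iris data.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
pca_mle = PCA(n_components='mle').fit(X)
X_mle = pca_mle.transform(X)

# n_components_ holds the number of components the estimator actually kept
print(pca_mle.n_components_)
print(X_mle.shape)
```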
Selecting the hyperparameter by the proportion of information to retain (svd_solver)
'''
Instead of trying dimensions by hand, specify how much information must be
retained after dimensionality reduction (here 0.97) and add svd_solver='full'.

svd_solver selects the singular value decomposition (SVD) solver:
  auto        automatically selects a solver (such as full or randomized) according to the size of the data
  full        suitable for small amounts of data
  randomized  suitable for large feature matrices
  arpack      truncated SVD computed with ARPACK
(Usually use auto; if the computation does not finish, try randomized.)
'''
pca_f = PCA(n_components=0.97, svd_solver='full')
pca_f = pca_f.fit(X)
X_f = pca_f.transform(X)
print(pca_f.explained_variance_ratio_)
[0.92461872 0.05306648]
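To see how the retained-information target drives the number of components, the sketch below repeats the fit for a few targets and records `n_components_` for each. The targets 0.90 and 0.99 and the dictionary name `kept` are illustrative additions; only 0.97 comes from the example above.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

kept = {}
for target in (0.90, 0.97, 0.99):
    pca_f = PCA(n_components=target, svd_solver='full').fit(X)
    # n_components_ is the smallest count whose cumulative ratio exceeds the target
    kept[target] = pca_f.n_components_
    print(target, pca_f.n_components_, pca_f.explained_variance_ratio_.sum())
```

Given the ratios shown earlier, one component already covers 0.90, two are needed for 0.97, and three for 0.99.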