【sklearn】PCA
2022-07-24 05:15:00 【rejudge】
sklearn.decomposition.PCA
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
iris = load_iris()
X = iris.data
Y = iris.target
# 2-D array: a feature matrix with 4 features per sample
X.shape
(150, 4)
Using PCA()
# If n_components is not set, all min(X.shape) components are kept
pca = PCA(n_components=2) # reduce to 2 dimensions
pca = pca.fit(X)
X_dr = pca.transform(X)
# X_dr = PCA(2).fit_transform(X)
X_dr.shape
(150, 2)
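To make clear what `transform` computes, the sketch below reproduces it by hand, assuming the default `whiten=False`: the data is centered with the fitted `mean_` and projected onto the rows of `components_`. This is an illustration of the fitted attributes, not a replacement for the library call.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
pca = PCA(n_components=2).fit(X)
X_dr = pca.transform(X)

# Center with the per-feature mean learned in fit, then project onto
# the principal axes (each row of components_ is one axis).
X_manual = (X - pca.mean_) @ pca.components_.T

print(np.allclose(X_dr, X_manual))  # True
```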
# With only 2 dimensions, the sample distribution is easy to plot and inspect
plt.figure()
plt.scatter(X_dr[Y==0, 0], X_dr[Y==0, 1], c='red', label=iris.target_names[0])
plt.scatter(X_dr[Y==1, 0], X_dr[Y==1, 1], c='black', label=iris.target_names[1])
plt.scatter(X_dr[Y==2, 0], X_dr[Y==2, 1], c='orange', label=iris.target_names[2])
plt.title('PCA of IRIS dataset')
plt.legend()
plt.show()
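# Points that lie closer together are more similar, so this data suits distance-based models such as KNN (a classifier, not a clustering model)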
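As a rough check of how well the 2-D projection separates the three classes, a KNN classifier can be cross-validated on the reduced data. This is a minimal sketch; `n_neighbors=5` and `cv=5` are assumptions chosen for illustration, not settings from the original post.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, Y = iris.data, iris.target
X_dr = PCA(n_components=2).fit_transform(X)

# 5-fold cross-validated accuracy of KNN on the 2-D projection
knn = KNeighborsClassifier(n_neighbors=5)  # assumed hyperparameter
scores = cross_val_score(knn, X_dr, Y, cv=5)
print(scores.mean())
```

For a stricter evaluation, PCA would be fitted inside each fold (for example with a `Pipeline`), since fitting it on all samples first lets the projection see the test folds.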

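# Amount of information carried by each new feature (explained variance)
print(pca.explained_variance_)
# Share of the original data's total variance carried by each new feature (explained variance ratio)
print(pca.explained_variance_ratio_)
# Total share of information kept after dimensionality reduction
print(pca.explained_variance_ratio_.sum())
'''
[4.22824171 0.24267075]
[0.92461872 0.05306648]
0.9776852063187949
'''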
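The two attributes printed above can be tied back to the covariance matrix of the data. The sketch below, which relies only on NumPy and the fitted PCA object, checks that `explained_variance_` equals the largest eigenvalues of the sample covariance matrix, and that `explained_variance_ratio_` is each eigenvalue divided by the sum of all of them.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
pca = PCA(n_components=2).fit(X)

# Eigenvalues of the sample covariance matrix (np.cov divides by n - 1),
# sorted from largest to smallest
cov = np.cov(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]

# The kept components correspond to the two largest eigenvalues
print(np.allclose(pca.explained_variance_, eigvals[:2]))  # True
# The ratio is taken over the total variance of all 4 original features
print(np.allclose(pca.explained_variance_ratio_, eigvals[:2] / eigvals.sum()))  # True
```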
Cumulative explained variance ratio curve
pca_line = PCA().fit(X)
print(pca_line.explained_variance_ratio_)
[0.92461872 0.05306648 0.01710261 0.00521218]
'''
Reducing to 1 dimension keeps 0.92461872 of the variance;
reducing to 2 dimensions keeps 0.92461872 + 0.05306648;
keeping all 4 dimensions keeps 1.
'''
import numpy as np
# np.cumsum(pca_line.explained_variance_ratio_) # cumulative sum
plt.plot([1,2,3,4], np.cumsum(pca_line.explained_variance_ratio_))
plt.xticks([1,2,3,4]) # show integer ticks only
plt.xlabel('number of components after dimension reduction')
plt.ylabel('cumulative explained variance ratio')
plt.show()

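n_components='mle': automatic selection
# Let maximum likelihood estimation (MLE) choose the n_components hyperparameter
# Computationally expensive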
pca_mle = PCA(n_components='mle')
pca_mle = pca_mle.fit(X)
X_mle = pca_mle.transform(X)
print(pca_mle.explained_variance_ratio_)
[0.92461872 0.05306648 0.01710261]
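# MLE selects 3 dimensions as the best choice here
Choosing n_components by share of information kept (svd_solver)
'''
The target dimension can be found by trial and error, or by stating how much
information to keep: n_components=0.97 keeps just enough components for the
cumulative explained variance ratio to reach 0.97. Add svd_solver='full'.

svd_solver selects the solver used for the SVD (singular value decomposition):
  auto        chooses full or randomized from the data shape and n_components
  full        suited to data that is not too large
  randomized  suited to very large feature matrices
  arpack      truncated SVD; requires 0 < n_components < min(X.shape)
(In general use auto; if it cannot finish, try randomized.)
'''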
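Rather than counting the printed ratios, the dimension chosen by MLE can be read from the fitted estimator's `n_components_` attribute. A minimal sketch:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
pca_mle = PCA(n_components='mle').fit(X)
X_mle = pca_mle.transform(X)

# n_components_ holds the number of components actually kept after fitting
print(pca_mle.n_components_)
# The transformed data has that many columns
print(X_mle.shape)
```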
pca_f = PCA(n_components=0.97, svd_solver='full')
pca_f = pca_f.fit(X)
X_f = pca_f.transform(X)
print(pca_f.explained_variance_ratio_)
[0.92461872 0.05306648]
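The choice made for `n_components=0.97` can be reproduced by hand from the cumulative ratios of a full PCA. The sketch below picks the smallest number of components whose cumulative explained variance ratio reaches the threshold and compares it with what scikit-learn kept; the variable names `threshold` and `k_manual` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
threshold = 0.97

# Cumulative explained variance ratio over all 4 components
cum_ratio = np.cumsum(PCA().fit(X).explained_variance_ratio_)
# Index of the first cumulative value that reaches the threshold, plus 1
k_manual = int(np.argmax(cum_ratio >= threshold)) + 1

pca_f = PCA(n_components=threshold, svd_solver='full').fit(X)
print(k_manual, pca_f.n_components_)  # 2 2
```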