Summary of Andrew Ng's Machine Learning Course (14): Dimensionality Reduction
Q1 Motivation 1: Data compression
Dimensionality reduction compresses features: for example, two highly correlated dimensions can be reduced to one, and three dimensions can be reduced to two.
In the same way, 1000-dimensional data can be reduced to 100 dimensions, which reduces the memory footprint.
Q2 Motivation 2: Data visualization
Data with, say, 50 dimensions cannot be visualized directly; dimensionality reduction can bring it down to 2 dimensions for plotting.
Note that the algorithm is only responsible for reducing the dimensionality; the meaning of the new features must be interpreted by us.
Q3 Principal component analysis problem
(1) Problem statement:
PCA reduces n-dimensional data to k dimensions by finding k direction vectors onto which the data are projected so that the total projection error is minimized.
(2) Comparison between principal component analysis and linear regression:
The two are different algorithms: PCA minimizes the projection error, while linear regression minimizes the prediction error; PCA makes no prediction, while linear regression aims to predict an output.
Linear regression measures the error vertically, along the y-axis, while PCA measures the error perpendicular to the projection line itself.
(3) PCA produces new "principal component" vectors ranked by importance; keep as many of the leading components as needed and omit the remaining dimensions.
(4) One advantage of PCA is that it depends entirely on the data: no parameters need to be set by hand, and it is independent of the user. The same property can be a disadvantage: if the user has prior knowledge about the data, PCA cannot make use of it and may not produce the desired result.
Q4 Principal component analysis algorithm
PCA reduces n-dimensional data to k dimensions:
(1) Mean normalization: subtract the mean of each feature, and divide by the standard deviation (or range) if the features are on different scales;
(2) Compute the covariance matrix Sigma = (1/m) * Σ_i x^(i) (x^(i))^T;
(3) Compute the eigenvectors of the covariance matrix, for example with a singular value decomposition: [U, S, V] = svd(Sigma).
Here Sigma is an n×n matrix, and U is a matrix whose columns are the direction vectors that minimize the projection error of the data. Keeping only the first k columns of U gives an n×k matrix, denoted U_reduce, and the new feature vector is then computed as z^(i) = U_reduce^T * x^(i).
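As a minimal sketch of these steps (NumPy; the function name pca_fit and the one-example-per-row layout of X are this write-up's own choices, not code from the course):

```python
import numpy as np

def pca_fit(X, k):
    """Reduce X (m examples x n features) to k dimensions with PCA."""
    # (1) Mean normalization (divide by the std. deviation too if
    #     the features are on very different scales)
    mu = X.mean(axis=0)
    X_norm = X - mu
    # (2) Covariance matrix: Sigma = (1/m) * X^T X
    m = X.shape[0]
    Sigma = (X_norm.T @ X_norm) / m
    # (3) Eigenvectors via singular value decomposition
    U, S, _ = np.linalg.svd(Sigma)
    # Keep the first k columns of U: U_reduce is n x k
    U_reduce = U[:, :k]
    # z^(i) = U_reduce^T x^(i), computed for all examples at once
    Z = X_norm @ U_reduce
    return Z, U_reduce, S, mu
```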
Q5 Choosing the number of principal components
Principal component analysis minimizes the average squared projection error,

(1/m) * Σ_{i=1..m} ||x^(i) - x_approx^(i)||^2,

while the total variation of the training set is

(1/m) * Σ_{i=1..m} ||x^(i)||^2.

We want the ratio of the first quantity to the second to be as small as possible: requiring it to be below 1%, for example, means that 99% of the variance is retained. Choose the smallest k that meets this condition. Using the diagonal matrix S returned by the SVD, the check is equivalent to choosing the smallest k with

1 - (Σ_{i=1..k} S_ii) / (Σ_{i=1..n} S_ii) ≤ 0.01.
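Continuing the sketch above, the smallest such k can be read off the singular values S returned by pca_fit (choose_k is an illustrative name; threshold=0.99 corresponds to the 1% criterion):

```python
def choose_k(S, threshold=0.99):
    """Smallest k that retains at least `threshold` of the variance."""
    retained = np.cumsum(S) / np.sum(S)
    # argmax on a boolean array returns the first index where it is True
    return int(np.argmax(retained >= threshold)) + 1
```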
Q6 Reconstruction from the compressed representation
The dimensionality-reduction formula is

z = U_reduce^T * x.

The reconstruction (going from the low-dimensional representation back to the high-dimensional space) is

x_approx = U_reduce * z ≈ x.
Schematically, dimensionality reduction projects each point x onto the line to obtain z, and reconstruction maps z back to the approximation x_approx in the original space.
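A matching reconstruction sketch, reusing the U_reduce and mean mu produced by pca_fit above (again an illustrative helper, not code from the course):

```python
def pca_reconstruct(Z, U_reduce, mu):
    """Map compressed Z back to the original space: x_approx = U_reduce z + mu."""
    return Z @ U_reduce.T + mu
```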
Q7 Suggestions on the application of principal component analysis
Correct use case:
For 100×100-pixel images, i.e., 10,000 features per example, use PCA to compress the training set to, say, 1,000 dimensions and run the learning algorithm on the compressed data. At prediction time, apply the U_reduce learned from the training set to map a test input x to z, then make the prediction.
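A sketch of that workflow, building on pca_fit above; train_classifier and predict are hypothetical placeholders for whatever supervised learner is used:

```python
# Fit PCA on the training set only, then reuse U_reduce and mu everywhere.
Z_train, U_reduce, S, mu = pca_fit(X_train, k=1000)
model = train_classifier(Z_train, y_train)   # hypothetical learner

# At prediction time, map the test input with the SAME U_reduce and mu:
Z_test = (X_test - mu) @ U_reduce
predictions = predict(model, Z_test)         # hypothetical predict()
```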
Incorrect use cases:
(1) Using PCA to fight overfitting: PCA cannot solve overfitting; regularization should be used instead.
(2) Treating PCA as a default part of the learning pipeline: the original features should be used whenever possible, and PCA should be considered only when the algorithm runs too slowly or uses too much memory.
Author: Your Rego
Copyright belongs to the author. Reposting is welcome, but without the author's consent the original link must be given on the article page; otherwise the right to pursue legal liability is reserved.