[basic data mining technology] KNN simple clustering
2022-07-24 20:28:00 【Sunny qt01】
KNN clustering technique

The figure plots customers by age and income, labeled by whether they will buy the magazine.
KNN takes the K training samples closest to the sample being classified (picture a circle centered on the sample that grows until it holds K neighbours) and assigns the sample to whichever class is most common among them. K is a hyperparameter, because we choose it ourselves.
Theoretical basis of KNN: customers in the same cluster tend to show the same behavior.
So a customer is given the same class as its neighbouring customers. In this sense it is not a machine learning method: no model is trained, the neighbours are simply looked up at prediction time.
Disadvantages: it is inefficient, because the right K is not known in advance and several values have to be tried.
It is also hard to explain why KNN predicts better than naïve prediction.
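Trying several values of K can be sketched as follows. This is only a minimal illustration, not the author's procedure: the data points are made up, the attributes are assumed to be already scaled to [0, 1], and leave-one-out accuracy is just one possible way to compare the candidates.

```python
from collections import Counter
import math

# Hypothetical toy data: (age, income) already scaled to [0, 1], with a buy / no-buy label.
points = [
    ((0.10, 0.20), "no"),
    ((0.15, 0.10), "no"),
    ((0.20, 0.30), "no"),
    ((0.30, 0.15), "no"),
    ((0.70, 0.80), "yes"),
    ((0.80, 0.75), "yes"),
    ((0.85, 0.90), "yes"),
    ((0.90, 0.70), "yes"),
]

def knn_predict(train, query, k):
    """Return the majority label among the k training points closest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def leave_one_out_accuracy(data, k):
    """Predict each point from all the others and return the share predicted correctly."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(rest, features, k) == label:
            hits += 1
    return hits / len(data)

# Evaluate a few candidate values of K and keep the first one with the best accuracy.
accuracy_by_k = {k: leave_one_out_accuracy(points, k) for k in (1, 3, 5, 7)}
best_k = max(accuracy_by_k, key=accuracy_by_k.get)
```

On this toy data small values of K classify every point correctly, while K=7 fails on every point because each point then has more neighbours from the other class than from its own. That is why K has to be tuned by trial.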
Comparison of prediction accuracy between KNN and naïve prediction:

We find that the probability of a correct prediction is indeed much higher with KNN.
The 3 steps of applying KNN in practice
Data preprocessing (Data Preprocessing): make sure the attributes (age vs. income) are measured on comparable scales. Remember to standardize.
Distance calculation (Distance Calculation): choose which distance formula to use.
Predicted probability calculation (Predicted Probability)
Step 1: Standardization
Here we use min-max normalization (Min-max Normalization), which rescales each attribute to the range [0, 1].
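As a minimal sketch of min-max normalization, assuming the usual definition x' = (x - min) / (max - min): the function name `min_max_normalize` and the sample ages and incomes are hypothetical, chosen only for illustration.

```python
def min_max_normalize(values):
    """Rescale a list of numbers linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: there is no spread to rescale, so map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical ages and incomes, made up for illustration only.
ages = [20, 35, 50]
incomes = [30000, 60000, 90000]

scaled_ages = min_max_normalize(ages)        # [0.0, 0.5, 1.0]
scaled_incomes = min_max_normalize(incomes)  # [0.0, 0.5, 1.0]
```

After scaling, age and income contribute on the same footing to any distance, even though their raw units differ by orders of magnitude.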

The result of the transformation is as follows:

Step 2: Several ways to calculate distance
Manhattan distance (first power): this is the street (city-block) distance, not the straight-line distance.

where R is given by the following formula:

Euclidean distance (second power)
This is the classic distance formula: the straight-line distance.

The two differ only in the exponent p. The general formula, controlled by p, is

When p equals 1, it is the Manhattan distance.
When p equals 2, it is the Euclidean distance.
In Python packages, you switch the distance formula by changing p.
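The role of p can be illustrated with a small sketch. It assumes the general formula is the standard Minkowski distance, (sum of |x_i - y_i|^p)^(1/p); the R term mentioned with the Manhattan formula is not reproduced here, and the function name `minkowski_distance` is hypothetical. As one example of a package that works this way, scikit-learn's `KNeighborsClassifier` takes a `p` argument for its Minkowski metric.

```python
def minkowski_distance(a, b, p):
    """Minkowski distance between two equal-length numeric sequences."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of attributes")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

point_a = (0.0, 0.0)
point_b = (3.0, 4.0)

manhattan = minkowski_distance(point_a, point_b, p=1)  # 3 + 4 = 7
euclidean = minkowski_distance(point_a, point_b, p=2)  # sqrt(9 + 16) = 5
```

The same two points are 7 apart along the streets and 5 apart in a straight line, so the choice of p can change which neighbours count as nearest.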
Step 3: Predicted probability
For example, suppose the target attribute has 3 classes, the test sample is T, and k=5. The results are as follows:
The nearest neighbour's target attribute is class A
The second nearest neighbour's target attribute is class B
The third nearest neighbour's target attribute is class A
The fourth nearest neighbour's target attribute is class C
The fifth nearest neighbour's target attribute is class A
So we predict that the target attribute value is A, with a predicted probability of 3/5.
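The vote in this example can be reproduced with a short sketch. The variable names are hypothetical; the five neighbour classes are the ones listed for sample T.

```python
from collections import Counter

# Classes of the five nearest neighbours of T, nearest first.
neighbour_classes = ["A", "B", "A", "C", "A"]

votes = Counter(neighbour_classes)
predicted_class, count = votes.most_common(1)[0]

# Predicted probability = share of the k neighbours that voted for the winning class.
predicted_probability = count / len(neighbour_classes)

# The same share for every class that appears among the neighbours.
class_probabilities = {c: n / len(neighbour_classes) for c, n in votes.items()}
```

Here `predicted_class` is `"A"` and `predicted_probability` is 0.6, matching the 3/5 in the example; classes B and C each get 1/5.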
Case study 1: Diagnostic data for the following diseases is given. The first field is the patient code, followed by the input fields (sore throat, fever, swollen glands, congestion, headache) and the target field (diagnosis).

Use KNN to predict the diagnosis of the following patients (K=3)
Distance(Yes, No) = 1
Distance(Yes, Yes) = 0
Distance(No, No) = 0
The distance between two records is calculated using the interception distance.
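A minimal sketch of how such a prediction could be computed is shown below. Everything in it is an assumption for illustration: the training records are made up (they are NOT the table from the case study), the diagnosis labels `D1` and `D2` are placeholders, and the total distance between two records is assumed to be the sum of the per-field Yes/No distances.

```python
from collections import Counter

# Field order: sore throat, fever, swollen glands, congestion, headache.
# Hypothetical training records with placeholder diagnosis labels.
training = [
    (("Yes", "Yes", "Yes", "No", "No"), "D1"),
    (("Yes", "Yes", "No", "No", "No"), "D1"),
    (("Yes", "No", "Yes", "Yes", "No"), "D1"),
    (("No", "No", "No", "Yes", "Yes"), "D2"),
    (("No", "No", "No", "Yes", "No"), "D2"),
    (("No", "Yes", "No", "Yes", "Yes"), "D2"),
]

def field_distance(x, y):
    """Distance(Yes, No) = 1; Distance(Yes, Yes) = Distance(No, No) = 0."""
    return 0 if x == y else 1

def record_distance(a, b):
    """Assumption: the total distance is the sum of the per-field distances."""
    return sum(field_distance(x, y) for x, y in zip(a, b))

def predict(record, k=3):
    """Vote among the k training records closest to record."""
    ranked = sorted(training, key=lambda row: record_distance(row[0], record))
    votes = Counter(label for _, label in ranked[:k])
    label, count = votes.most_common(1)[0]
    return label, count / k

# Hypothetical new patient to diagnose.
new_patient = ("Yes", "Yes", "No", "No", "Yes")
diagnosis, probability = predict(new_patient, k=3)
```

With these made-up records, the three nearest neighbours are two `D1` patients and one `D2` patient, so the sketch predicts `D1` with probability 2/3.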