[Data Mining] A Brief Introduction to Cluster Analysis
2022-07-24 05:42:00 【hongdi】
Contents
One. What is cluster analysis?
Two. Importance of cluster analysis
Three. Types of clustering algorithms
(One) Partition-based clustering algorithms
(Two) Hierarchical clustering algorithms
(Three) Density-based clustering algorithms
(Four) Grid-based clustering algorithms
(Five) Neural-network-based clustering algorithms
(Six) Statistics-based clustering algorithms
Four. Applications of cluster analysis
One. What is cluster analysis?
Cluster analysis is the process of grouping a collection of physical or abstract objects into classes made up of similar objects. Its goal is to organize data on the basis of similarity.

Clustering resembles classification but has a different purpose: it divides a set of data into several categories according to the similarities and differences within the data. Data in the same category are highly similar, whereas data in different categories share little similarity and are only weakly associated. The key difference from classification is that in clustering the classes are not known in advance.
Two. Importance of cluster analysis
"Birds of a feather flock together, and people of a kind group together." Grouping like with like has been a basic way for humans to understand the world and society for thousands of years. It is a universal, fundamental problem that must be faced when extracting value from big data, and it is the first problem to be solved by cognitive science as the "discipline of disciplines".
Whether the big data comes from politics, economics, literature, history, society, and culture, or from mathematics, chemistry, medicine, agriculture, transportation, and geography, or from any other industry, discovering value at the macro or micro level relies on cluster analysis. The primary problem of data analysis and mining is therefore clustering, and this clustering is interdisciplinary, cross-domain, and cross-media. Big data clustering is a foundational, universal problem of data-intensive science.
It is no exaggeration to say that anyone who is unclear about clustering algorithms, or who has no grounded, practical examples to show, cannot credibly claim to be doing data mining.
For human cognitive science to make a breakthrough, a breakthrough in big data clustering must come first. Clustering is the first step in mining the value of big data assets.
Three. Types of clustering algorithms
Cluster analysis is a very active research area within data mining, and many algorithms have been proposed.
(One) Partition-based clustering algorithms
1. k-means: a typical partition-based clustering algorithm. It represents each cluster by a cluster center, and the center computed during iteration is not necessarily a point in the cluster. The algorithm can only handle numerical data.
2. k-modes: an extension of k-means that uses simple matching to measure the similarity of categorical data.
3. k-prototypes: combines k-means and k-modes and can handle mixed data.
4. k-medoids: selects an actual point in the cluster as the cluster's representative during iteration. PAM is a typical k-medoids algorithm.
5. CLARA: adds sampling to PAM so that it can handle large-scale data.
6. CLARANS: combines the advantages of PAM and CLARA and is the first clustering algorithm designed for spatial databases.
7. Focused CLARANS: uses spatial indexing to improve the efficiency of CLARANS.
8. PCM: a fuzzy clustering algorithm proposed by introducing fuzzy set theory into cluster analysis.
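The k-means idea from item 1 can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names (`k_means`, `squared_distance`, `mean_point`), the random initialization from input points, the fixed seed, and the toy data are all assumptions made for the example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length numeric points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, k, max_iter=100, seed=0):
    """Basic k-means loop; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k input points picked at random
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Update step: each center moves to the mean of its members,
        # so it need not coincide with any actual data point.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = mean_point(members)
    return centers, labels


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, labels = k_means(points, k=2)
```

Because the update step averages coordinates, this sketch only makes sense for numerical data, which matches the limitation noted for k-means above. k-medoids differs in that the representative is restricted to an actual member of the cluster.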
(Two) Hierarchical clustering algorithms
1. CURE: first draws a random sample from the data set D, then partitions the sample, clusters each partition locally, and finally clusters the local clusters globally.
2. ROCK: also uses random sampling. When computing the similarity between two objects, it also takes the influence of neighboring objects into account.
3. CHAMELEON: first builds a k-nearest-neighbor graph Gk from the data set, then uses a graph partitioning algorithm to split Gk into a large number of subgraphs, each representing an initial sub-cluster. Finally, an agglomerative hierarchical clustering algorithm repeatedly merges sub-clusters to find the true result clusters.
4. SBAC: when computing the similarity between objects, takes into account how well each attribute reflects the essential nature of the objects, and gives higher weight to the attributes that reflect it better.
5. BIRCH: processes the data set using a tree structure. Each leaf node stores a cluster, represented by its center and radius. Objects are processed one at a time, and each is assigned to the nearest node. The algorithm can also serve as a preprocessing step for other clustering algorithms.
6. BUBBLE: extends BIRCH's concepts of center and radius to general distance spaces.
7. BUBBLE-FM: improves the efficiency of BUBBLE by reducing the number of distance computations.
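Several of the algorithms above rely on a bottom-up merging of sub-clusters. The sketch below shows only that generic agglomerative step, using single-linkage distance. It is not CURE, ROCK, or CHAMELEON; the names `agglomerative`, `single_link`, and `euclidean`, the choice of linkage, and the sample points are assumptions for illustration.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length numeric points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def single_link(a, b):
    """Single-linkage distance: the closest pair of points across two clusters."""
    return min(euclidean(p, q) for p in a for q in b)


def agglomerative(points, n_clusters):
    """Merge the two closest clusters until only n_clusters remain."""
    clusters = [[p] for p in points]  # start: every object is its own cluster
    while len(clusters) > n_clusters:
        # Find the pair of clusters (i < j) with the smallest linkage distance.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i].extend(clusters[j])  # merge cluster j into cluster i
        del clusters[j]
    return clusters


# Hypothetical toy data: two tight pairs and one isolated point.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
clusters = agglomerative(points, n_clusters=3)
```

This brute-force version recomputes every pairwise linkage at each merge, which is only practical for small inputs. Sampling (as in CURE and ROCK) or a compact tree summary (as in BIRCH) are ways of avoiding that cost on large data sets.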
(Three) Density-based clustering algorithms
1. DBSCAN: a typical density-based clustering algorithm. It uses spatial indexing to search an object's neighborhood and introduces concepts such as "core object" and "density-reachable". Starting from a core object, all objects that are density-reachable from it are grouped into one cluster.
2. GDBSCAN: generalizes DBSCAN's notion of neighborhood to suit the characteristics of spatial objects.
3. OPTICS: combines automatic and interactive clustering. It produces a cluster ordering, from which different parameters can be set for different clusters to obtain results that satisfy the user.
4. FDC: builds a k-d tree to divide the whole data space into rectangular regions. When the dimensionality is low, this greatly improves on DBSCAN's efficiency.
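The "core object" and "density-reachable" ideas behind DBSCAN can be illustrated with a short sketch. It makes several simplifying assumptions: the neighborhood search is brute force rather than backed by a spatial index, and the names `dbscan`, `region_query`, `eps`, and `min_pts` as well as the toy data are chosen for this example only.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length numeric points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (brute-force search)."""
    return [j for j, q in enumerate(points) if euclidean(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    NOISE, UNVISITED = -1, None
    labels = [UNVISITED] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # not a core object (may later become a border point)
            continue
        cluster_id += 1  # points[i] is a core object: start a new cluster
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core object
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also a core object: keep expanding
                queue.extend(j_neighbors)
    return labels


# Hypothetical toy data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

Unlike k-means, the number of clusters is not given in advance; it follows from `eps` and `min_pts`, and points in sparse regions are reported as noise instead of being forced into a cluster.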
(Four) Grid-based clustering algorithms
1. STING: stores statistical information about the data in grid cells, enabling multi-resolution clustering.
2. WaveCluster: introduces the wavelet transform into cluster analysis and is mainly applied in signal processing. (Note: wavelet methods have important applications in signal processing, graphics and imaging, encryption and decryption, and other fields, and are a deep and powerful technique.)
3. CLIQUE: a clustering algorithm that combines grid-based and density-based ideas.
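The general idea of working on grid cells instead of individual points can be sketched as follows. This is a deliberately simplified 2-D illustration of keeping a per-cell statistic and joining adjacent dense cells; it is not STING, WaveCluster, or CLIQUE, and the function name `grid_clusters`, the parameters `cell_size` and `min_count`, and the data are assumptions.

```python
from collections import Counter, deque


def grid_clusters(points, cell_size, min_count):
    """Group 2-D points by connected dense grid cells; returns a list of cell sets."""
    # Per-cell statistic: here only a count of the points falling in each cell.
    counts = Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)
    dense = {cell for cell, n in counts.items() if n >= min_count}
    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        # Breadth-first search over edge-adjacent dense cells.
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            cx, cy = queue.popleft()
            component.add((cx, cy))
            for nb in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        clusters.append(component)
    return clusters


# Hypothetical toy data: two neighboring dense cells, one separate dense cell,
# and one sparse cell that is ignored.
points = [(0.1, 0.1), (0.2, 0.3), (1.1, 0.2), (1.3, 0.4),
          (5.1, 5.2), (5.3, 5.4), (9.0, 9.0)]
clusters = grid_clusters(points, cell_size=1.0, min_count=2)
```

After the single pass that fills `counts`, the cost depends on the number of occupied cells rather than the number of points, which is the main attraction of grid-based methods.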
(Five) Neural-network-based clustering algorithms
1. Self-organizing map (SOM): the basic idea is that different samples are fed from outside into an artificial self-organizing map network. At first, the position of the output cell excited by a given input sample varies, but after self-organization, groups of cells form that represent the input samples and reflect their characteristics.
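A very small self-organizing map can make this description more concrete. The sketch below assumes a 1-D lattice of nodes, a Gaussian neighborhood on the lattice, and linearly decaying learning rate and neighborhood width. These choices, the names `train_som` and `best_matching_unit`, and the default values are illustrative assumptions, not something the description above prescribes.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest to sample x."""
    return min(
        range(len(weights)),
        key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)),
    )


def train_som(samples, n_nodes, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a 1-D self-organizing map; returns the list of node weight vectors."""
    rng = random.Random(seed)
    # Initialise the nodes from randomly chosen samples (needs n_nodes <= len(samples)).
    weights = [list(p) for p in rng.sample(samples, n_nodes)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs       # decays linearly from 1 towards 0
        lr = lr0 * frac                   # learning rate for this epoch
        sigma = max(sigma0 * frac, 1e-3)  # neighborhood width on the lattice
        for x in rng.sample(samples, len(samples)):  # one shuffled pass over the data
            bmu = best_matching_unit(weights, x)     # the "excited" output cell
            for i, w in enumerate(weights):
                # Nodes near the best-matching unit on the lattice move more.
                h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
    return weights


# Hypothetical toy data: 1-D samples forming two groups.
samples = [(0.0,), (0.1,), (0.2,), (0.9,), (1.0,), (1.1,)]
weights = train_som(samples, n_nodes=2)
groups = [best_matching_unit(weights, x) for x in samples]
```

After training, each sample is assigned to the node that responds to it (`groups`), so the nodes play the role of the cell groups that come to represent the input samples.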
(Six) Statistics-based clustering algorithms
1. COBWEB: a general conceptual clustering method that represents a hierarchical clustering as a classification tree.
2. AutoClass: based on a probabilistic mixture model, it describes clusters using the probability distributions of attributes. It can handle mixed data but requires the attributes to be independent of one another.
Cluster analysis is exploratory. No classification scheme has to be specified in advance; the analysis starts from the sample data and groups it automatically. Different clustering methods often lead to different conclusions, and different researchers clustering the same data set may not arrive at the same number of clusters.
Four. Applications of cluster analysis
1. Business
Cluster analysis is used to discover distinct customer groups and to characterize them by their purchasing patterns. It is an effective tool for market segmentation, and it can also be used to study consumer behavior, find new potential markets, select test markets, and serve as a preprocessing step for multivariate analysis.
2. E-commerce
Cluster analysis also plays an important role in data mining for e-commerce websites. Grouping customers with similar browsing behavior and analyzing their shared characteristics helps e-commerce operators understand their customers and serve them better.
The material in this article was collected from the internet.