[advanced data mining technology] Introduction to advanced data mining technology
2022-07-24 20:29:00 【Sunny qt01】
Functional classification of data mining technology
Descriptive data mining (Descriptive Data Mining; unsupervised learning, no target value is required)
Association Rules
Find out which events often occur together.
Example: Amazon.com

People who buy these two books generally go on to buy the books listed below. Past purchase records are mined to find such connections. Most e-commerce companies offer recommendations, and the technology behind them is association rules.
The technique finds events that happen at the same time (for example, two books bought together).
It is generally used in e-commerce and online stores, where recommendations can be made directly during the purchase process (much like placing batteries next to flashlights on a shelf).
Algorithm :
Apriori
FP-Growth
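As a rough illustration of what these algorithms look for, the sketch below counts how often pairs of items are bought together and derives the support and confidence of each rule. The baskets and the `pair_rules` helper are made up for illustration. This is brute-force pair counting, not Apriori or FP-Growth themselves, which add pruning so that larger itemsets remain tractable.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase records: each set is one basket.
baskets = [
    {"flashlight", "battery"},
    {"flashlight", "battery", "tent"},
    {"battery", "tent"},
    {"flashlight", "battery"},
    {"tent"},
]

def pair_rules(baskets, min_support=0.4):
    """Return {(a, b): (support, confidence)} for rules a -> b over item pairs."""
    n = len(baskets)
    item_count = Counter(item for b in baskets for item in b)
    pair_count = Counter(
        pair for b in baskets for pair in combinations(sorted(b), 2)
    )
    rules = {}
    for (x, y), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue  # pair is not frequent enough
        # Each frequent pair yields two candidate rules.
        rules[(x, y)] = (support, count / item_count[x])
        rules[(y, x)] = (support, count / item_count[y])
    return rules

rules = pair_rules(baskets)
for (a, b), (support, confidence) in sorted(rules.items()):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

Support is the share of baskets containing both items; confidence is the share of baskets with the first item that also contain the second.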
Sequential Patterns
Find out which events often occur in sequence

This technique finds out which products are bought later, after a certain product has been purchased. The events are ordered in time.
It is generally used in physical stores. In the bookstore example above, once a customer has paid for a book, the store can predict what the customer is likely to buy next time and print a matching discount coupon to attract a repeat purchase. Because the purchases do not happen at the same time, this is a form of up-selling: the recommendations follow chronological order.
Algorithm :
AprioriAll
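The idea of "bought A, then later bought B" can be sketched by counting ordered pairs across customers' purchase histories. The histories and the `ordered_pair_support` helper below are hypothetical, and the code handles only sequences of length two; it is a simplified illustration, not the full sequential pattern mining algorithm.

```python
from collections import Counter

# Hypothetical purchase histories, one list per customer, in time order.
histories = [
    ["camera", "memory card", "tripod"],
    ["camera", "tripod"],
    ["memory card", "camera"],
    ["camera", "memory card"],
]

def ordered_pair_support(histories):
    """Fraction of customers who bought `first` and, at some later time, `later`."""
    counts = Counter()
    for history in histories:
        seen_pairs = set()
        for i, first in enumerate(history):
            for later in history[i + 1:]:
                if first != later:
                    seen_pairs.add((first, later))
        counts.update(seen_pairs)  # count each pair once per customer
    return {pair: c / len(histories) for pair, c in counts.items()}

support = ordered_pair_support(histories)
print(support[("camera", "memory card")])
print(support[("memory card", "camera")])
```

Unlike association rules, the order matters here: the pair (camera, memory card) and the pair (memory card, camera) are counted separately.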
Cluster Analysis
Find out the internal structure of the data
Customers of the same type have similar values in most fields, while customers of different types differ greatly.

The figure shows a bank that wants to cluster consumers' investment propensity by income and age.
Each point represents one customer, and points that are dense and concentrated form a group. Three groups can be seen, and a marketing strategy can be designed for each customer group.
The first group has low income and is middle-aged or older (investigation suggests these may be manual workers in high-risk jobs; the strategy is insurance with low premiums).
The second group has high income and is older (wealthy seniors who prefer capital-guaranteed investments).
The third group has medium-to-high income and is younger (the second-generation rich, who lean toward high-yield, high-risk investments and have a relatively high risk tolerance).
Algorithm :
Hierarchical clustering: Single Linkage, Average Linkage, Complete Linkage
Partitional clustering: K-means
Kohonen Self-Organizing Maps (SOM)
Two-Step: first determine the number of clusters, then form the groups (it can tell you how many groups there are)
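To make the bank example concrete, here is a minimal K-means sketch on (income, age) points. The customer data are invented, and the starting centroids are hand-picked (one point from each visible group) so the run is deterministic; real implementations usually choose starting points randomly and scale the fields first.

```python
import math

# Hypothetical customers as (annual income in thousands, age).
customers = [
    (20, 50), (22, 55), (25, 52),   # low income, middle-aged or older
    (90, 65), (95, 70), (88, 68),   # high income, older
    (70, 25), (75, 28), (72, 30),   # medium-to-high income, younger
]

def kmeans(points, centroids, iterations=20):
    """Plain K-means: assign points to the nearest centroid, then recompute centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Mean of each cluster; an empty cluster keeps its old centroid.
        new_centroids = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, clusters

start = [customers[0], customers[3], customers[6]]  # assumed starting centroids
centroids, clusters = kmeans(customers, start)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

Each resulting centroid summarizes one customer group (average income and age), which is what a marketing strategy would then be built around.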
Predictive data mining (Predictive Data Mining; supervised learning)
Classification
Predict the category to which a record belongs
Both input variables and a target variable are required.
Example: predict whether a customer is interested in a car model
Each record has a customer number, input attributes, and a result attribute.

Input attributes: Car (the type of car bought), Age, Children (number of children)
New customer: Car=sedan, Age=35. Predict whether this customer is interested.
The result variable must be a categorical variable
Algorithm :
Bayes Net
Logistic Regression
Decision Tree
Neural Network
Support Vector Machine
K-Nearest Neighbor
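As one possible illustration of classification, the sketch below applies a K-nearest-neighbor vote to the car example. The training records, the `distance` function (car mismatch plus age gap in decades plus gap in number of children), and the new customer's Children=1 are all assumptions made for the example; the original gives only Car=sedan and Age=35.

```python
from collections import Counter

# Hypothetical training records: (car, age, children) -> interested in the model?
training = [
    (("sedan", 34, 2), "yes"),
    (("sedan", 38, 1), "yes"),
    (("sedan", 52, 0), "no"),
    (("sports", 24, 0), "no"),
    (("sports", 29, 0), "no"),
    (("minivan", 36, 3), "yes"),
]

def distance(a, b):
    """Ad hoc mixed distance: car mismatch + age gap in decades + gap in children."""
    car_gap = 0.0 if a[0] == b[0] else 1.0
    return car_gap + abs(a[1] - b[1]) / 10 + abs(a[2] - b[2])

def knn_predict(record, training, k=3):
    """Majority vote among the k training records closest to `record`."""
    nearest = sorted(training, key=lambda row: distance(record, row[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

new_customer = ("sedan", 35, 1)  # Children=1 is assumed
print(knn_predict(new_customer, training))
```

The output is a category ("yes" or "no"), which is what distinguishes classification from numeric prediction.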
Prediction
Predict the numeric value that corresponds to a record
Input variables (the causes) are required, and the result field must be numeric
For example, predicting a customer's annual income or the price of a house.

Input fields: Location (where the house is), Type (type of house), Miles (distance to the school), SF (house size in square feet), CM (number of houses in the community)
Prediction:
Input fields: Location=Rural, Miles=3, SF=1500
Result field: Home Price
Algorithm:
Linear Regression
Time Series
Decision Tree*
Neural Network*
Support Vector Machine*
K-Nearest Neighbor*
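A minimal sketch of numeric prediction with linear regression follows. It fits a straight line to house price against house size (SF) only, ignoring Location and Miles for simplicity, and the four training rows are invented. Prices are in thousands.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical training data: house size (SF) and price in thousands.
sizes = [1000, 1500, 2000, 2500]
prices = [155, 195, 255, 295]

intercept, slope = fit_line(sizes, prices)
predicted = intercept + slope * 1500  # new house with SF=1500
print(f"intercept={intercept:.1f}, slope={slope:.3f}, predicted={predicted:.1f}")
```

Here the result is a number rather than a category, which is why the result field must be numeric.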
Data mining websites: KDnuggets and Kaggle
KDnuggets: offers many resources, including datasets; after registration it sends you the latest news.
UCI: the datasets are easy to use and come from all walks of life.
Kaggle: a bridge between enterprises and data scientists. Enterprises can hand their data to data scientists for analysis.
Enterprises post their requirements there.

Enterprises submit their requirements, and data scientists work on them in the form of competitions. Kaggle can provide enterprise-scale data, such as the very large Valued Shoppers dataset, which is closer to real-world data.
Data Castle and Kesci (which offers training), China's counterparts to Kaggle, also provide data.
Positioning and prospects of data mining, and building a data mining team

The first industrial revolution: machines replaced manual labor.
With data mining, middle managers will be replaced by intelligent automation.
Prospects of data mining:
Time magazine lists data mining as one of the five emerging industries of the 21st century, and data mining is of great importance in business.
The future marketing focus will shift from products to customers
- Customers may be robbed by better services from competitors
- Whoever has the most knowledge about customers has the most capital
- The more we know about customers, the more we can deepen the brand's uniqueness and the stronger our competitiveness
Only by turning data into knowledge, and knowledge into action, can action be turned into profit.
Data mining and fortune telling
Data mining (modern fortune-telling)
Data: Attributes
Algorithm: Classification, Clustering, Association, … (various algorithms)
Predict: future trends
Fortune-telling
Data: the eight characters of one's birth date (Bazi), face, palm, …
Algorithm: Zi Wei Dou Shu, Four Pillars of Destiny, …
Predict: future fortune
Data mining can predict the future and reduce fear.
How to carry out data mining :
In the short term, data mining relies on off-the-shelf tools; in the long term, it requires in-house development.
Ideally, the team needs three kinds of people:
A Project Manager
A CRM (Data Mining) person, who does the data mining
An IT (Database) person, who provides the data
Data analysis certification process:
1. Data Miner: given the fields (Attributes) and the data (Data), can use data mining tools to analyze them and obtain the mining results.
2. Data Analyzer: given the fields (Attributes, the influencing factors), can operate databases and use data mining tools to obtain and interpret the mining results.
3. Business Analyzer: given only the topic, can operate databases and use data mining tools to obtain and interpret the mining results.