Correlation analysis related knowledge
2022-06-26 14:12:00 【Orange tea must be ^ -^】
Correlation analysis examines the relationships between the features in a data set: whether they are positively correlated, negatively correlated, perfectly correlated, or only partially correlated. Those relationships are then used to build mathematical models for prediction.
The running example: analyzing the correlation between cost data and advertising exposure.
1. Covariance and covariance matrix
The covariance Cov(X,Y) is a numerical characteristic that describes the degree of correlation between the two components of a two-dimensional random variable. Let (X,Y) be a two-dimensional random variable. If E{ [ X-E(X) ] [ Y-E(Y) ] } exists, this expectation is called the covariance of X and Y, written Cov(X,Y)=E{ [ X-E(X) ] [ Y-E(Y) ] }.
The covariance gives the direction of the relationship between two feature vectors: if Cov(X,Y) is positive they are positively correlated, if it is negative they are negatively correlated, and if it is 0 they are uncorrelated. When there are more than two feature vectors, a covariance matrix is a convenient way to compute the pairwise relationships. For a quick calculation, the COVAR function in Excel returns the covariance directly.
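The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the article's own calculation: the `cost` and `exposure` numbers are made up, and the `covariance` helper and its `sample` flag are names chosen here. The flag switches between dividing by n-1 (sample covariance) and dividing by n (population covariance, which is what Excel's COVAR computes).

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def covariance(xs, ys, sample=True):
    """Covariance of two equal-length sequences.

    sample=True divides by n-1 (sample covariance);
    sample=False divides by n (population covariance).
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mx, my = mean(xs), mean(ys)
    total = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return total / (n - 1 if sample else n)


# Hypothetical data, invented for illustration only.
cost = [10.0, 20.0, 30.0, 40.0, 50.0]
exposure = [120.0, 190.0, 240.0, 330.0, 370.0]

cov_sample = covariance(cost, exposure)                   # divides by n-1
cov_population = covariance(cost, exposure, sample=False)  # divides by n

# A positive result means the two series tend to move together.
print(cov_sample, cov_population)
```

The sign is the useful part here; the size of the number depends on the units of both variables.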
A single covariance only covers two groups of data. When there are more than two groups, we need the covariance matrix. For three groups of data x, y, z, the covariance matrix is:
$$C = \begin{pmatrix} \operatorname{Cov}(x,x) & \operatorname{Cov}(x,y) & \operatorname{Cov}(x,z) \\ \operatorname{Cov}(y,x) & \operatorname{Cov}(y,y) & \operatorname{Cov}(y,z) \\ \operatorname{Cov}(z,x) & \operatorname{Cov}(z,y) & \operatorname{Cov}(z,z) \end{pmatrix}$$
When there are many features, covariance and the covariance matrix cannot tell you which pairs are more strongly correlated. They only show roughly whether each relationship is positive or negative, because the magnitude of a covariance depends on the units of the variables. To compare the strength of correlations, use the correlation coefficient.
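As a rough sketch of the covariance matrix for three groups of data, the snippet below applies the sample covariance to every pair of columns. The values of x, y and z are invented for illustration, and `covariance_matrix` is a helper name chosen here rather than anything from the article.

```python
def covariance(xs, ys):
    """Sample covariance (denominator n-1) of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (n - 1)


def covariance_matrix(columns):
    """Pairwise sample covariances of the named columns.

    `columns` maps a name to a list of observations; the result is a
    square list of lists ordered like the keys of `columns`.
    """
    names = list(columns)
    return [[covariance(columns[a], columns[b]) for b in names] for a in names]


# Hypothetical data: y rises with x, z falls as x rises.
data = {
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0],
    "z": [4.0, 3.0, 2.0, 1.0],
}

matrix = covariance_matrix(data)
for name, row in zip(data, matrix):
    print(name, row)
```

The diagonal holds each variable's variance, and the matrix is symmetric because Cov(x,y) equals Cov(y,x).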
2. The correlation coefficient
The correlation coefficient is a statistical index that reflects how closely two variables are related. A value of 1 indicates a perfect positive linear correlation, -1 indicates a perfect negative linear correlation, and 0 indicates no linear correlation. The closer the value is to 0, the weaker the correlation. The formula is:
$$r_{xy} = \frac{S_{xy}}{S_x S_y}$$
Here rxy is the sample correlation coefficient, Sxy is the sample covariance, Sx is the sample standard deviation of x, and Sy is the sample standard deviation of y. The formulas for Sxy, Sx and Sy follow. Because these are the sample covariance and sample standard deviations, the denominator is n-1.
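Sample covariance Sxy:
$$S_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})$$
Sample standard deviation Sx:
$$S_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}$$
Sample standard deviation Sy:
$$S_y = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})^2}$$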
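The three quantities above combine into the correlation coefficient as sketched below. The cost and exposure figures are hypothetical, and the function names are chosen for this illustration. Note that the n-1 factors cancel in the ratio, so using n throughout would give the same coefficient.

```python
import math


def sample_covariance(xs, ys):
    """Sxy: sample covariance with denominator n-1."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (n - 1)


def sample_std(xs):
    """Sx: sample standard deviation with denominator n-1."""
    return math.sqrt(sample_covariance(xs, xs))


def correlation(xs, ys):
    """rxy = Sxy / (Sx * Sy)."""
    sx, sy = sample_std(xs), sample_std(ys)
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sample_covariance(xs, ys) / (sx * sy)


# Hypothetical data, invented for illustration only.
cost = [10.0, 20.0, 30.0, 40.0, 50.0]
exposure = [120.0, 190.0, 240.0, 330.0, 370.0]

r = correlation(cost, exposure)
print(round(r, 4))  # close to 1: a strong positive linear relationship
```

Unlike the covariance, this value is unit-free, which is what makes it comparable across pairs of features.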
3. Univariate regression and multivariate regression
Two things need to be settled before a regression analysis: first, the number of variables; second, which variable is independent and which is dependent. The univariate regression equation is:
$$y = b_0 + b_1 x$$
Here y is the advertising exposure and x is the cost. b0 is the intercept of the equation and b1 is the slope, which also expresses the relationship between the two variables. Our goal is to find the values of b0 and b1. Once they are known, the relationship between the variables is known too, and we can use it to predict the advertising exposure for a given cost.
The formula for b1 is given below. We compute b1 from the known costs x and advertising exposures y.
$$b_1 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}$$
The formula for b0 is given below. Once b1 and the means of the independent and dependent variables are known, b0 is easy to calculate.
$$b_0 = \bar{y} - b_1\bar{x}$$
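A minimal sketch of the manual calculation, using the least-squares formulas for b1 and b0. The data is hypothetical, so the results will not match the Excel figures quoted in this article. `fit_line` and `predict` are names chosen for this example.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x must not be constant")
    b1 = sxy / sxx       # slope
    b0 = my - b1 * mx    # intercept: the line passes through the means
    return b0, b1


def predict(b0, b1, x):
    """Predicted y for a given x."""
    return b0 + b1 * x


# Hypothetical data, invented for illustration only.
cost = [10.0, 20.0, 30.0, 40.0, 50.0]
exposure = [120.0, 190.0, 240.0, 330.0, 370.0]

b0, b1 = fit_line(cost, exposure)
forecast = predict(b0, b1, 60.0)  # expected exposure at a cost of 60
print(b0, b1, forecast)
```

Rounding b1 before computing b0, as in a hand calculation, shifts b0 slightly. That is the kind of small discrepancy you can see between a manual result and a tool's output.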
With the Regression tool in Excel's Data Analysis, you enter the ranges of the independent and dependent variables and automatically obtain b0 (Intercept) = 362.15 and b1 = 5.84. This b0 differs slightly from the value obtained by manual calculation earlier, because the manual calculation kept only two decimal places for b1.
The R Square value, 0.87, deserves a separate note. It is called the coefficient of determination and measures the goodness of fit of the regression equation. The larger it is, the better the equation fits the data and the more of the variation in the dependent variable is explained by the independent variable.
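To make the coefficient of determination concrete, the sketch below computes it as 1 minus the ratio of the residual sum of squares to the total sum of squares. It uses made-up data, not the data behind the 0.87 reported above, and the helper names are chosen here.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - b1 * mx, b1


def r_squared(xs, ys, b0, b1):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical data, invented for illustration only.
cost = [10.0, 20.0, 30.0, 40.0, 50.0]
exposure = [120.0, 190.0, 240.0, 330.0, 370.0]

b0, b1 = fit_line(cost, exposure)
r2 = r_squared(cost, exposure, b0, b1)
print(round(r2, 4))  # share of the variation in exposure explained by cost
```

For a univariate regression with an intercept, this value equals the square of the correlation coefficient between x and y.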
4. Information entropy
Computing the information entropy of the data measures its uncertainty. The greater the entropy, the greater the uncertainty and the smaller the probability of any particular outcome. The lower the entropy, the greater the certainty: a few values occur often and with high probability, which is read here as a sign of stronger correlation.
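A minimal sketch of the entropy calculation, assuming discrete data and probabilities estimated from observed frequencies. It uses the Shannon formula H = -Σ p·log2(p); the base-2 logarithm (entropy in bits) and the function name are choices made for this example.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy in bits of a non-empty sequence of discrete values."""
    if not values:
        raise ValueError("values must not be empty")
    n = len(values)
    counts = Counter(values)
    # Each distinct value contributes -p * log2(p), with p its relative frequency.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical samples, invented for illustration only.
certain = ["a", "a", "a", "a"]   # one outcome only: no uncertainty
coin = ["a", "b"]                # two equally likely outcomes
spread = ["a", "b", "c", "d"]    # four equally likely outcomes

h_certain = shannon_entropy(certain)
h_coin = shannon_entropy(coin)
h_spread = shannon_entropy(spread)
print(h_certain, h_coin, h_spread)
```

Entropy grows as the outcomes become more evenly spread, and drops to zero when one value always occurs.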