Data analysis (I)
2022-07-23 12:19:00 【Emperor Confucianism is supreme】
This post covers the basics of data analysis. I will share my hands-on projects later.
One. Why do data analysis?
1. What is the "big data era"?
The concept of the "big data" era was first put forward by McKinsey, the world-renowned consulting firm. McKinsey stated: "Data has penetrated every industry and business function today and has become an important factor of production. The mining and use of massive data heralds a new wave of productivity growth and consumer surplus." Big data has long existed in fields such as physics, biology, and environmental ecology, and in industries such as the military, finance, and communications.
Big data is an inevitable product of the Internet reaching a certain stage of development, and it embodies the value of the Internet. As more and more social resources go online and become digitized, the value that big data can carry will keep rising, and its range of applications will keep expanding. In the future Internet era, therefore, big data will not only represent value but also create it.
2. What big data can do
(1) Convert entities such as people and cars into virtual tags; a combination of tags then represents a distinct individual. For example: a blue, high-performance, cost-effective XX vehicle. With big data, enterprises can design and innovate their products (and services).
(2) Lay the foundation for the development of AI and machine learning. Both machine learning and deep learning compute over massive amounts of data in order to discover the underlying patterns.
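The tagging idea in point (1) can be sketched in a few lines of Python. This is only an illustration: the vehicle ids, the tag values, and the helper names `find_by_tags` and `tag_frequency` are hypothetical and do not come from any real system.

```python
from collections import Counter

# Hypothetical tag data: each vehicle is represented by a set of tags.
vehicles = {
    "car_001": {"blue", "high performance", "cost-effective"},
    "car_002": {"red", "high performance"},
    "car_003": {"blue", "cost-effective"},
}


def find_by_tags(entities, required):
    """Return the sorted ids of entities that carry every required tag."""
    return sorted(eid for eid, tags in entities.items() if required <= tags)


def tag_frequency(entities):
    """Count how many entities carry each tag."""
    return Counter(tag for tags in entities.values() for tag in tags)


matches = find_by_tags(vehicles, {"blue", "cost-effective"})
freq = tag_frequency(vehicles)
print(matches)
print(freq.most_common())
```

Storing tags as sets keeps the "has all of these tags" query a simple subset test, which is one plausible way to select a user or product segment.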
Two. The general process of data analysis
Data analysis starts from a clear analysis goal: collect data purposefully, use appropriate methods and tools to process, statistically classify, and explore the data, and finally extract the valuable information and present the key conclusions logically.
1. Demand analysis
Clarify the background and purpose of the analysis, translate the requirements into a business understanding, and turn business questions into data questions.
2. Analysis framework
Break the requirements down from multiple angles, determine the direction of the analysis, and organize the analysis approach clearly and logically.
3. Data preparation
Determine the target user group, the data dimensions, and the metrics; design and develop the data model.
4. Statistical analysis
Use SQL and Excel to explore the data statistically and distill the key conclusions.
5. Data visualization
Design appropriate charts to visualize data conclusions .
6. Report writing
Write the data analysis report; its content should be well organized and logically clear.
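Step 4 above, statistical exploration with SQL, might look like the following minimal sketch. It uses Python's built-in `sqlite3` module with an in-memory database; the `orders` table, its columns, and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id TEXT, city TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("u1", "Beijing", 120.0),
        ("u2", "Beijing", 80.0),
        ("u3", "Shanghai", 200.0),
        ("u1", "Beijing", 40.0),
    ],
)

# One dimension (city) and three metrics: distinct users, revenue, average order.
rows = conn.execute(
    """
    SELECT city,
           COUNT(DISTINCT user_id) AS users,
           SUM(amount)             AS revenue,
           AVG(amount)             AS avg_order
    FROM orders
    GROUP BY city
    ORDER BY revenue DESC
    """
).fetchall()
conn.close()

for city, users, revenue, avg_order in rows:
    print(city, users, revenue, avg_order)
```

The same grouping could be done with an Excel pivot table; SQL is simply easier to rerun when the data changes.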
Three. The thinking framework of data analysis
1. Top down method
If you know the business well, first identify the core of the problem quickly; next, lay out an analysis framework for solving it, examine the problem from multiple angles, and settle on the direction of the analysis; finally, file the collected material under the matching parts of the framework.
2. Bottom up method
If you do not know the business well, first collect as much material as possible from the bottom up; next, build a preliminary framework from the existing material and file the material under its parts; finally, as the material grows, refine the framework step by step and add new content.
Four. Data preparation for data analysis
1. Determine the statistical scope and definitions
(1) Determine the analysis user group ;
(2) Locate the data source ;
(3) Determine the analysis dimension ;
(4) Determine the analysis indicators .
2. Design and develop the data table model
(1) Have a clear understanding of the underlying data structure;
(2) Balance data extraction efficiency against depth when designing the database model, and try to design a single user table;
(3) Design cubes (dimensions + metrics) for offline data.
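The "cube" in point (3) can be read as pre-aggregating a metric over every combination of dimensions. The sketch below assumes that reading; the records, the dimension names `city` and `channel`, and the function `build_cube` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical detail records with two dimensions and one metric.
records = [
    {"city": "Beijing", "channel": "app", "amount": 100},
    {"city": "Beijing", "channel": "web", "amount": 50},
    {"city": "Shanghai", "channel": "app", "amount": 70},
]


def build_cube(rows, dimensions, measure):
    """Sum `measure` for every combination of dimensions, including none.

    Keys are tuples of (dimension, value) pairs; the empty tuple holds
    the grand total.
    """
    cube = defaultdict(float)
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            for row in rows:
                key = tuple((d, row[d]) for d in dims)
                cube[key] += row[measure]
    return dict(cube)


cube = build_cube(records, ["city", "channel"], "amount")
print(cube[()])                        # grand total
print(cube[(("city", "Beijing"),)])    # total for one city
```

Pre-computing all combinations trades storage for query speed, which is one reason the approach suits offline data rather than live tables.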
3. Check and ensure data quality
(1) Check against the unified data analysis system;
(2) Cross-check against other, similar data requests;
(3) Reconcile upper-layer data with lower-layer data;
(4) Check that the business logic of the data is complete and consistent.
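Some of the checks above can be automated. The following is a minimal sketch under stated assumptions: "upper-layer versus lower-layer" is taken to mean that a reported total must equal the sum of the detail rows, and the business rule (amounts must not be negative) is invented for the example. The function `check_quality` and the field names are hypothetical.

```python
def check_quality(detail_rows, reported_total, required_fields):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []

    # Completeness: every required field must be present and non-null.
    for i, row in enumerate(detail_rows):
        for field in required_fields:
            if row.get(field) is None:
                problems.append(f"row {i}: missing {field}")

    # Business logic (assumed rule): an amount must not be negative.
    for i, row in enumerate(detail_rows):
        amount = row.get("amount")
        if amount is not None and amount < 0:
            problems.append(f"row {i}: negative amount {amount}")

    # Reconciliation: the reported total must match the sum of the details.
    detail_total = sum(
        row["amount"] for row in detail_rows if row.get("amount") is not None
    )
    if abs(detail_total - reported_total) > 1e-9:
        problems.append(
            f"total mismatch: detail {detail_total} vs reported {reported_total}"
        )
    return problems


rows = [
    {"user_id": "u1", "amount": 100.0},
    {"user_id": None, "amount": 50.0},
    {"user_id": "u3", "amount": -5.0},
]
problems = check_quality(rows, 150.0, ["user_id", "amount"])
for p in problems:
    print(p)
```

Returning a list of messages instead of raising on the first failure lets one run surface every issue in a table.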
Five. Common statistical analysis tools
Common statistical tools include Excel, SPSS, SAS, R, MATLAB, and Python.
(1) Excel
Definition :
Excel is a spreadsheet program written by Microsoft for computers running the Windows and Apple Macintosh operating systems.
The main function :
It can draw various charts and perform basic analyses such as ANOVA and regression analysis.
Application field : Excel is not a highly specialized tool, but it is fully adequate for the simple data analysis of everyday work.
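To make the two analyses named above concrete, here is a small sketch of what they compute, written in plain Python rather than Excel. The function names `linear_fit` and `anova_f` and the sample numbers are illustrative assumptions, and the ANOVA function returns only the F statistic, not a p-value.

```python
from statistics import fmean


def linear_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def anova_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = fmean(all_values)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


slope, intercept = linear_fit([1, 2, 3, 4], [3, 5, 7, 9])
f_stat = anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(slope, intercept)
print(f_stat)
```

A large F statistic suggests that the group means differ by more than the within-group scatter would explain; judging significance still requires comparing it with the F distribution.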
(2) SPSS
Definition :
SPSS stands for "Statistical Product and Service Solutions": a family of software products and related services for statistical analysis, data mining, predictive analysis, and decision support.
The main function :
The required base module of SPSS manages the entire software platform, manages data access, data processing, and output, and can carry out many kinds of common basic statistical analysis. If, beyond basic data analysis, you also want to build analysis-process data, you need this module.
Advanced Statistics builds more flexible and more mature models for analysis results, yields more accurate prediction models when dealing with nested data, and can analyze event history and duration data.
Regression, used mainly for regression analysis, provides a large number of nonlinear modeling tools and multidimensional scaling to help researchers perform regression analysis. It frees the data from data constraints, conveniently divides the data into two groups, builds controllable models and expressions to estimate the parameters of nonlinear models, and can produce better prediction models than simple linear regression.
SPSS Conjoint is a system of three interrelated procedures used for full-profile conjoint analysis. Conjoint analysis lets researchers understand consumer preferences, or how a product is rated under given product attributes and levels.
Application field :
Applications include economic management, project management, and project quality control. In project management, it can be applied to satisfaction evaluation and statistical analysis for engineering project management, especially the statistical analysis of quality control. It is also used in many areas of economics, biology, and medicine, too many to list in detail, but SPSS is best at analysis of variance.
(3) MATLAB
Definition :
MATLAB is commercial mathematical software from the U.S. company MathWorks: a high-level technical computing language and interactive environment for algorithm development, data visualization, data analysis, and numerical computation.
The main function : Since its inception, MATLAB has offered convenient data visualization: it can represent vectors and matrices graphically and can annotate and print figures. Its high-level plotting covers two- and three-dimensional visualization, image processing, animation, and expression plotting. It can be used for scientific computing and engineering drawing.
Application field :
MATLAB has a wide range of applications, including signal and image processing, communications, control system design, test and measurement, financial modeling and analysis, computational biology, and many other fields.