Hands-on data analysis: data modeling and model evaluation
2022-06-25 01:22:00 【includeSteven】
Data modeling and evaluation
Introduction
After data processing and preliminary visual analysis, we can use the data to extract the information we want. The first step is modeling; once a model is built, we need to evaluate whether it is reliable.
Data modeling
The modeling library used here is sklearn, which implements many machine learning algorithms. The path for choosing a suitable algorithm can be read from the following figure:

Splitting the data set
First, the data set should be split into a training set and a test set. Here we use the sklearn.model_selection.train_test_split function; in Jupyter, running train_test_split? displays its documentation.
Note that train_test_split shuffles the data and samples randomly by default. Whether a random split is appropriate has to be judged case by case (for time-ordered data, for example, it may not be).
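As a minimal sketch of the split described above: the article's own data is not shown, so the synthetic data from make_classification, the 25% test size, and the fixed random_state are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Assumption: synthetic data stands in for the cleaned data set.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# stratify=y keeps the class ratio similar in both parts;
# random_state makes the random split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

print(X_train.shape, X_test.shape)
```

Passing shuffle=False instead (without stratify) would keep the original row order, which is one way to avoid a random split when order matters.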
Model creation
In sklearn, all estimators inherit from a common estimator base class (sklearn.base.BaseEstimator). A model is built with the fit method, and predictions are made with predict.
For classification, you can use logistic regression or a random forest, which correspond to the following two classes:
- sklearn.linear_model.LogisticRegression
- sklearn.ensemble.RandomForestClassifier
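The two classifiers listed above could be fitted roughly as follows. This is only a sketch: the synthetic data and the hyperparameters (max_iter, n_estimators, random_state) are assumptions, not values from the article.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumption: synthetic data stands in for the cleaned data set.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Both classes share the same fit/score interface.
models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                  # build the model
    scores[name] = model.score(X_test, y_test)   # mean accuracy on the test set
    print(name, scores[name])
```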
Model prediction
After the model is built, call its predict method with the feature values X to get the predicted label y for each sample.
You can also call predict_proba to get the predicted probability of each label.
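A short sketch of both calls, again on assumed synthetic data with an assumed LogisticRegression model:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumption: synthetic data stands in for the cleaned data set.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

labels = clf.predict(X_test)        # one predicted label per sample
proba = clf.predict_proba(X_test)   # one row per sample, one column per class

print(labels[:5])
print(proba[:5])
print(clf.classes_)  # column order of predict_proba
```

Each row of predict_proba sums to 1, and its columns follow the order of clf.classes_.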
Evaluation of the model
Cross validation
sklearn.model_selection.cross_val_score(estimator, X_train, y_train, cv=10): returns the score of each cross-validation fold (10 scores when cv=10).
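The call above might be used like this. The estimator and the synthetic data are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Assumption: synthetic data stands in for the cleaned data set.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

estimator = LogisticRegression(max_iter=1000)

# 10-fold cross validation on the training set: one score per fold.
cv_scores = cross_val_score(estimator, X_train, y_train, cv=10)

print(cv_scores)
print("mean:", cv_scores.mean())
```

The mean of the fold scores is a common single-number summary, and their spread gives a rough sense of how stable the model is across folds.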
Confusion matrix and the corresponding metrics
- sklearn.metrics.confusion_matrix
- sklearn.metrics.classification_report
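A sketch of the two functions listed above, assuming a binary problem on synthetic data. The manual precision and recall lines are added only to show how such metrics follow from the matrix.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Assumption: synthetic binary data stands in for the cleaned data set.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are true labels, columns are predicted labels.
cm = confusion_matrix(y_test, y_pred)
tn, fp, fn, tp = cm.ravel()  # this unpacking is valid for the binary case only

precision = tp / (tp + fp)   # share of predicted positives that are correct
recall = tp / (tp + fn)      # share of actual positives that are found

report = classification_report(y_test, y_pred)

print(cm)
print(precision, recall)
print(report)
```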
Drawing the ROC curve
sklearn.metrics.roc_curve returns three arrays: the false positive rate, the true positive rate, and the thresholds.
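A minimal sketch of computing the curve's points, assuming a binary classifier on synthetic data. The score passed to roc_curve is the predicted probability of the positive class. The plotting step is left out so the example depends only on sklearn.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split

# Assumption: synthetic binary data stands in for the cleaned data set.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Probability of the positive class (second column of predict_proba).
y_score = clf.predict_proba(X_test)[:, 1]

fpr, tpr, thresholds = roc_curve(y_test, y_score)
roc_auc = auc(fpr, tpr)  # area under the ROC curve

print(fpr[:5], tpr[:5], thresholds[:5])
print("AUC:", roc_auc)
```

To draw the curve, fpr and tpr can be passed as the x and y values to a plotting function such as matplotlib's plt.plot(fpr, tpr).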







