Hands-on data analysis: data modeling and model evaluation
2022-06-25 01:22:00 【includeSteven】
Data modeling and evaluation
Introduction
After data processing and preliminary visual analysis, we can use the data to extract the information we want. The first step of the analysis is modeling; after modeling, we need to evaluate whether the model is reliable.
Data modeling
The modeling library used here is sklearn, which contains many machine learning algorithms. For choosing a suitable model algorithm, refer to the selection path in the following figure:
Splitting the data set
First, the data set should be split into a training set and a test set. Here we use the sklearn.model_selection.train_test_split method; in Jupyter you can run train_test_split? to view the documentation for the method.
Note that by default the data set is split by random selection (the rows are shuffled before splitting). Whether this is appropriate needs to be judged according to the actual situation.
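The split described above can be sketched as follows. This is a minimal example on synthetic data: make_classification, the sample sizes, test_size, and random_state are assumptions chosen for illustration and do not come from the original post.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned data set (assumption: 200 rows, 5 features).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# The split is random by default. stratify=y keeps the class ratio similar in
# both parts, and random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

print(X_train.shape, X_test.shape)
```

For data where the row order matters (for example time series), passing shuffle=False keeps the original order instead of sampling randomly.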
Model creation
In sklearn, all estimators inherit from a common estimator base class (sklearn.base.BaseEstimator). A model is built with the fit method, and predict is used to predict the outcome.
For classification, you can use logistic regression or a random forest, corresponding to the following two classes:
- sklearn.linear_model.LogisticRegression
- sklearn.ensemble.RandomForestClassifier
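A minimal sketch of creating and fitting both classifiers is shown here. The synthetic data and the hyperparameters (max_iter, n_estimators, random_state) are illustrative assumptions, not values from the original post.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real features and labels (assumption).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Both classes share the same estimator interface: construct, then fit.
lr = LogisticRegression(max_iter=1000)
rfc = RandomForestClassifier(n_estimators=100, random_state=0)

for model in (lr, rfc):
    model.fit(X_train, y_train)
    # score() returns the mean accuracy for classifiers.
    print(
        type(model).__name__,
        "train:", model.score(X_train, y_train),
        "test:", model.score(X_test, y_test),
    )
```

Comparing the training score with the test score gives a first hint of overfitting: a random forest often scores close to 1.0 on its own training data, so the test score is the more informative number.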
Model prediction
After the model is built, the predict method can be used to make predictions: given the feature values X as input, it returns the corresponding predicted label y.
You can also use predict_proba to get the probability that the model assigns to each label.
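The difference between the two methods can be seen in a short sketch. The data set and the choice of LogisticRegression here are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (assumption).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# predict returns one label per row.
pred = model.predict(X_test)

# predict_proba returns one row per sample and one column per class;
# the column order follows model.classes_.
proba = model.predict_proba(X_test)

print(model.classes_)
print(pred[:5])
print(proba[:5])
```

Each row of proba sums to 1, and the second column is the probability of the positive class in a binary problem, which is the score typically used for threshold-based evaluation.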
Evaluation of the model
Cross validation
sklearn.model_selection.cross_val_score(estimator, X_train, y_train, cv=10): returns the score of each cross-validation fold
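A minimal sketch of 10-fold cross validation on the training set follows, assuming synthetic data and a LogisticRegression estimator (both are illustrative choices, not taken from the original post).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic data (assumption); only the training part is used for validation.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# cv=10 splits the training set into 10 folds; each fold is held out once
# while the estimator is refit on the other nine.
scores = cross_val_score(LogisticRegression(max_iter=1000), X_train, y_train, cv=10)

print(scores)          # one accuracy value per fold
print(scores.mean())   # average score across folds
```

The spread of the ten scores is as useful as their mean: a large variation between folds suggests the estimate depends heavily on which rows end up in the held-out fold.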
Confusion matrix and the corresponding metrics
- sklearn.metrics.confusion_matrix
- sklearn.metrics.classification_report
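Both functions take the true labels first and the predicted labels second. The sketch below assumes synthetic data and a fitted LogisticRegression model purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (assumption).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, pred)
print(cm)

# Text summary with precision, recall, f1-score and support per class.
report = classification_report(y_test, pred)
print(report)
```

For a binary problem, cm.ravel() unpacks the matrix as tn, fp, fn, tp, which makes it easy to compute precision or recall by hand and compare with the report.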
Plotting the ROC curve
sklearn.metrics.roc_curve: the return values are the false positive rate, the true positive rate, and the thresholds
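A minimal plotting sketch is given here, assuming matplotlib is available, synthetic data, and a LogisticRegression model whose positive-class probability serves as the score. None of these choices come from the original post.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (assumption).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# roc_curve needs a continuous score, here the probability of the positive class.
y_score = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_score)
roc_auc = auc(fpr, tpr)  # area under the curve

fig, ax = plt.subplots()
ax.plot(fpr, tpr, label=f"ROC (AUC = {roc_auc:.2f})")
ax.plot([0, 1], [0, 1], linestyle="--", label="chance")
ax.set_xlabel("False positive rate")
ax.set_ylabel("True positive rate")
ax.legend()
# In a notebook the figure renders automatically; in a script call plt.show().
```

The closer the curve bends toward the top-left corner (and the closer the AUC is to 1), the better the model separates the two classes; the dashed diagonal corresponds to random guessing.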