How to Get Started with a Data Mining Competition: Beginner's Edition
2022-07-25 05:57:00 【Datawhale】
Datawhale practical guide
Contributors: Herding bear, Luoxiutao, Si Yuxin, Pan Shuyu, et al.
This is a simple competition tutorial. Our goal is to help learners take the first step toward becoming AI training masters. There is a lot to learn in data mining. If you are just getting started, we suggest not worrying about the principles behind every piece of code at first: get the code running, then look up the concepts the code involves and study the relevant material. This makes your learning more targeted and makes it easier to find the fun in it. A journey of a thousand miles begins with a single step. Start your AI learning journey here!
From contributors Herding bear and Luoxiutao

1. Preparation steps
1.1 Platform registration and competition sign-up
Competition link:
https://challenge.xfyun.cn/topic/info?type=diabetes&ch=ds22-dw-gzh02
Register (remember to fill in your personal information).


Click the register button. The page then shows that enrollment succeeded.


1.2 Data download
Obtaining the data
Download the data from the official competition website. This requires completing real-name authentication.
For detailed steps, see: https://xj15uxcopw.feishu.cn/docx/doxcn11gwo7cEuAXWhCrDld4Inb
Put the data files and the code file in the same folder so that the code runs correctly.
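Because the code loads the data by file name from the working folder, a quick check that the files are really there can save a confusing error later. The sketch below is one way to do this with the standard library. The helper name `find_missing_files` and the file names in the usage lines are hypothetical placeholders; replace them with the names of the files you actually downloaded.

```python
from pathlib import Path


def find_missing_files(folder, filenames):
    """Return the names from `filenames` that do not exist inside `folder`."""
    folder = Path(folder)
    return [name for name in filenames if not (folder / name).is_file()]


if __name__ == "__main__":
    # Placeholder names: use the actual names of the downloaded files.
    expected = ["train.csv", "test.csv", "submit_sample.csv"]
    missing = find_missing_files(".", expected)
    if missing:
        print("Missing files:", missing)
    else:
        print("All data files found.")
```

Running this from the folder that holds the code file lists anything that still needs to be copied over.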
1.3 Reference material
To set up the Python environment, see:
Mac: "The Most Comprehensive Tutorial on Installing Anaconda on Mac" https://zhuanlan.zhihu.com/p/350828057
Windows: "Anaconda Super Detailed Installation Tutorial" https://blog.csdn.net/fan18317517352/article/details/123035625
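Once Anaconda is installed, it can help to confirm that the libraries used later in this tutorial can be imported. This is a minimal sketch using only the standard library. The helper name `check_packages` is an assumption for illustration and does not come from the tutorial. Note that scikit-learn is imported under the name `sklearn`.

```python
import importlib.util


def check_packages(names):
    """Map each top-level import name to True if it is available here."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # pandas and sklearn are the two libraries imported by the baseline code.
    for name, found in check_packages(["pandas", "sklearn"]).items():
        print(name, "OK" if found else "missing (install it with pip or conda)")
```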
2. Practical approach
This is a data mining competition. Participants build a model from the training set data, use it to predict on the test set data, and submit the predictions.
The task is to build a model that predicts whether a patient has diabetes from the patient's medical test data. This is a typical binary classification problem (has diabetes / does not have diabetes), and the model outputs 0 or 1 (has diabetes: 1, no diabetes: 0).
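To make the 0/1 output concrete, the sketch below scores a handful of predictions against true labels using accuracy, precision, recall, and F1, which are common metrics for binary classification. The labels are made up for illustration, and the helper name `binary_metrics` is hypothetical. The source does not state which metric the competition uses, so check the competition page.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up labels: 1 = has diabetes, 0 = no diabetes.
    y_true = [1, 0, 1, 1, 0, 0]
    y_pred = [1, 0, 0, 1, 1, 0]
    print(binary_metrics(y_true, y_pred))
```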
In machine learning, classification tasks usually bring to mind algorithms such as logistic regression and decision trees. In this baseline, we use a decision tree to build the model. When solving a machine learning problem, we generally follow a process of data exploration, data preprocessing, feature engineering, model training, and results output. These are the same stages that label the code in section 2.1.
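A typical workflow of this kind can be tried end to end before touching the competition data. The sketch below is not the tutorial's baseline. It uses synthetic data from scikit-learn's `make_classification`, holds out part of it with `train_test_split`, and scores a decision tree locally. The sample sizes, `max_depth=4`, and `random_state=0` are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for real patient records.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# Hold out 20% of the rows so the model is scored on data it has not seen.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Fit a small decision tree and predict 0/1 labels for the held-out rows.
model = DecisionTreeClassifier(max_depth=4, random_state=0)
model.fit(X_train, y_train)
valid_pred = model.predict(X_valid)

print("accuracy:", accuracy_score(y_valid, valid_pred))
print("F1:", f1_score(y_valid, valid_pred))
```

Holding out a validation split like this gives a local estimate of model quality without spending a submission.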

2.1 Code implementation
Run the following code in Jupyter Notebook or a Python IDE.
# Install the dependencies. On Windows, run pip install in the cmd command prompt; see the environment setup references above
#!pip install scikit-learn
#!pip install pandas
#---------------------------------------------------
# Import library
#---------------- Data exploration ----------------
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
# Data preprocessing
data1=pd.read_csv(' Game training set .csv',encoding='gbk')
data2=pd.read_csv(' Competition test set .csv',encoding='gbk')
# Mark the test-set label as -1 so test rows can be told apart after merging
data2[' Signs of diabetes ']=-1
# Merge the training set and the test set
data=pd.concat([data1,data2],axis=0,ignore_index=True)
# Fill missing values in the diastolic pressure feature with -1
data[' diastolic pressure ']=data[' diastolic pressure '].fillna(-1)
#---------------- Feature Engineering ----------------
"""
Convert the year of birth into age
"""
data[' Age ']=2022-data[' Year of birth '] # Change to age
"""
The normal adult body mass index (BMI) is between 18.5 and 24
Below 18.5 is underweight
Between 24 and 27 is overweight
Above 27 is considered obese
Above 32 is severely obese
"""
def BMI(a):
    if a<18.5:
        return 0
    elif 18.5<=a<=24:
        return 1
    elif 24<a<=27:
        return 2
    elif 27<a<=32:
        return 3
    else:
        return 4
data['BMI']=data[' Body mass index '].apply(BMI)
# Family history of diabetes
"""
No record
One uncle or aunt has diabetes / One uncle or aunt has diabetes
One parent has diabetes
"""
def FHOD(a):
    if a==' No record ':
        return 0
    elif a==' One uncle or aunt has diabetes ' or a==' One uncle or aunt has diabetes ':
        return 1
    else:
        return 2
data[' Family history of diabetes ']=data[' Family history of diabetes '].apply(FHOD)
"""
The normal diastolic pressure range is 60-90
"""
def DBP(a):
    if 0<=a<60:
        return 0
    elif 60<=a<=90:
        return 1
    elif a>90:
        return 2
    else:
        return a # missing values were filled with -1 and stay -1
data['DBP']=data[' diastolic pressure '].apply(DBP)
#------------------------------------
# Split the feature-engineered data back into a training set and a test set. The training set is used to fit the model; the test set has no labels and is used to produce the predictions to submit
# The ID number has no relationship to whether a patient has diabetes, so this irrelevant feature is dropped
train=data[data[' Signs of diabetes '] !=-1]
test=data[data[' Signs of diabetes '] ==-1]
train_label=train[' Signs of diabetes ']
train=train.drop([' Number ',' Signs of diabetes ',' Year of birth '],axis=1)
test=test.drop([' Number ',' Signs of diabetes ',' Year of birth '],axis=1)
#---------------- model training ----------------
model = DecisionTreeClassifier()
model.fit(train, train_label)
y_pre=model.predict(test)
y_pre
#---------------- Results output ----------------
result=pd.read_csv(' Submit sample .csv')
result['label']=y_pre
result.to_csv('result-de.csv',index=False)
2.2 Submitting results
On the competition's result submission page, upload the CSV file generated by the program (result-de.csv in the code above), then check your score and ranking.
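Before uploading, it is worth checking that the generated file has the shape the platform expects. The sketch below assumes, based on the code above, that the submission has a `label` column holding only 0 or 1, with one row per test sample. The helper `check_submission` is hypothetical, and the exact format rules are defined by the competition page.

```python
import pandas as pd


def check_submission(result, expected_rows):
    """Return a list of problems found in a submission DataFrame (empty if none)."""
    if "label" not in result.columns:
        return ["missing 'label' column"]
    problems = []
    if len(result) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(result)}")
    if result["label"].isnull().any():
        problems.append("label column contains missing values")
    # Missing values are reported above, so ignore them for the 0/1 check.
    if not result["label"].dropna().isin([0, 1]).all():
        problems.append("label column contains values other than 0 and 1")
    return problems


if __name__ == "__main__":
    demo = pd.DataFrame({"label": [0, 1, 1, 2]})  # made-up example data
    print(check_submission(demo, expected_rows=4))
```

In the tutorial's flow, this could be called as `check_submission(result, len(test))` just before `result.to_csv(...)`.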



