当前位置:网站首页>Neural network method -- Boston house price (regression problem)
Neural network method -- Boston house price (regression problem)
2022-07-25 09:28:00 【Bubble Yi】
One 、 The return question —— Linear regression and softmax Return to
Linear regression — It refers to a kind of method to model the relationship between one or more independent variables . In the field of natural science and Social Science , Regression often represents the relationship between input and output .
Two 、 Boston house price ( The return question )
Problem description :
The dataset is a regression problem . The number of observations for each class is equal , share 506 Observations ,13 Input variables and 1 Output variables .
Each data contains detailed information about the house and its surroundings . Including the urban crime rate , Nitric oxide concentration , The average number of rooms in the house , The weighted distance to the central area and the average house price from the house, etc .
- CRIM: Crime rate per capita in cities and towns .
- ZN: More than 25000 sq.ft. The proportion of .
- INDUS: The proportion of Urban Non retail commercial land .
- CHAS: Charles River variables ( If the boundary is a river , Then for 1; Otherwise 0).
- NOX: Nitric oxide concentration .
- RM: The average number of rooms in the house .
- AGE:1940 Proportion of self use houses built before .
- DIS: Weighted distance to Boston's five central areas .
- RAD: Proximity index of radial road .
- TAX: Every time 10000 The full value property tax rate of US dollars .
- PTRATIO: The proportion of teachers and students in the city .
- B:1000(Bk-0.63)^ 2, among Bk The proportion of black people in the town .
- LSTAT: The proportion of the population in the lower ranks .
- MEDV: The average price of a house , In thousands of dollars .
1. First read in the data , The code is as follows :
#1. Data read in
from sklearn.datasets import load_boston
import numpy as np
import pandas as pd
data=pd.read_csv('C:/Users/bby/ machine learning /boston_house_prices.csv')# Read in file
data.head()2. Divide the data into test data set and training data set
from sklearn.model_selection import train_test_split
X=data.drop('MEDV',axis=1) # Generate feature set
y=data['MEDV'] # Generate labels Set
X=X.values # Convert to array
y=y.values # Convert to array
X_train,X_test,y_train,y_test=train_test_split(X,y,random_state=33,test_size=0.25)3. Input the data with a wide range of values into the neural network , There's a problem . The network may automatically adapt to this data with different value ranges , But learning is bound to become more difficult . So standardize the data .
from sklearn.preprocessing import StandardScaler
ss_X=StandardScaler()
scaler_X=ss_X.fit(X_train)
X_train=scaler_X.transform(X_train)
X_test=scaler_X.transform(X_test)4. Build a model , Choose the appropriate optimization method , The number of hidden layers can be adjusted .
from sklearn.neural_network import MLPRegressor# Full connection
model = MLPRegressor(solver='lbfgs', hidden_layer_sizes=(15,15), random_state=1)
model.fit(X_train, y_train)5. Model to evaluate , The most commonly used loss function for regression problems is the mean square error MSE(mean aquared error) The code is as follows :
from sklearn.metrics import r2_score,mean_squared_error,mean_absolute_error
print(' Training set regression evaluation index :')
model_score1=model.score(X_train,y_train)
print('The accuracy of train data is',model_score1) # Goodness of fit value
print(' Test set regression evaluation index :')
model_score2=model.score(X_test,y_test)
print('The accuracy of test data is',model_score2)
y_test_predict=model.predict(X_test)
mse=mean_squared_error(y_test,y_test_predict) # Mean square error
print('The value of mean_squared_error:',mse)
mae=mean_absolute_error(y_test,y_test_predict) # Mean absolute error
print('The value of mean_absolute_error:',mae)Try super big data .
边栏推荐
- @4-1 CCF 2020-06-1 线性分类器
- 通过robocopy对文件/夹进行复制
- In mysql, update and select are used together
- ~1 ccf 2022-06-2 寻宝!大冒险!
- Unable to start debugging on the web server, the web server failed to find the requested resource
- 深入解读C语言随机数函数和如何实现随机数
- 分布式一致性协议之Raft
- Jspdf generates PDF files. There is a problem of incomplete files. Files are downloaded in the background, but not in the foreground
- Publish Yum private server using nexus3 (offline intranet)
- Go基础2
猜你喜欢

分享一个避免递归的部门设计方法

Leetcode组合总和+剪枝

Publish Yum private server using nexus3 (offline intranet)

uni-app如何获取位置信息(经纬度)

数据查询语言(DQL)

Jspdf generates PDF files. There is a problem of incomplete files. Files are downloaded in the background, but not in the foreground

无法再web服务器上启动调试,web服务器未能找到请求资源

@3-2 CCF 2020-12-2 期末预测之最佳阈值

@4-1 CCF 2020-06-1 线性分类器
![[GKCTF 2021]easynode](/img/f0/1daf6f83fea66fdefd55608cbddac6.png)
[GKCTF 2021]easynode
随机推荐
redis的五种数据结构原理分析
idea 热部署
What is the difference between mongodb and redis
In mysql, update and select are used together
通过robocopy对文件/夹进行复制
ActiveMQ -- message retry mechanism
Go foundation 2
『怎么用』代理模式
matplotlib数据可视化三分钟入门,半小时入魔?
『怎么用』装饰者模式
什么是单机、集群与分布式?
Go foundation 4
sqli-labs Basic Challenges Less11-22
『怎么用』观察者模式
~3 ccf 2022-03-2 出行计划
Leetcode组合总和+剪枝
Understand the execution process of try, catch and finally (including return) (the most detailed analysis of the whole network)
『每日一问』volatile干嘛的
SSM高级整合
Thick willow dustpan, thin willow bucket, who hates reptile man? Asynchronous synergism, half a second to strip away a novel