Generalized linear model (logistic regression, Poisson regression)
2022-06-26 04:50:00 【I am a little monster】
The linear regression model is not suitable for every case: some response variables are binary (for example, positive/negative) or are counts. Generalized linear models can model such data while still using a linear combination of the independent variables.
Logistic regression
When the response variable is binary, logistic regression is often used to model the data.
The data below comes from the Pandas for Everyone companion files; download it here if necessary: https://download.csdn.net/download/qq_57099024/79301082
import pandas as pd
d=pd.read_csv('D:/pandas Flexible use /pandas_for_everyone-master/data/acs_ny.csv')
print(d.columns)
print('@'*66)  # print a separator to distinguish the two outputs
print(d.head())
''' Here is the output :
Index(['Acres', 'FamilyIncome', 'FamilyType', 'NumBedrooms', 'NumChildren',
'NumPeople', 'NumRooms', 'NumUnits', 'NumVehicles', 'NumWorkers',
'OwnRent', 'YearBuilt', 'HouseCosts', 'ElectricBill', 'FoodStamp',
'HeatingFuel', 'Insurance', 'Language'],
dtype='object')
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Acres FamilyIncome FamilyType NumBedrooms NumChildren NumPeople \
0 1-10 150 Married 4 1 3
1 1-10 180 Female Head 3 2 4
2 1-10 280 Female Head 4 0 2
3 1-10 330 Female Head 2 1 2
4 1-10 330 Male Head 3 1 2
NumRooms NumUnits NumVehicles NumWorkers OwnRent YearBuilt \
0 9 Single detached 1 0 Mortgage 1950-1959
1 6 Single detached 2 0 Rented Before 1939
2 8 Single detached 3 1 Mortgage 2000-2004
3 4 Single detached 1 0 Rented 1950-1959
4 5 Single attached 1 0 Mortgage Before 1939
HouseCosts ElectricBill FoodStamp HeatingFuel Insurance Language
0 1800 90 No Gas 2500 English
1 850 90 No Oil 0 English
2 2600 260 No Oil 6600 Other European
3 1800 140 No Oil 0 English
4 860 150 No Gas 660 Spanish '''
The following bins FamilyIncome:
d['income_15w']=pd.cut(d['FamilyIncome'],[0,150000,d['FamilyIncome'].max()],labels=[0,1])
d['income_15w']=d['income_15w'].astype(int)
Using cut to bin the data creates a binary response variable.
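As a quick sanity check, the same cut-based binning can be sketched on a few hypothetical income values standing in for the FamilyIncome column:

```python
import pandas as pd

# Hypothetical incomes standing in for d['FamilyIncome']
incomes = pd.Series([50_000, 120_000, 150_000, 200_000, 500_000])

# Same binning as above: (0, 150000] -> 0, (150000, max] -> 1
# (cut's bins are right-inclusive by default, so exactly 150000 maps to 0)
binned = pd.cut(incomes, [0, 150_000, incomes.max()], labels=[0, 1]).astype(int)
print(binned.tolist())  # [0, 0, 0, 1, 1]
```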
Use statsmodels
import statsmodels.formula.api as smf
model=smf.logit('income_15w~HouseCosts+NumWorkers+OwnRent+NumBedrooms+FamilyType',data=d)
results=model.fit()
print(results.summary())
Optimization terminated successfully.
Current function value: 0.391651
Iterations 7
Logit Regression Results
==============================================================================
Dep. Variable: income_15w No. Observations: 22745
Model: Logit Df Residuals: 22737
Method: MLE Df Model: 7
Date: Sat, 05 Feb 2022 Pseudo R-squ.: 0.2078
Time: 08:46:18 Log-Likelihood: -8908.1
converged: True LL-Null: -11244.
Covariance Type: nonrobust LLR p-value: 0.000
===========================================================================================
coef std err z P>|z| [0.025 0.975]
-------------------------------------------------------------------------------------------
Intercept -5.8081 0.120 -48.456 0.000 -6.043 -5.573
OwnRent[T.Outright] 1.8276 0.208 8.782 0.000 1.420 2.236
OwnRent[T.Rented] -0.8763 0.101 -8.647 0.000 -1.075 -0.678
FamilyType[T.Male Head] 0.2874 0.150 1.913 0.056 -0.007 0.582
FamilyType[T.Married] 1.3877 0.088 15.781 0.000 1.215 1.560
HouseCosts 0.0007 1.72e-05 42.453 0.000 0.001 0.001
NumWorkers 0.5873 0.026 22.393 0.000 0.536 0.639
NumBedrooms 0.2365 0.017 13.985 0.000 0.203 0.270
==================================================================================
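The logit coefficients above are on the log-odds scale; exponentiating a coefficient gives an odds ratio, which is often easier to interpret. A minimal sketch using values copied from the summary table (on the fitted model itself, np.exp(results.params) would give the same thing for every term):

```python
import numpy as np

# Coefficients copied from the logit summary table above (log-odds scale)
coefs = {
    'OwnRent[T.Rented]': -0.8763,
    'FamilyType[T.Married]': 1.3877,
    'NumWorkers': 0.5873,
}

# exp(coef) is the multiplicative change in the odds of income_15w == 1
# for a one-unit change in that predictor, holding the others fixed
for name, b in coefs.items():
    print(f'{name}: odds ratio = {np.exp(b):.3f}')
```

For example, being married multiplies the odds of a family income above 150,000 by roughly 4, while renting multiplies them by roughly 0.42.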
Use sklearn
predictors=pd.get_dummies(d[['HouseCosts','NumWorkers','OwnRent','NumBedrooms','FamilyType']],drop_first=True)
from sklearn import linear_model
lr=linear_model.LogisticRegression()
results=lr.fit(X=predictors,y=d['income_15w'])
print(results.coef_)
print('-*-'*10)
print(results.intercept_)
[[ 5.86894916e-04  7.32489391e-01  2.86764784e-01  7.17542587e-02
  -2.13282748e+00 -1.03910262e+00  2.63647146e-01]]
-*--*--*--*--*--*--*--*--*--*-
[-4.86108187]
Poisson regression
Poisson regression is often used to analyze count data.
Use statsmodels
results=smf.poisson('NumChildren~FamilyIncome+FamilyType+OwnRent',data=d).fit()
print(results.summary())
Optimization terminated successfully.
Current function value: nan
Iterations 1
Poisson Regression Results
==============================================================================
Dep. Variable: NumChildren No. Observations: 22745
Model: Poisson Df Residuals: 22739
Method: MLE Df Model: 5
Date: Sat, 05 Feb 2022 Pseudo R-squ.: nan
Time: 09:05:28 Log-Likelihood: nan
converged: True LL-Null: -30977.
Covariance Type: nonrobust LLR p-value: nan
===========================================================================================
coef std err z P>|z| [0.025 0.975]
-------------------------------------------------------------------------------------------
Intercept nan nan nan nan nan nan
FamilyType[T.Male Head] nan nan nan nan nan nan
FamilyType[T.Married] nan nan nan nan nan nan
OwnRent[T.Outright] nan nan nan nan nan nan
OwnRent[T.Rented] nan nan nan nan nan nan
FamilyIncome nan nan nan nan nan nan
==================================================================================
Negative binomial regression
Note that the Poisson fit above returned nan estimates, so the fit did not actually succeed. If the assumptions of Poisson regression do not hold (for example, when the data is overdispersed), negative binomial regression can be used instead.
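A quick way to gauge overdispersion is to compare the variance of the counts with their mean: a Poisson model assumes they are equal. A minimal sketch with hypothetical counts (on the real data you would compare d['NumChildren'].var() against d['NumChildren'].mean()):

```python
import pandas as pd

# Hypothetical counts standing in for NumChildren
counts = pd.Series([0, 0, 0, 1, 1, 2, 3, 5, 8, 10])

# Poisson assumes variance == mean; a variance far above the mean
# indicates overdispersion, favoring a negative binomial model
print('mean:', counts.mean())      # 3.0
print('variance:', counts.var())   # ~12.67, far above the mean
```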
The statsmodels GLM documentation lists the distribution families that can be passed as the family parameter of GLM; each family's available link functions can be found under sm.families.<FAMILY>.links:
Binomial( Binomial distribution )
Gamma( Gamma distribution )
InverseGaussian( Inverse Gaussian distribution )
NegativeBinomial( Negative binomial distribution )
Poisson( Poisson distribution )
Tweedie( Tweedie distribution )
import statsmodels
import statsmodels.api as sm
import statsmodels.formula.api as smf
# Note: recent statsmodels versions expect a link instance, e.g. sm.families.links.Log()
model=smf.glm('NumChildren~FamilyIncome+FamilyType+OwnRent',data=d,family=sm.families.NegativeBinomial(sm.genmod.families.links.log))
results=model.fit()
print(results.summary())
Generalized Linear Model Regression Results
==============================================================================
Dep. Variable: NumChildren No. Observations: 22745
Model: GLM Df Residuals: 22739
Model Family: NegativeBinomial Df Model: 5
Link Function: log Scale: 1.0000
Method: IRLS Log-Likelihood: -29749.
Date: Sat, 05 Feb 2022 Deviance: 20731.
Time: 10:06:21 Pearson chi2: 1.77e+04
No. Iterations: 6
Covariance Type: nonrobust
===========================================================================================
coef std err z P>|z| [0.025 0.975]
-------------------------------------------------------------------------------------------
Intercept -0.3345 0.029 -11.672 0.000 -0.391 -0.278
FamilyType[T.Male Head] -0.0468 0.052 -0.905 0.365 -0.148 0.055
FamilyType[T.Married] 0.1529 0.029 5.200 0.000 0.095 0.211
OwnRent[T.Outright] -1.9737 0.243 -8.113 0.000 -2.450 -1.497
OwnRent[T.Rented] 0.4164 0.030 13.754 0.000 0.357 0.476
FamilyIncome 5.398e-07 9.55e-08 5.652 0.000 3.53e-07 7.27e-07
=================================================================================