Generalized linear model (logistic regression, Poisson regression)
2022-06-26 04:50:00 【I am a little monster】
Linear regression is not suitable for every situation. Some response variables are binary (for example, positive or negative) or are counts. Generalized linear models can handle such data while still using a linear combination of the independent variables.
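As a minimal sketch of that idea (the intercept, coefficients, and predictor values below are hypothetical and chosen only for illustration), each model family computes the same linear combination and differs only in the inverse link function that maps it to the expected response:

```python
import math

# Hypothetical intercept, coefficients, and predictor values (not fitted to any data).
intercept = -1.0
coefs = [0.5, 0.25]
x = [2.0, 4.0]

# The linear predictor is shared by all generalized linear models.
eta = intercept + sum(b * xi for b, xi in zip(coefs, x))


def inverse_identity(eta):
    """Linear regression: the mean is the linear predictor itself."""
    return eta


def inverse_logit(eta):
    """Logistic regression: maps the linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def inverse_log(eta):
    """Poisson or negative binomial regression: maps it to a positive expected count."""
    return math.exp(eta)


print(eta, inverse_identity(eta), inverse_logit(eta), inverse_log(eta))
```

The same `eta` gives an unbounded mean, a probability, or a positive count depending on which inverse link is applied.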
Logistic regression
When the response variable is binary, logistic regression is often used to model the data.
The data below is the acs_ny.csv file from the data set that accompanies Pandas for Everyone. It can be downloaded here if needed: https://download.csdn.net/download/qq_57099024/79301082
import pandas as pd
d=pd.read_csv('D:/pandas Flexible use /pandas_for_everyone-master/data/acs_ny.csv')
print(d.columns)
print('@'*66)  # separator line between the two outputs
print(d.head())
''' Here is the output :
Index(['Acres', 'FamilyIncome', 'FamilyType', 'NumBedrooms', 'NumChildren',
'NumPeople', 'NumRooms', 'NumUnits', 'NumVehicles', 'NumWorkers',
'OwnRent', 'YearBuilt', 'HouseCosts', 'ElectricBill', 'FoodStamp',
'HeatingFuel', 'Insurance', 'Language'],
dtype='object')
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Acres FamilyIncome FamilyType NumBedrooms NumChildren NumPeople \
0 1-10 150 Married 4 1 3
1 1-10 180 Female Head 3 2 4
2 1-10 280 Female Head 4 0 2
3 1-10 330 Female Head 2 1 2
4 1-10 330 Male Head 3 1 2
NumRooms NumUnits NumVehicles NumWorkers OwnRent YearBuilt \
0 9 Single detached 1 0 Mortgage 1950-1959
1 6 Single detached 2 0 Rented Before 1939
2 8 Single detached 3 1 Mortgage 2000-2004
3 4 Single detached 1 0 Rented 1950-1959
4 5 Single attached 1 0 Mortgage Before 1939
HouseCosts ElectricBill FoodStamp HeatingFuel Insurance Language
0 1800 90 No Gas 2500 English
1 850 90 No Oil 0 English
2 2600 260 No Oil 6600 Other European
3 1800 140 No Oil 0 English
4 860 150 No Gas 660 Spanish '''
Next, bin FamilyIncome into two groups to create a binary response variable (1 if the income is above 150,000, otherwise 0):
d['income_15w']=pd.cut(d['FamilyIncome'],[0,150000,d['FamilyIncome'].max()],labels=[0,1])
d['income_15w']=d['income_15w'].astype(int)
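The binning step above relies on how pd.cut treats interval edges. The sketch below uses a small hypothetical series (not the real data) to show that bins are open on the left and closed on the right by default, so a value equal to the lowest edge falls outside every bin unless include_lowest=True is passed:

```python
import pandas as pd

# Hypothetical incomes, chosen to sit on and around the 150,000 cutoff.
incomes = pd.Series([0, 150, 150000, 150001, 400000])
bins = [0, 150000, incomes.max()]

# Default behaviour: the intervals are (0, 150000] and (150000, 400000],
# so 150000 gets label 0 and an income of exactly 0 is left unlabelled (NaN).
default_cut = pd.cut(incomes, bins=bins, labels=[0, 1])
print(default_cut.isna().tolist())

# include_lowest=True closes the first interval on the left, so 0 is kept
# and the result can be converted to int without missing values.
flag = pd.cut(incomes, bins=bins, labels=[0, 1], include_lowest=True).astype(int)
print(flag.tolist())
```

If the real FamilyIncome column contained a zero, the astype(int) call in the original code would fail on the resulting missing value, so checking the column minimum first is a reasonable precaution.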
For more on this step, see the author's related post "Using cut to bin data and create a binary response variable" (I am a little monster's blog, CSDN).
Use statsmodels
import statsmodels.formula.api as smf
model=smf.logit('income_15w~HouseCosts+NumWorkers+OwnRent+NumBedrooms+FamilyType',data=d)
results=model.fit()
print(results.summary())
Optimization terminated successfully.
         Current function value: 0.391651
         Iterations 7
                           Logit Regression Results
==============================================================================
Dep. Variable:             income_15w   No. Observations:                22745
Model:                          Logit   Df Residuals:                    22737
Method:                           MLE   Df Model:                            7
Date:                Sat, 05 Feb 2022   Pseudo R-squ.:                  0.2078
Time:                        08:46:18   Log-Likelihood:                -8908.1
converged:                       True   LL-Null:                       -11244.
Covariance Type:            nonrobust   LLR p-value:                     0.000
===========================================================================================
                                coef    std err          z      P>|z|     [0.025     0.975]
-------------------------------------------------------------------------------------------
Intercept                    -5.8081      0.120    -48.456      0.000     -6.043     -5.573
OwnRent[T.Outright]           1.8276      0.208      8.782      0.000      1.420      2.236
OwnRent[T.Rented]            -0.8763      0.101     -8.647      0.000     -1.075     -0.678
FamilyType[T.Male Head]       0.2874      0.150      1.913      0.056     -0.007      0.582
FamilyType[T.Married]         1.3877      0.088     15.781      0.000      1.215      1.560
HouseCosts                    0.0007   1.72e-05     42.453      0.000      0.001      0.001
NumWorkers                    0.5873      0.026     22.393      0.000      0.536      0.639
NumBedrooms                   0.2365      0.017     13.985      0.000      0.203      0.270
===========================================================================================
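The coefficients in the logit summary are on the log-odds scale. As a rough sketch of how to read them, the snippet below exponentiates the printed (rounded) coefficients to get odds ratios and computes a predicted probability for one hypothetical household. The household values are made up for illustration, and Mortgage and Female Head are assumed to be the reference levels because they have no dummy term in the summary:

```python
import math

# Coefficients copied from the logit summary (rounded as printed there).
coef = {
    "Intercept": -5.8081,
    "OwnRent[T.Outright]": 1.8276,
    "OwnRent[T.Rented]": -0.8763,
    "FamilyType[T.Male Head]": 0.2874,
    "FamilyType[T.Married]": 1.3877,
    "HouseCosts": 0.0007,
    "NumWorkers": 0.5873,
    "NumBedrooms": 0.2365,
}

# exp(coefficient) is the multiplicative change in the odds
# for a one-unit increase in that predictor.
odds_ratios = {name: math.exp(b) for name, b in coef.items() if name != "Intercept"}

# Hypothetical household: married, paying a mortgage (assumed reference level),
# house costs of 2000, two workers, three bedrooms.
household = {
    "FamilyType[T.Married]": 1,
    "HouseCosts": 2000,
    "NumWorkers": 2,
    "NumBedrooms": 3,
}

# Linear predictor on the log-odds scale, then the inverse logit.
eta = coef["Intercept"] + sum(coef[name] * value for name, value in household.items())
prob = 1.0 / (1.0 + math.exp(-eta))

print(round(odds_ratios["NumWorkers"], 2), round(prob, 3))
```

Because the summary rounds HouseCosts to 0.0007, the probability is only approximate; with the fitted model at hand, results.predict would give the exact value.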
Use sklearn
predictors=pd.get_dummies(d[['HouseCosts','NumWorkers','OwnRent','NumBedrooms','FamilyType']],drop_first=True)
from sklearn import linear_model
lr=linear_model.LogisticRegression()
results=lr.fit(X=predictors,y=d['income_15w'])
print(results.coef_)
print('-*-'*10)
print(results.intercept_)
[[ 5.86894916e-04  7.32489391e-01  2.86764784e-01  7.17542587e-02
  -2.13282748e+00 -1.03910262e+00  2.63647146e-01]]
-*--*--*--*--*--*--*--*--*--*-
[-4.86108187]
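The array printed by coef_ carries no labels; its values follow the column order of the predictors frame. The sketch below uses a hypothetical three-row frame (with category values that appear in acs_ny.csv) to show how pd.get_dummies with drop_first=True names and orders the columns:

```python
import pandas as pd

# Hypothetical three-row frame, for illustration only.
toy = pd.DataFrame({
    "HouseCosts": [1800, 850, 860],
    "OwnRent": ["Mortgage", "Rented", "Outright"],
    "FamilyType": ["Married", "Female Head", "Male Head"],
})

# Numeric columns are passed through first; each categorical column then becomes
# one indicator per level, and drop_first=True drops the first (sorted) level.
encoded = pd.get_dummies(toy, drop_first=True)
print(list(encoded.columns))
```

Pairing the values with list(predictors.columns) makes the sklearn output easier to compare with the statsmodels table. One likely reason the two sets of estimates differ is that LogisticRegression applies L2 regularization by default, whereas smf.logit fits an unpenalized model.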
Poisson regression
Poisson regression is commonly used to analyze count data.
Use statsmodels
results=smf.poisson('NumChildren~FamilyIncome+FamilyType+OwnRent',data=d).fit()
print(results.summary())
Optimization terminated successfully.
         Current function value: nan
         Iterations 1
                          Poisson Regression Results
==============================================================================
Dep. Variable:            NumChildren   No. Observations:                22745
Model:                        Poisson   Df Residuals:                    22739
Method:                           MLE   Df Model:                            5
Date:                Sat, 05 Feb 2022   Pseudo R-squ.:                     nan
Time:                        09:05:28   Log-Likelihood:                    nan
converged:                       True   LL-Null:                       -30977.
Covariance Type:            nonrobust   LLR p-value:                       nan
===========================================================================================
                                coef    std err          z      P>|z|     [0.025     0.975]
-------------------------------------------------------------------------------------------
Intercept                        nan        nan        nan        nan        nan        nan
FamilyType[T.Male Head]          nan        nan        nan        nan        nan        nan
FamilyType[T.Married]            nan        nan        nan        nan        nan        nan
OwnRent[T.Outright]              nan        nan        nan        nan        nan        nan
OwnRent[T.Rented]                nan        nan        nan        nan        nan        nan
FamilyIncome                     nan        nan        nan        nan        nan        nan
===========================================================================================
Negative binomial regression
If the assumptions of Poisson regression do not hold well (for example, the data is overdispersed), negative binomial regression can be used instead.
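A Poisson distribution has a variance equal to its mean, so a quick, rough check is to compare the two for the count variable. The sketch below uses hypothetical counts (not values from acs_ny.csv), and it ignores the predictors, so it is only a first indication rather than a formal test:

```python
from statistics import mean, variance

# Hypothetical counts of children per household, for illustration only.
counts = [0, 0, 0, 0, 1, 0, 2, 0, 5, 0, 0, 3, 0, 6, 1]

m = mean(counts)
v = variance(counts)  # sample variance

# Under a Poisson assumption the ratio should be close to 1;
# a ratio well above 1 suggests overdispersion.
dispersion_ratio = v / m
print(m, v, dispersion_ratio)
```

On the real data the same comparison could be made with d['NumChildren'].mean() and d['NumChildren'].var().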
The statsmodels GLM documentation lists the distribution families that can be passed to GLM through the family parameter. The link functions available for each family can be found under sm.families.<FAMILY>.links:
Binomial( Binomial distribution )
Gamma( Gamma distribution )
InverseGaussian( Inverse Gaussian distribution )
NegativeBinomial( Negative binomial distribution )
Poisson( Poisson distribution )
Tweedie( Tweedie distribution )
import statsmodels.api as sm
import statsmodels.formula.api as smf
# Pass an instance of the link class; passing the class itself is deprecated or rejected in newer statsmodels releases
model=smf.glm('NumChildren~FamilyIncome+FamilyType+OwnRent',data=d,family=sm.families.NegativeBinomial(link=sm.families.links.Log()))
results=model.fit()
print(results.summary())
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:            NumChildren   No. Observations:                22745
Model:                            GLM   Df Residuals:                    22739
Model Family:        NegativeBinomial   Df Model:                            5
Link Function:                    log   Scale:                          1.0000
Method:                          IRLS   Log-Likelihood:                -29749.
Date:                Sat, 05 Feb 2022   Deviance:                       20731.
Time:                        10:06:21   Pearson chi2:                 1.77e+04
No. Iterations:                     6
Covariance Type:            nonrobust
===========================================================================================
                                coef    std err          z      P>|z|     [0.025     0.975]
-------------------------------------------------------------------------------------------
Intercept                    -0.3345      0.029    -11.672      0.000     -0.391     -0.278
FamilyType[T.Male Head]      -0.0468      0.052     -0.905      0.365     -0.148      0.055
FamilyType[T.Married]         0.1529      0.029      5.200      0.000      0.095      0.211
OwnRent[T.Outright]          -1.9737      0.243     -8.113      0.000     -2.450     -1.497
OwnRent[T.Rented]             0.4164      0.030     13.754      0.000      0.357      0.476
FamilyIncome               5.398e-07   9.55e-08      5.652      0.000   3.53e-07   7.27e-07
===========================================================================================
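With a log link, exponentiating a coefficient gives the factor by which the expected count is multiplied. As a sketch of how the negative binomial table might be read, the snippet below converts the printed coefficients into rate ratios. Mortgage and Female Head are assumed to be the reference levels, since they have no dummy term in the summary:

```python
import math

# Coefficients copied from the negative binomial summary (as printed there).
coef = {
    "FamilyType[T.Male Head]": -0.0468,
    "FamilyType[T.Married]": 0.1529,
    "OwnRent[T.Outright]": -1.9737,
    "OwnRent[T.Rented]": 0.4164,
    "FamilyIncome": 5.398e-07,
}

# exp(coefficient) is the multiplicative change in the expected number of
# children for a one-unit increase, holding the other predictors fixed.
rate_ratios = {name: math.exp(b) for name, b in coef.items()}

# FamilyIncome is measured in single currency units, so a per-unit ratio is
# barely above 1; a 10,000-unit increase is easier to interpret.
income_10k_ratio = math.exp(10_000 * coef["FamilyIncome"])

for name, ratio in rate_ratios.items():
    print(f"{name}: {ratio:.4f}")
print(f"FamilyIncome (+10,000): {income_10k_ratio:.4f}")
```

Read this way, renters are expected to have roughly 1.5 times as many children as households with a mortgage, while the income effect is statistically significant but small in magnitude.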