Generalized linear model (logistic regression, Poisson regression)
2022-06-26 04:50:00 【I am a little monster】
The linear regression model is not suitable for every case: some response variables are binary (for example, positive/negative) or are counts. Generalized linear models can model such data while still using a linear combination of the independent variables.
Logistic regression
When the response variable is binary, logistic regression is often used to model the data.
The data below comes from the Pandas for Everyone companion files; download it here if necessary: https://download.csdn.net/download/qq_57099024/79301082
import pandas as pd
d=pd.read_csv('D:/pandas Flexible use /pandas_for_everyone-master/data/acs_ny.csv')
print(d.columns)
print('@'*66)  # print a separator to distinguish the two outputs
print(d.head())
''' Here is the output :
Index(['Acres', 'FamilyIncome', 'FamilyType', 'NumBedrooms', 'NumChildren',
'NumPeople', 'NumRooms', 'NumUnits', 'NumVehicles', 'NumWorkers',
'OwnRent', 'YearBuilt', 'HouseCosts', 'ElectricBill', 'FoodStamp',
'HeatingFuel', 'Insurance', 'Language'],
dtype='object')
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Acres FamilyIncome FamilyType NumBedrooms NumChildren NumPeople \
0 1-10 150 Married 4 1 3
1 1-10 180 Female Head 3 2 4
2 1-10 280 Female Head 4 0 2
3 1-10 330 Female Head 2 1 2
4 1-10 330 Male Head 3 1 2
NumRooms NumUnits NumVehicles NumWorkers OwnRent YearBuilt \
0 9 Single detached 1 0 Mortgage 1950-1959
1 6 Single detached 2 0 Rented Before 1939
2 8 Single detached 3 1 Mortgage 2000-2004
3 4 Single detached 1 0 Rented 1950-1959
4 5 Single attached 1 0 Mortgage Before 1939
HouseCosts ElectricBill FoodStamp HeatingFuel Insurance Language
0 1800 90 No Gas 2500 English
1 850 90 No Oil 0 English
2 2600 260 No Oil 6600 Other European
3 1800 140 No Oil 0 English
4 860 150 No Gas 660 Spanish '''
The following bins FamilyIncome:
d['income_15w']=pd.cut(d['FamilyIncome'],[0,150000,d['FamilyIncome'].max()],labels=[0,1])
d['income_15w']=d['income_15w'].astype(int)
Using cut to bin the data creates a binary response variable.
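As a quick sanity check, the same cut-based binning can be sketched on a few hypothetical income values standing in for the FamilyIncome column:

```python
import pandas as pd

# Hypothetical incomes standing in for d['FamilyIncome']
incomes = pd.Series([50_000, 120_000, 150_000, 200_000, 500_000])

# Same binning as above: (0, 150000] -> 0, (150000, max] -> 1
# (cut's bins are right-inclusive by default, so exactly 150000 maps to 0)
binned = pd.cut(incomes, [0, 150_000, incomes.max()], labels=[0, 1]).astype(int)
print(binned.tolist())  # [0, 0, 0, 1, 1]
```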
Use statsmodels
import statsmodels.formula.api as smf
model=smf.logit('income_15w~HouseCosts+NumWorkers+OwnRent+NumBedrooms+FamilyType',data=d)
results=model.fit()
print(results.summary())
Optimization terminated successfully.
Current function value: 0.391651
Iterations 7
Logit Regression Results
==============================================================================
Dep. Variable: income_15w No. Observations: 22745
Model: Logit Df Residuals: 22737
Method: MLE Df Model: 7
Date: Sat, 05 Feb 2022 Pseudo R-squ.: 0.2078
Time: 08:46:18 Log-Likelihood: -8908.1
converged: True LL-Null: -11244.
Covariance Type: nonrobust LLR p-value: 0.000
===========================================================================================
coef std err z P>|z| [0.025 0.975]
-------------------------------------------------------------------------------------------
Intercept -5.8081 0.120 -48.456 0.000 -6.043 -5.573
OwnRent[T.Outright] 1.8276 0.208 8.782 0.000 1.420 2.236
OwnRent[T.Rented] -0.8763 0.101 -8.647 0.000 -1.075 -0.678
FamilyType[T.Male Head] 0.2874 0.150 1.913 0.056 -0.007 0.582
FamilyType[T.Married] 1.3877 0.088 15.781 0.000 1.215 1.560
HouseCosts 0.0007 1.72e-05 42.453 0.000 0.001 0.001
NumWorkers 0.5873 0.026 22.393 0.000 0.536 0.639
NumBedrooms 0.2365 0.017 13.985 0.000 0.203 0.270
==================================================================================
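The logit coefficients above are on the log-odds scale; exponentiating a coefficient gives an odds ratio, which is often easier to interpret. A minimal sketch using values copied from the summary table (on the fitted model itself, np.exp(results.params) would give the same thing for every term):

```python
import numpy as np

# Coefficients copied from the logit summary table above (log-odds scale)
coefs = {
    'OwnRent[T.Rented]': -0.8763,
    'FamilyType[T.Married]': 1.3877,
    'NumWorkers': 0.5873,
}

# exp(coef) is the multiplicative change in the odds of income_15w == 1
# for a one-unit change in that predictor, holding the others fixed
for name, b in coefs.items():
    print(f'{name}: odds ratio = {np.exp(b):.3f}')
```

For example, being married multiplies the odds of a family income above 150,000 by roughly 4, while renting multiplies them by roughly 0.42.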
Use sklearn
predictors=pd.get_dummies(d[['HouseCosts','NumWorkers','OwnRent','NumBedrooms','FamilyType']],drop_first=True)
from sklearn import linear_model
lr=linear_model.LogisticRegression()
results=lr.fit(X=predictors,y=d['income_15w'])
print(results.coef_)
print('-*-'*10)
print(results.intercept_)
[[ 5.86894916e-04  7.32489391e-01  2.86764784e-01  7.17542587e-02
  -2.13282748e+00 -1.03910262e+00  2.63647146e-01]]
-*--*--*--*--*--*--*--*--*--*-
[-4.86108187]
Poisson regression
Poisson regression is often used to analyze count data.
Use statsmodels
results=smf.poisson('NumChildren~FamilyIncome+FamilyType+OwnRent',data=d).fit()
print(results.summary())
Optimization terminated successfully.
Current function value: nan
Iterations 1
Poisson Regression Results
==============================================================================
Dep. Variable: NumChildren No. Observations: 22745
Model: Poisson Df Residuals: 22739
Method: MLE Df Model: 5
Date: Sat, 05 Feb 2022 Pseudo R-squ.: nan
Time: 09:05:28 Log-Likelihood: nan
converged: True LL-Null: -30977.
Covariance Type: nonrobust LLR p-value: nan
===========================================================================================
coef std err z P>|z| [0.025 0.975]
-------------------------------------------------------------------------------------------
Intercept nan nan nan nan nan nan
FamilyType[T.Male Head] nan nan nan nan nan nan
FamilyType[T.Married] nan nan nan nan nan nan
OwnRent[T.Outright] nan nan nan nan nan nan
OwnRent[T.Rented] nan nan nan nan nan nan
FamilyIncome nan nan nan nan nan nan
==================================================================================
Negative binomial regression
Note that the Poisson fit above returned nan estimates, so the fit did not actually succeed. If the assumptions of Poisson regression do not hold (for example, when the data is overdispersed), negative binomial regression can be used instead.
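A quick way to gauge overdispersion is to compare the variance of the counts with their mean: a Poisson model assumes they are equal. A minimal sketch with hypothetical counts (on the real data you would compare d['NumChildren'].var() against d['NumChildren'].mean()):

```python
import pandas as pd

# Hypothetical counts standing in for NumChildren
counts = pd.Series([0, 0, 0, 1, 1, 2, 3, 5, 8, 10])

# Poisson assumes variance == mean; a variance far above the mean
# indicates overdispersion, favoring a negative binomial model
print('mean:', counts.mean())      # 3.0
print('variance:', counts.var())   # ~12.67, far above the mean
```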
The statsmodels GLM documentation lists the distribution families that can be passed as the family parameter of GLM; each family's available link functions can be found under sm.families.<FAMILY>.links:
Binomial( Binomial distribution )
Gamma( Gamma distribution )
InverseGaussian( Inverse Gaussian distribution )
NegativeBinomial( Negative binomial distribution )
Poisson( Poisson distribution )
Tweedie( Tweedie distribution )
import statsmodels
import statsmodels.api as sm
import statsmodels.formula.api as smf
# Note: recent statsmodels versions expect a link instance, e.g. sm.families.links.Log()
model=smf.glm('NumChildren~FamilyIncome+FamilyType+OwnRent',data=d,family=sm.families.NegativeBinomial(sm.genmod.families.links.log))
results=model.fit()
print(results.summary())
Generalized Linear Model Regression Results
==============================================================================
Dep. Variable: NumChildren No. Observations: 22745
Model: GLM Df Residuals: 22739
Model Family: NegativeBinomial Df Model: 5
Link Function: log Scale: 1.0000
Method: IRLS Log-Likelihood: -29749.
Date: Sat, 05 Feb 2022 Deviance: 20731.
Time: 10:06:21 Pearson chi2: 1.77e+04
No. Iterations: 6
Covariance Type: nonrobust
===========================================================================================
coef std err z P>|z| [0.025 0.975]
-------------------------------------------------------------------------------------------
Intercept -0.3345 0.029 -11.672 0.000 -0.391 -0.278
FamilyType[T.Male Head] -0.0468 0.052 -0.905 0.365 -0.148 0.055
FamilyType[T.Married] 0.1529 0.029 5.200 0.000 0.095 0.211
OwnRent[T.Outright] -1.9737 0.243 -8.113 0.000 -2.450 -1.497
OwnRent[T.Rented] 0.4164 0.030 13.754 0.000 0.357 0.476
FamilyIncome 5.398e-07 9.55e-08 5.652 0.000 3.53e-07 7.27e-07
=================================================================================