当前位置:网站首页>Create a binary response variable using the cut sub box operation

Create a binary response variable using the cut sub box operation

2022-06-26 04:50:00 I am a little monster

import pandas as pd
d=pd.read_csv('D:/pandas Flexible use /pandas_for_everyone-master/data/acs_ny.csv')
print(d.columns)
print('@'*66)
print(d.head())
Index(['Acres', 'FamilyIncome', 'FamilyType', 'NumBedrooms', 'NumChildren',
       'NumPeople', 'NumRooms', 'NumUnits', 'NumVehicles', 'NumWorkers',
       'OwnRent', 'YearBuilt', 'HouseCosts', 'ElectricBill', 'FoodStamp',
       'HeatingFuel', 'Insurance', 'Language'],
      dtype='object')
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
  Acres  FamilyIncome   FamilyType  NumBedrooms  NumChildren  NumPeople  \
0  1-10           150      Married            4            1          3   
1  1-10           180  Female Head            3            2          4   
2  1-10           280  Female Head            4            0          2   
3  1-10           330  Female Head            2            1          2   
4  1-10           330    Male Head            3            1          2   

   NumRooms         NumUnits  NumVehicles  NumWorkers   OwnRent    YearBuilt  \
0         9  Single detached            1           0  Mortgage    1950-1959   
1         6  Single detached            2           0    Rented  Before 1939   
2         8  Single detached            3           1  Mortgage    2000-2004   
3         4  Single detached            1           0    Rented    1950-1959   
4         5  Single attached            1           0  Mortgage  Before 1939   

   HouseCosts  ElectricBill FoodStamp HeatingFuel  Insurance        Language  
0        1800            90        No         Gas       2500         English  
1         850            90        No         Oil          0         English  
2        2600           260        No         Oil       6600  Other European  
3        1800           140        No         Oil          0         English  
4         860           150        No         Gas        660         Spanish  

The following for FamilyIncome Carry out box splitting operation :

# Which specifies the column to be crated , The designated revenue is in the range of 0-150000 For the 0,150000 The range to the maximum value of income is 1, label labels Use list to pass in values , You can also specify a string as a label 
d['income_15w']=pd.cut(d['FamilyIncome'],[0,150000,d['FamilyIncome'].max()],labels=[0,1])
print(d.info())
print(d['income_15w'].value_counts())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 22745 entries, 0 to 22744
Data columns (total 19 columns):
 #   Column        Non-Null Count  Dtype   
---  ------        --------------  -----   
 0   Acres         22745 non-null  object  
 1   FamilyIncome  22745 non-null  int64   
 2   FamilyType    22745 non-null  object  
 3   NumBedrooms   22745 non-null  int64   
 4   NumChildren   22745 non-null  int64   
 5   NumPeople     22745 non-null  int64   
 6   NumRooms      22745 non-null  int64   
 7   NumUnits      22745 non-null  object  
 8   NumVehicles   22745 non-null  int64   
 9   NumWorkers    22745 non-null  int64   
 10  OwnRent       22745 non-null  object  
 11  YearBuilt     22745 non-null  object  
 12  HouseCosts    22745 non-null  int64   
 13  ElectricBill  22745 non-null  int64   
 14  FoodStamp     22745 non-null  object  
 15  HeatingFuel   22745 non-null  object  
 16  Insurance     22745 non-null  int64   
 17  Language      22745 non-null  object  
 18  income_15w    22745 non-null  category
dtypes: category(1), int64(10), object(8)
memory usage: 3.1+ MB
None
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
0    18294
1     4451
Name: income_15w, dtype: int64
原网站

版权声明
本文为[I am a little monster]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/02/202202180510153580.html