当前位置:网站首页>Pandas vs. SQL 1_ nanyangjx
Pandas vs. SQL 1_ nanyangjx
2022-06-26 08:40:00 【Algorithmic Pilgrim】
The present python It is widely used in data processing and quantitative analysis data mining , This article will focus on python in pandas Under bag DataFrame The use of this data type is described in detail , And its equivalent in data processing SQL The contrast in , Don't talk much , On and off !!!
https://blog.csdn.net/weixin_39818658/article/details/79944925
keyword :pandas、DataFrame、SQL
1、 About filtering data table columns
Yes DataFrame( That is, a two-dimensional data table )data:
Use
data=DataFrame(np.arange(9).reshape(3,3),columns=[‘a’,‘b’,‘c’])
Statement to generate the DataFrame
data :
a b c
0 0 1 2
1 3 4 5
2 6 7 8
If we only take data Of a Column Use a slicing operation directly ( notes : The selection slice operation on the column is a two-layer bracket ):
data[[‘a’]]
--------->
0 0
1 3
2 6
Corresponding SQL Statement for :
select a from data
If we want to take a and b The two fields are also slicing operations :
data[[‘a’,‘b’]]
------>
a b
0 0 1
1 3 4
2 6 7
Corresponding SQL Statement for :
select a,b from data
First, let's create two here DataFrame:
Basic information data sheet of the company
companydata=DataFrame({'ID':['aa','bb','cc'],
'name':[' company A',' company B',' company C'],
'establish_year':['2001','2002','2003'],
'city':['ShangHai','BeiJing','ShenZhen']})
----->
ID city establish_year name
0 aa ShangHai 2001 company A
1 bb BeiJing 2002 company B
2 cc ShenZhen 2003 company C
List of products issued by the company :
productdata=DataFrame({'ID':['aa','aa','bb','bb','cc'],
'productname':['stock_A','bond_A','stock_B','bond_B','stock_C']
})
--------->
ID productname
0 aa stock_A
1 aa bond_A
2 bb stock_B
3 bb bond_B
4 cc stock_C
2、 Index of data 、 selection 、 Filter
(1) Single condition screening
First select ’ company A’ All the information about , The operation is as follows :
companydata[companydata[‘name’]==‘ company A’]
---------------------------------------------------------------------------->
ID city establish_year name
0 aa ShangHai 2001 company A
amount to SQL:
select * from companydata where name=’ company A’
(2) Multi criteria screening ( notes : The multi criteria filter criteria should be written in brackets )
selection ’ company B’ also ’city’ by ’BeiJing’, The operation is as follows :
companydata[(companydata[‘name’]‘ company B’)&(companydata[‘city’]‘BeiJing’)]
-------->
ID city establish_year name
1 bb BeiJing 2002 company B
amount to SQL:
select * from companydata where name=’ company A’ and city=’BeiJing’
selection ’ID’ by ’aa’ perhaps ’cc’ All the information about , The operation is as follows :
companydata[(companydata[‘ID’]‘aa’)|(companydata[‘ID’]‘cc’)]
------>
ID city establish_year name
0 aa ShangHai 2001 company A
2 cc ShenZhen 2003 company C
amount to SQL:
select * from companydata where ID=’aa’ or ID=’cc’
perhaps
companydata[companydata[‘ID’].isin([‘aa’,‘cc’])]
------->
ID city establish_year name
0 aa ShangHai 2001 company A
2 cc ShenZhen 2003 company C
amount to SQL:
select * from companydata where ID in (‘aa’,’cc’)
(3) Sometimes we don't need all the fields after filtering, but only one or several fields ,
for example :
View company A also ’ID’ by ’aa’ Of ’ Year of establishment ’ and ’ city ’
We need to apply it to DataFrame An index method of ix, The operation is as follows :
companydata.ix[(companydata[‘name’]‘ company A’)&(companydata[‘ID’]‘aa’),[‘city’,‘establish_year’]]
---------->
city establish_year
0 ShangHai 2001
amount to SQL:
select city,establish_year from companydata where ID=’aa’ and name=’ company A’
3、 Label according to different data , That is to say SQL Medium case when … then … operation
for example : When the city where the company is located ’city’ ,’ShangHai’ When is ’SH’,’BeiJing’ When is ’BJ’,’ShenZhen’ When is ’SZ’, The operation is as follows :
fuc=lambda x:'SH' if x=='ShangHai' else 'BJ' if x=='BeiJing' else 'SZ' if x=='ShenZhen' else np.NaN
companydata['citylabel']=companydata['city'].apply(fuc)
print(companydata)
( I can only think of such a way of writing about this , If there is a clever way , Please grant me your advice )
------->
ID city establish_year name citylabel
0 aa ShangHai 2001 company A SH
1 bb BeiJing 2002 company B BJ
2 cc ShenZhen 2003 company C SZ
amount to SQL:
select
ID,
city,
establish_year,
name,
case ‘city’ when ‘'ShangHai' ’ then ‘SH’
when 'BeiJing' then ‘BJ’
when ‘'ShenZhen' ’ then ‘SZ’ else null
end as citylabel
from companydata
4、 Aggregation of data
example :
surface productdata Yes ID Grouping , And calculate each ID There are several financial products
productdata[productname].groupby(productdata[‘ID’]).count()
------>
ID
aa 2
bb 2
cc 1
Name: ID, dtype: int64
amount to SQL:
select count(productname) from productdata group by ID
productdata[productname].groupby([productdata[‘ID’],productdata[‘productname’]]).count()
----------->
ID productname
aa bond_A 1
stock_A 1
bb bond_B 1
stock_B 1
cc stock_C 1
Name: ID, dtype: int64
amount to SQL:
select count(productname) from productdata group by ID,‘productname’
5、 Multi table association operation
In general, data processing is not a single table operation , Generally, multiple tables are associated .
example :
Make statistics on what financial products each company has , The operation is as follows :
pd.merge(companydata,productdata,left_on=‘ID’,right_on=‘ID’,how=‘left’)[[‘ID’,‘name’,‘productname’]]
------>
ID name productname
0 aa company A stock_A
1 aa company A bond_A
2 bb company B stock_B
3 bb company B bond_B
4 cc company C stock_C
amount to SQL:
Select A.ID,A.name,B.productname from companydata A left join productdata on A.ID=B.ID
Here is a brief description of the method merge Parameters of
(1) The first two parameters that must be passed in are two DataFrame , I won't elaborate here
(2)left_on Is the left table as the associated foreign key
(3)right_on Is the right table as the associated foreign key
(4)how Is the way of association , There were left\right\inner\outer, Corresponding to SQL Medium left\right\inner\full Four kinds of join The way
边栏推荐
- [postgraduate entrance examination] group planning exercises: memory
- 51 MCU project design: Based on 51 MCU clock perpetual calendar
- drf的相关知识
- Zlmediakit push pull flow test
- Necessary protection ring for weak current detection
- Detailed process of generating URDF file from SW model
- (3) Dynamic digital tube
- loading view时,后面所有东西屏蔽
- 监听iPad键盘显示和隐藏事件
- Relevant knowledge of DRF
猜你喜欢

SQL learning experience (II): question brushing record

Stanford doggo source code study

Apple motherboard decoding chip, lightning Apple motherboard decoding I.C

Design of reverse five times voltage amplifier circuit

STM32 project design: an e-reader making tutorial based on stm32f4

51 single chip microcomputer project design: schematic diagram of timed pet feeding system (LCD 1602, timed alarm clock, key timing) Protues, KEIL, DXP

Koa_ mySQL_ Integration of TS

MPC learning notes (I): push MPC formula manually

Bezier curve learning

2020-10-17
随机推荐
监听iPad键盘显示和隐藏事件
关于极客时间 | MySQL实战45讲的部分总结
批量修改文件名
Can the encrypted JS code and variable name be cracked and restored?
Relevant knowledge of DRF
HEVC学习之码流分析
【Unity Mirror】NetworkTeam的使用
Embedded Software Engineer (6-15k) written examination interview experience sharing (fresh graduates)
Analysis of Yolo series principle
leetcode2022年度刷题分类型总结(十二)并查集
Design based on STM32 works: multi-functional atmosphere lamp, wireless control ws2812 of mobile app, MCU wireless upgrade program
SOC的多核启动流程详解
Detailed process of generating URDF file from SW model
XXL job configuration alarm email notification
批量执行SQL文件
FFmpeg音视频播放器实现
2020-10-20
When loading view, everything behind is shielded
STM32 encountered problems using encoder module (library function version)
static const与static constexpr的类内数据成员初始化