When Pandas Meets SQL, a Powerful Tool Library Is Born
2022-06-23 10:15:00 【Python数据挖掘】
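All demo data in this article comes from the four tables below. They should look familiar: they are the source tables used in the widely circulated "50 classic MySQL interview questions". I won't spell out how the tables relate to one another; look closely at the column names and the relationships should be apparent.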

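Introduction to pandasql
A pandas DataFrame is a two-dimensional table, and so is a database table, so running SQL statements against DataFrames is a natural fit. pandasql uses SQLite as its backing database. Python ships with the sqlite3 module, so no separate database has to be installed; only the pandasql package itself needs installing (for example with pip install pandasql).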
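The idea can be sketched without pandasql itself. The snippet below is a minimal illustration and not pandasql's actual implementation: it copies a small DataFrame into an in-memory SQLite database with pandas' own to_sql and queries it back with read_sql_query. The sample rows are made up for demonstration; the article's real data lives in student.xlsx.

```python
import sqlite3

import pandas as pd

# Hypothetical sample rows (assumption, not the article's data).
student = pd.DataFrame({"sid": [1, 2, 3], "sname": ["Ann", "Bob", "Cid"]})

# Copy the DataFrame into an in-memory SQLite database as a table.
conn = sqlite3.connect(":memory:")
student.to_sql("student", conn, index=False)

# Run plain SQL against that table and get a DataFrame back.
result = pd.read_sql_query(
    "select sname from student where sid >= 2 order by sid", conn
)
conn.close()
print(result)
```

pandasql wraps this round trip so that a DataFrame can be referenced by its variable name directly in the query.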
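One caveat: when pandasql reads a DataFrame column that holds dates, the values come through by default as full timestamps (year, month, and day plus hours, minutes, and seconds). You therefore need to know SQLite's date-handling functions to convert them into the format you want. The link below lists commonly used SQLite functions.
SQLite function reference: http://suo.im/5DWraE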
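To make the date caveat concrete, here is a small sketch that uses only the standard-library sqlite3 module. The rows are hypothetical, and it assumes the timestamps are stored as 'YYYY-MM-DD HH:MM:SS' text, which is how a date column typically looks once it reaches SQLite. It shows strftime and date trimming the time part.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table student (sname text, sage text)")
# Hypothetical rows; sage is stored as 'YYYY-MM-DD HH:MM:SS' text (assumption).
conn.executemany(
    "insert into student values (?, ?)",
    [("Ann", "1990-01-01 00:00:00"), ("Bob", "1990-12-21 00:00:00")],
)

rows = conn.execute(
    "select sname, sage, "
    "strftime('%Y-%m-%d', sage), "  # keep only year-month-day
    "strftime('%Y', sage), "        # keep only the year
    "date(sage) "                   # shorthand for the year-month-day form
    "from student order by sname"
).fetchall()
conn.close()

for row in rows:
    print(row)
```

Comparing strftime('%Y-%m-%d', sage) against a plain date string, as the where example later in the article does, avoids having to match the time part.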
Import the libraries:
import pandas as pd
from pandasql import sqldf
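Two ways to declare the global variables
① Declare each global variable before using it;
② Declare all global variables in one go.
Declaring each global variable before use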
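# Declare the names as global before assigning them: a global statement placed
# after an assignment in the same code block is a SyntaxError in Python 3.6+.
global df1
global df2
global df3
global df4
df1 = pd.read_excel("student.xlsx")
df2 = pd.read_excel("sc.xlsx")
df3 = pd.read_excel("course.xlsx")
df4 = pd.read_excel("teacher.xlsx")
query1 = "select * from df1 limit 5"
query2 = "select * from df2 limit 5"
query3 = "select * from df3"
query4 = "select * from df4"
# With no env argument, sqldf looks the table names up in the caller's variables.
sqldf(query1)
sqldf(query2)
sqldf(query3)
sqldf(query4)
[Screenshot: partial query results]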

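Declaring all global variables in one go
df1 = pd.read_excel("student.xlsx")
df2 = pd.read_excel("sc.xlsx")
df3 = pd.read_excel("course.xlsx")
df4 = pd.read_excel("teacher.xlsx")
# Wrap sqldf once so that every query resolves table names from globals().
pysqldf = lambda q: sqldf(q, globals())
query1 = "select * from df1 limit 5"
query2 = "select * from df2 limit 5"
query3 = "select * from df3"
query4 = "select * from df4"
# Call the wrapper defined above, not sqldf directly.
pysqldf(query1)
pysqldf(query2)
pysqldf(query3)
pysqldf(query4)
[Screenshot: partial query results]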

A few simple SQL statements
Checking the SQLite version
student = pd.read_excel("student.xlsx")
pysqldf = lambda q: sqldf(q, globals())
query1 = """ select sqlite_version(*) """
pysqldf(query1)
[Screenshot: query result]

Filtering with where
student = pd.read_excel("student.xlsx")
pysqldf = lambda q: sqldf(q, globals())
query1 = """ select * from student where strftime('%Y-%m-%d',sage) = '1990-01-01' """
pysqldf(query1)
[Screenshot: query result]

Multi-table join
student = pd.read_excel("student.xlsx")
sc = pd.read_excel("sc.xlsx")
pysqldf = lambda q: sqldf(q, globals())
query2 = """ select * from student s join sc on s.sid = sc.sid """
pysqldf(query2)
[Screenshot: partial query results]
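One thing to watch with select * over a join is that the join key appears twice in the result. The sketch below uses the standard-library sqlite3 module and a few made-up rows (the table contents are assumptions, not the article's data) to compare select * with an explicit column list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature versions of the student and sc tables.
conn.executescript("""
    create table student (sid integer, sname text);
    create table sc (sid integer, cid integer, score real);
    insert into student values (1, 'Ann'), (2, 'Bob');
    insert into sc values (1, 1, 80), (1, 2, 90), (2, 1, 70);
""")

# select * returns every column of both tables, so sid shows up twice.
cur = conn.execute("select * from student s join sc on s.sid = sc.sid")
star_columns = [d[0] for d in cur.description]

# Naming the columns keeps a single sid and makes the output predictable.
cur = conn.execute(
    "select s.sid as sid, s.sname as sname, sc.cid as cid, sc.score as score "
    "from student s join sc on s.sid = sc.sid "
    "order by s.sid, sc.cid"
)
named_columns = [d[0] for d in cur.description]
rows = cur.fetchall()
conn.close()

print(star_columns)
print(named_columns)
print(rows)
```

The same explicit column list can be used in a pandasql query, which avoids a result DataFrame with two columns both named sid.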

Group-by aggregation
student = pd.read_excel("student.xlsx")
sc = pd.read_excel("sc.xlsx")
pysqldf = lambda q: sqldf(q, globals())
query2 = """ select s.sname as 姓名,sum(sc.score) as 总分 from student s join sc on s.sid = sc.sid group by s.sname """
pysqldf(query2)
[Screenshot: query result]
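For comparison, the same join plus group-by can be written in pandas directly. This is only a sketch on made-up rows (the DataFrame contents and the column name total are assumptions): merge plays the role of the join and groupby(...).sum() the role of sum(sc.score) per student name.

```python
import pandas as pd

# Hypothetical miniature versions of the student and sc tables.
student = pd.DataFrame({"sid": [1, 2], "sname": ["Ann", "Bob"]})
sc = pd.DataFrame({"sid": [1, 1, 2], "cid": [1, 2, 1], "score": [80, 90, 70]})

totals = (
    student.merge(sc, on="sid")                 # join student and sc on sid
    .groupby("sname", as_index=False)["score"]  # group by student name
    .sum()                                      # total score per student
    .rename(columns={"score": "total"})
)
print(totals)
```

Whether the SQL or the pandas form reads better is a matter of taste; pandasql simply lets you keep the SQL version when that is the one you already have.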

union query
student = pd.read_excel("student.xlsx")
pysqldf = lambda q: sqldf(q, globals())
query1 = """ select * from student where strftime('%Y-%m',sage) = '1990-01' union select * from student where strftime('%Y-%m',sage) = '1990-12' """
pysqldf(query1)
[Screenshot: query result]
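When both branches of a union read the same table, a single where clause with in is often enough. The sketch below uses the standard-library sqlite3 module and made-up rows (an assumption, not the article's data) to show that the two forms return the same rows on this sample. Note that union also removes duplicate rows, so the two are not interchangeable on a table that contains exact duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table student (sid integer, sname text, sage text)")
# Hypothetical rows with timestamps stored as text.
conn.executemany(
    "insert into student values (?, ?, ?)",
    [
        (1, "Ann", "1990-01-01 00:00:00"),
        (2, "Bob", "1990-12-21 00:00:00"),
        (3, "Cid", "1991-05-20 00:00:00"),
    ],
)

# Two filtered selects combined with union, as in the article's query.
union_rows = conn.execute(
    "select * from student where strftime('%Y-%m', sage) = '1990-01' "
    "union "
    "select * from student where strftime('%Y-%m', sage) = '1990-12' "
    "order by sid"
).fetchall()

# The same filter expressed as one select with in.
in_rows = conn.execute(
    "select * from student "
    "where strftime('%Y-%m', sage) in ('1990-01', '1990-12') "
    "order by sid"
).fetchall()
conn.close()

print(union_rows)
print(in_rows)
```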
