SQL Server - Window Function - Filtering on N Consecutive Records
2022-06-27 17:52:00 【懒人Ethan】
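Overview
When building reports for an application, we sometimes need to answer questions such as "was the temperature below 0 degrees for several days in a row?" or "which users logged in several times in a row within one day?". This article shows how to use the window functions in SQL Server to handle this kind of query.
Two examples illustrate the approach.
Code and Implementation
Finding users who logged in three times in a row within one day
The table is created as follows: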
if object_id('login_details') is not null
drop table login_details;
create table login_details(
login_id int primary key,
user_name varchar(50) not null,
login_date date
);
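The login table holds the login Id, the user name and the login date; login_id is the primary key.
The data initialization script is in the appendix.
We need the user name and date of every user who logged in three or more times in a row within one day. "In a row" means that the logins are adjacent when the rows of that day are ordered by login Id.
The query is as follows: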
;WITH DUPICATE_3 AS (
    SELECT
        ld.[login_id], ld.[user_name], ld.[login_date],
        CASE
            -- same user on the next row and the row after it
            WHEN ld.[user_name] = LEAD(ld.[user_name]) OVER(PARTITION BY ld.[login_date] ORDER BY ld.[login_id])
             AND ld.[user_name] = LEAD(ld.[user_name], 2) OVER(PARTITION BY ld.[login_date] ORDER BY ld.[login_id])
            THEN 1
            -- same user on the previous row and the next row
            WHEN ld.[user_name] = LAG(ld.[user_name]) OVER(PARTITION BY ld.[login_date] ORDER BY ld.[login_id])
             AND ld.[user_name] = LEAD(ld.[user_name]) OVER(PARTITION BY ld.[login_date] ORDER BY ld.[login_id])
            THEN 1
            -- same user on the previous row and the row before it
            WHEN ld.[user_name] = LAG(ld.[user_name]) OVER(PARTITION BY ld.[login_date] ORDER BY ld.[login_id])
             AND ld.[user_name] = LAG(ld.[user_name], 2) OVER(PARTITION BY ld.[login_date] ORDER BY ld.[login_id])
            THEN 1
        END AS TAG
    FROM [login_details] ld
)
, DUPICATE_FILTER AS (
    -- one row per (user, date) that has at least one tagged login
    SELECT DISTINCT d.[user_name], d.[login_date]
    FROM DUPICATE_3 d WHERE d.TAG = 1
)
SELECT
    d4.[user_name], d4.login_date,
    COUNT(d4.[user_name]) AS login_time
FROM DUPICATE_FILTER d4
LEFT JOIN DUPICATE_3 d3
    ON d4.login_date = d3.login_date AND d4.[user_name] = d3.[user_name]
GROUP BY d4.[user_name], d4.login_date
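- To avoid nested subqueries, define a CTE named DUPICATE_3. It adds a marker column TAG, which is 1 when the row belongs to a run of three or more consecutive logins by the same user on the same day, and NULL otherwise (the CASE has no ELSE branch).
- The window functions LEAD and LAG do the analysis. Rows are partitioned by date and ordered by login Id within each partition:
a. If the user name of the current row equals that of the next row and of the row after it, the row is tagged 1;
b. If the user name of the current row equals that of the previous row and of the next row, the row is tagged 1;
c. If the user name of the current row equals that of the previous row and of the row before it, the row is tagged 1;
- Finally, count the logins per user and date. Note that the count covers all logins of that user on that date, not only the consecutive ones. With the appendix data the query returns Stewart and Charles with 3 logins each on the day after the load date, and James with 4 logins four days after it.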
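The LEAD/LAG version above is tied to a run length of three. As a hedged sketch of how the same question could be asked for any run length, the query below swaps in a different technique, the row-number difference ("gaps and islands") pattern, rather than the article's CASE expression. The variable @N and the names numbered, grp and run_length are assumptions introduced only for this illustration.

```sql
-- Sketch only: @N, numbered, grp and run_length are hypothetical names.
DECLARE @N int = 3;  -- minimum number of consecutive logins

WITH numbered AS (
    SELECT ld.login_id, ld.user_name, ld.login_date,
           -- position among all logins of the day
           ROW_NUMBER() OVER (PARTITION BY ld.login_date
                              ORDER BY ld.login_id)
           -- minus position among this user's logins of the day;
           -- the difference stays constant inside an unbroken run
         - ROW_NUMBER() OVER (PARTITION BY ld.login_date, ld.user_name
                              ORDER BY ld.login_id) AS grp
    FROM login_details AS ld
)
SELECT n.user_name, n.login_date, COUNT(*) AS run_length
FROM numbered AS n
GROUP BY n.user_name, n.login_date, n.grp
HAVING COUNT(*) >= @N
ORDER BY n.login_date, n.user_name;
```

Unlike login_time in the query above, run_length is the length of one run, so a user with two separate runs on the same day would appear twice.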

Finding records with three consecutive days below 0 degrees
The table is created as follows:
if object_id('weather','U') is not null
drop table weather
create table weather
(
id int primary key,
city varchar(50) not null,
temperature int not null,
day date not null
);
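The weather table holds the Id, the city name, the temperature and the record date. id is the primary key and increases sequentially (the sample data assigns it explicitly; the column is not declared as IDENTITY).
The data initialization script is in the appendix.
We need the records that belong to a run of three or more consecutive days with a temperature below 0 degrees.
The approach is similar to the login problem above, so here is the solution directly: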
WITH WEATHER_ADD_TAG AS (
    SELECT *,
        CASE
            -- today and the next two days are below 0
            WHEN w.temperature < 0
             AND LEAD(w.temperature) OVER(PARTITION BY 1 ORDER BY w.[day]) < 0
             AND LEAD(w.temperature, 2) OVER(PARTITION BY 1 ORDER BY w.[day]) < 0
            THEN 1
            -- yesterday, today and tomorrow are below 0
            WHEN LAG(w.temperature) OVER(PARTITION BY 1 ORDER BY w.[day]) < 0
             AND w.temperature < 0
             AND LEAD(w.temperature) OVER(PARTITION BY 1 ORDER BY w.[day]) < 0
            THEN 1
            -- the previous two days and today are below 0
            WHEN LAG(w.temperature) OVER(PARTITION BY 1 ORDER BY w.[day]) < 0
             AND LAG(w.temperature, 2) OVER(PARTITION BY 1 ORDER BY w.[day]) < 0
             AND w.temperature < 0
            THEN 1
        END AS TAG
    FROM weather w
)
SELECT * FROM WEATHER_ADD_TAG WHERE TAG = 1
With the appendix data the query returns 12 rows: ids 5 to 7, 12 to 16 and 18 to 21.

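Optimizing the solution
The CASE WHEN part of this solution is too verbose. All we need is to find the records with a temperature below 0 degrees; unlike the previous example, no strict string matching is required.
Three or more consecutive days below 0 degrees is equivalent to the maximum temperature within some 3-day window being below 0 degrees.
The optimized code is as follows: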
SELECT Id, City, Temperature, Day
FROM
(
    SELECT *,
        CASE
            WHEN MAX(w.temperature) OVER(PARTITION BY 1 ORDER BY day ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) < 0
            THEN 1
            WHEN MAX(w.temperature) OVER(PARTITION BY 1 ORDER BY day ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) < 0
            THEN 1
            WHEN MAX(w.temperature) OVER(PARTITION BY 1 ORDER BY day ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) < 0
            THEN 1
        END AS TAG
    FROM weather w
) x WHERE x.TAG = 1;
With the appendix data the query returns 14 rows: ids 1 and 2, 5 to 7, 12 to 16 and 18 to 21.

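The result is wrong: it contains two extra days, January 1 and January 2.
When LAG or LEAD points at a row that does not exist, such as the row before the first record or the row after the last one, it returns NULL. The NULL takes part in the comparison, the comparison is not true, and the row is filtered out.
This version uses PRECEDING/FOLLOWING frames to reach the neighbouring rows, and a frame does not produce NULL for rows that do not exist. Rows before the first record or after the last one are simply left out of the calculation. For January 1 the frame 1 PRECEDING to 1 FOLLOWING therefore holds only January 1 and 2, and for January 2 the frame 2 PRECEDING to CURRENT ROW holds the same two rows. Both temperatures are below 0 degrees, so both days end up in the result.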
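To make the difference visible, here is a small diagnostic sketch against the weather table from the appendix. It puts LAG next to an aggregate over a 2 PRECEDING frame; the column aliases lag_2, rows_in_frame and max_in_frame are names chosen for this illustration only.

```sql
-- Sketch only: the aliases are hypothetical.
SELECT w.id, w.[day], w.temperature,
       -- NULL when the row two positions back does not exist
       LAG(w.temperature, 2) OVER (ORDER BY w.[day]) AS lag_2,
       -- number of rows the frame really contains (1 or 2 at the start)
       COUNT(*) OVER (ORDER BY w.[day]
                      ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS rows_in_frame,
       -- aggregate over whatever rows the shortened frame holds
       MAX(w.temperature) OVER (ORDER BY w.[day]
                                ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS max_in_frame
FROM weather AS w
ORDER BY w.[day];
```

For January 1 and 2 lag_2 is NULL while rows_in_frame is 1 and 2 and max_in_frame is -1 in both cases, which is why those two days pass the `< 0` test.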
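Solution:
Add two records that do not meet the condition (temperature 0), one dated before the first record and one dated after the last record, so that every shortened frame at either end contains a non-qualifying row.
The code is as follows: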
;WITH APPEND_MIN_MAX_DATE_CTE AS (
    SELECT * FROM weather w
    UNION ALL
    -- sentinel row before the first real record
    SELECT 1, 'London', 0, '2020-01-01'
    UNION ALL
    -- sentinel row after the last real record
    SELECT 1, 'London', 0, '2050-01-01'
)
SELECT Id, City, Temperature, Day
FROM
(
    SELECT *,
        CASE
            WHEN MAX(w.temperature) OVER(PARTITION BY 1 ORDER BY day ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) < 0
            THEN 1
            WHEN MAX(w.temperature) OVER(PARTITION BY 1 ORDER BY day ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) < 0
            THEN 1
            WHEN MAX(w.temperature) OVER(PARTITION BY 1 ORDER BY day ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) < 0
            THEN 1
        END AS TAG
    FROM APPEND_MIN_MAX_DATE_CTE w
) x WHERE x.TAG = 1;
With the appendix data the query now returns the expected 12 rows: ids 5 to 7, 12 to 16 and 18 to 21.
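The sentinel rows only work while their dates stay outside the real data. As an alternative sketch, not the article's method, each branch can also require that its frame really contains three rows, which removes the need for extra rows. The alias tag and the ELSE 0 branch are choices made for this illustration.

```sql
-- Sketch only: a frame counts only if it holds exactly three rows.
SELECT x.id, x.city, x.temperature, x.[day]
FROM (
    SELECT w.id, w.city, w.temperature, w.[day],
           CASE
               WHEN COUNT(*) OVER (ORDER BY w.[day] ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) = 3
                AND MAX(w.temperature) OVER (ORDER BY w.[day] ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) < 0
               THEN 1
               WHEN COUNT(*) OVER (ORDER BY w.[day] ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) = 3
                AND MAX(w.temperature) OVER (ORDER BY w.[day] ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) < 0
               THEN 1
               WHEN COUNT(*) OVER (ORDER BY w.[day] ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) = 3
                AND MAX(w.temperature) OVER (ORDER BY w.[day] ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) < 0
               THEN 1
               ELSE 0
           END AS tag
    FROM weather AS w
) AS x
WHERE x.tag = 1
ORDER BY x.[day];
```

For January 1 and 2 the only frames whose maximum is below 0 hold one or two rows, so the COUNT(*) test rejects them.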

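Finding records with N consecutive days below 0 degrees
Now the requirement changes: the number of days is no longer fixed but is supplied by the user.
The solutions so far all depend on a number of days known in advance, so they cannot meet the new requirement. We need a different way to express "N consecutive days" in T-SQL.
The implementation is as follows: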
DECLARE @N int = 4;  -- required number of consecutive days, supplied by the user

;WITH ADD_ROW_NUMBER_CTE AS (
    SELECT *,
        ROW_NUMBER() OVER(PARTITION BY 1 ORDER BY [day]) AS RN
    FROM weather w
),
ADD_ROW_NUMBER_LT_0_CTE AS (
    SELECT *,
        ROW_NUMBER() OVER(PARTITION BY 1 ORDER BY [day]) AS RN_LT_0
    FROM ADD_ROW_NUMBER_CTE WHERE temperature < 0
),
ADD_DIFF_CTE AS (
    SELECT *,
        (c.RN - c.RN_LT_0) AS DIFF
    FROM ADD_ROW_NUMBER_LT_0_CTE c
),
ADD_COUNT_CTE AS (
    SELECT *,
        -- number of rows in the same run (same DIFF)
        COUNT(*) OVER (PARTITION BY DIFF) AS CNT
    FROM ADD_DIFF_CTE
)
-- use CNT = @N instead to keep only runs of exactly N days
SELECT * FROM ADD_COUNT_CTE WHERE CNT >= @N
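- Define the CTE ADD_ROW_NUMBER_CTE, which adds the sequence column RN, ordered by date.
- Define the CTE ADD_ROW_NUMBER_LT_0_CTE, which adds the sequence column RN_LT_0, also ordered by date, but only over the rows below 0 degrees; records with a temperature of 0 or above are filtered out.
- Take the difference between RN and RN_LT_0. Rows with the same difference are consecutive: inside an unbroken run both numbers grow by 1 per row, so the difference stays constant, while every skipped row raises RN alone and changes the difference.
- Define the CTE ADD_COUNT_CTE, which counts the rows that share a difference. That count is the number of consecutive days below 0 degrees, and filtering on it with any value meets the requirement.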
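The row-number difference treats adjacent rows as adjacent days, which holds for the sample data because it has one row per day. If days could be missing, one possible variation, sketched here under that assumption and not taken from the article, is to subtract the row number from the date itself, so that a gap in the calendar also starts a new group. The names @N, negative_days, counted, grp and cnt are hypothetical, and the partition by city assumes at most one row per city and day.

```sql
-- Sketch only: a date-based grouping key instead of RN - RN_LT_0.
DECLARE @N int = 3;  -- minimum number of consecutive days

WITH negative_days AS (
    SELECT w.id, w.city, w.temperature, w.[day],
           -- consecutive dates minus consecutive row numbers give the same date
           DATEADD(DAY,
                   -CAST(ROW_NUMBER() OVER (PARTITION BY w.city
                                            ORDER BY w.[day]) AS int),
                   w.[day]) AS grp
    FROM weather AS w
    WHERE w.temperature < 0
),
counted AS (
    SELECT n.id, n.city, n.temperature, n.[day],
           COUNT(*) OVER (PARTITION BY n.city, n.grp) AS cnt
    FROM negative_days AS n
)
SELECT c.id, c.city, c.temperature, c.[day], c.cnt
FROM counted AS c
WHERE c.cnt >= @N
ORDER BY c.city, c.[day];
```

On the appendix data with @N = 3 this selects the same three runs as the earlier queries (ids 5 to 7, 12 to 16 and 18 to 21).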
Appendix
Login records table
if object_id('login_details') is not null
drop table login_details;
create table login_details(
login_id int primary key,
user_name varchar(50) not null,
login_date date);
truncate table login_details;
insert into login_details values
(101, 'Michael', GETDATE()),
(102, 'James', GETDATE()),
(103, 'Stewart', DATEADD(DD,1,GETDATE())),
(104, 'Stewart', DATEADD(DD,1,GETDATE())),
(105, 'Stewart', DATEADD(DD,1,GETDATE())),
(106, 'Michael', DATEADD(DD,2,GETDATE())),
(107, 'Michael', DATEADD(DD,2,GETDATE())),
(108, 'Stewart', DATEADD(DD,3,GETDATE())),
(109, 'Stewart', DATEADD(DD,3,GETDATE())),
(110, 'James', DATEADD(DD,4,GETDATE())),
(111, 'James', DATEADD(DD,4,GETDATE())),
(112, 'James', DATEADD(DD,4,GETDATE())),
(113, 'James', DATEADD(DD,4,GETDATE())),
(114, 'James', DATEADD(DD,5,GETDATE())),
(115, 'Charles', DATEADD(DD,1,GETDATE())),
(116, 'Charles', DATEADD(DD,1,GETDATE())),
(117, 'Charles', DATEADD(DD,1,GETDATE()));
Weather table
if object_id('weather','U') is not null
drop table weather
create table weather
(
id int primary key,
city varchar(50) not null,
temperature int not null,
day date not null
);
delete from weather;
insert into weather values
(1, 'London', -1, '2021-01-01'),
(2, 'London', -2, '2021-01-02'),
(3, 'London', 4, '2021-01-03'),
(4, 'London', 1, '2021-01-04'),
(5, 'London', -2, '2021-01-05'),
(6, 'London', -5, '2021-01-06'),
(7, 'London', -7, '2021-01-07'),
(8, 'London', 5, '2021-01-08'),
(9, 'London', -20,'2021-01-09'),
(10, 'London', 20, '2021-01-10'),
(11, 'London', 22,'2021-01-11'),
(12, 'London', -1, '2021-01-12'),
(13, 'London', -2, '2021-01-13'),
(14, 'London', -2, '2021-01-14'),
(15, 'London', -4, '2021-01-15'),
(16, 'London', -9, '2021-01-16'),
(17, 'London', 0, '2021-01-17'),
(18, 'London', -10, '2021-01-18'),
(19, 'London', -11, '2021-01-19'),
(20, 'London', -12, '2021-01-20'),
(21, 'London', -11, '2021-01-21');