startdateversion
2021-02-2812
2021-03-2711
2021-03-2910
2021-10-2812
2021-11-2611
2021-11-271
2021-11-2810
2021-12-269
需求:
如果两个startdate相差小于30天,那么认为这两个startdate对应的数据在一个集合中;如果第三条数据的startdate与这个集合中任意一条数据的startdate相差小于30天,认为这条数据也是属于这个集合;以此类推,将所有数据分为不同的集合。
输出每个集合中最大的version和它对应的那条数据。
思路:
重点在于分组。
先对所有数据按日期进行升序排序,如果一条数据的日期与前面集合最大日期的差值在30以内,则认为这条数据属于前一个集合。
在排好序的基础上,一条数据的日期与前一条数据的差值如果小于30就是0;如果大于30就是1(认为是一个新的集合)
按顺序是1,0,0,0,1,0,0等的情况。
然后sum over() ,按顺序是1,1,1,1,2,2,2等的情况
这样不同的集合就有了不同的值。
与前一条数据日期差是否在30以内为0
startdateversionper_startdateflagsum over
2021-02-28121900-01-0111
2021-03-27112021-02-2801
2021-03-29102021-03-2701
2021-10-28122021-03-2912
2021-11-26112021-10-2802
2021-11-2712021-11-2602
2021-11-28102021-11-2702
2021-12-2692021-11-2802 with testdataset as ( select '2021-02-28' as
startdate, 12 as version union select '2021-03-27' as startdate, 11 as version
union select '2021-03-29' as startdate, 10 as version union select '2021-10-28'
as startdate, 12 as version union select '2021-11-26' as startdate, 11 as
version union select '2021-11-27' as startdate, 1 as version union select
'2021-11-28' as startdate, 10 as version union select '2021-12-26' as
startdate, 9 as version ) select startdate,version,row_number()over(partition
by sumr order by version desc) as rank from ( select startdate,version,sum(case
when datediff(startdate,per_startdate) < 30 then 0 else 1 end) over (order by
startdate asc) as sumr from ( select
startdate,version,lag(startdate,1,'1900-01-01') over (ORDER BY startdate asc)
as per_startdate from testdataset ) ) having rank = 1 -- startdate version rank
-- 2021-02-28 12 1 -- 2021-10-28 12 1