This is a variant of the "gaps and islands" problem. Here is one possible solution:
```sql
SELECT
    -- 4) Conditional aggregation: for each column, read only the row numbered 1
    MAX(CASE WHEN rowA = 1 AND valA IS NOT NULL THEN updatedby END) AS updatedA,
    MAX(CASE WHEN rowB = 1 AND valB IS NOT NULL THEN updatedby END) AS updatedB,
    MAX(CASE WHEN rowC = 1 AND valC IS NOT NULL THEN updatedby END) AS updatedC,
    MAX(CASE WHEN rowD = 1 AND valD IS NOT NULL THEN updatedby END) AS updatedD,
    MAX(CASE WHEN rowE = 1 AND valE IS NOT NULL THEN updatedby END) AS updatedE,
    MAX(CASE WHEN rowA = 1 AND valA IS NOT NULL THEN realDate END) AS updatedOnA,
    MAX(CASE WHEN rowB = 1 AND valB IS NOT NULL THEN realDate END) AS updatedOnB,
    MAX(CASE WHEN rowC = 1 AND valC IS NOT NULL THEN realDate END) AS updatedOnC,
    MAX(CASE WHEN rowD = 1 AND valD IS NOT NULL THEN realDate END) AS updatedOnD,
    MAX(CASE WHEN rowE = 1 AND valE IS NOT NULL THEN realDate END) AS updatedOnE,
    MAX(CASE WHEN rowA = 1 THEN valA END) AS ValA,
    MAX(CASE WHEN rowB = 1 THEN valB END) AS ValB,
    MAX(CASE WHEN rowC = 1 THEN valC END) AS ValC,
    MAX(CASE WHEN rowD = 1 THEN valD END) AS ValD,
    MAX(CASE WHEN rowE = 1 THEN valE END) AS ValE
FROM (
    SELECT
        -- 3) Number the rows so that the first row of the most recent group gets 1
        ROW_NUMBER() OVER (ORDER BY grpA DESC, realDate) AS rowA,
        ROW_NUMBER() OVER (ORDER BY grpB DESC, realDate) AS rowB,
        ROW_NUMBER() OVER (ORDER BY grpC DESC, realDate) AS rowC,
        ROW_NUMBER() OVER (ORDER BY grpD DESC, realDate) AS rowD,
        ROW_NUMBER() OVER (ORDER BY grpE DESC, realDate) AS rowE,
        *
    FROM (
        SELECT
            -- 2) Running sum of the flags, which gives each island a group number
            SUM(flagA) OVER (ORDER BY realDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grpA,
            SUM(flagB) OVER (ORDER BY realDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grpB,
            SUM(flagC) OVER (ORDER BY realDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grpC,
            SUM(flagD) OVER (ORDER BY realDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grpD,
            SUM(flagE) OVER (ORDER BY realDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grpE,
            *
        FROM (
            SELECT
                -- 1) Flag each row: 0 if the value equals the previous row's value or is NULL, otherwise 1
                CASE WHEN LAG(valA) OVER (ORDER BY realDate) = valA OR valA IS NULL THEN 0 ELSE 1 END AS flagA,
                CASE WHEN LAG(valB) OVER (ORDER BY realDate) = valB OR valB IS NULL THEN 0 ELSE 1 END AS flagB,
                CASE WHEN LAG(valC) OVER (ORDER BY realDate) = valC OR valC IS NULL THEN 0 ELSE 1 END AS flagC,
                CASE WHEN LAG(valD) OVER (ORDER BY realDate) = valD OR valD IS NULL THEN 0 ELSE 1 END AS flagD,
                CASE WHEN LAG(valE) OVER (ORDER BY realDate) = valE OR valE IS NULL THEN 0 ELSE 1 END AS flagE,
                *
            FROM (
                SELECT
                    *,
                    CONVERT(datetime, updatedon) AS realDate
                FROM (
                    VALUES
                        (4, 56, 100, 20, 50, 50, NULL, N'david', N'1/4/2024'),
                        (3, 56, 100, 30, 50, 50, NULL, N'cameron', N'1/3/2024'),
                        (2, 56, 50, 30, 50, 50, NULL, N'bob', N'1/2/2024'),
                        (1, 56, 50, 40, 25, 50, NULL, N'alice', N'1/1/2024')
                ) t (HistoryId, primaryid, valA, valB, valC, valD, valE, updatedby, updatedon)
            ) x
        ) x
    ) x
) x;
```
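The query is evaluated from the innermost subquery outward, in four main steps (numbered 1 to 4 in the comments):
- Compare each row's value with the previous row's value, ordered by date, and generate a flag: 0 if the two are equal or the current value is NULL, otherwise 1. The first row has no previous row, so `LAG` returns NULL, the comparison is not true, and a non-NULL first value gets flag 1.
- Take a running `SUM` of the flags with a window function. Rows that share the same running total form one "island" of consecutive equal values.
- Number the rows with `ROW_NUMBER`, ordering by group number descending and then by date, so that the first row of the most recent island gets row number 1.
- Finally, for each column, use conditional aggregation on the row numbered 1 to return the current value together with who set it and when.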
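
To make the steps easier to follow, here is a minimal sketch that traces them for a single column (`valB`) using the same sample rows. It is only an illustration of the technique, not a replacement for the full query. The style argument `101` in `CONVERT` is my own addition and assumes the dates are in US `mm/dd/yyyy` format.

```sql
-- Sketch: intermediate columns for valB only (sample rows from the answer).
SELECT
    updatedby,
    valB,
    flagB,
    grpB,
    -- Step 3: row 1 is the first row of the most recent island
    ROW_NUMBER() OVER (ORDER BY grpB DESC, realDate) AS rowB
FROM (
    SELECT
        updatedby, realDate, valB, flagB,
        -- Step 2: running total of the flags = island number
        SUM(flagB) OVER (ORDER BY realDate
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grpB
    FROM (
        SELECT
            updatedby, realDate, valB,
            -- Step 1: 1 when the value changed, 0 when unchanged or NULL
            CASE WHEN LAG(valB) OVER (ORDER BY realDate) = valB
                      OR valB IS NULL THEN 0 ELSE 1 END AS flagB
        FROM (
            -- Assumption: style 101 (mm/dd/yyyy) matches the sample dates
            SELECT updatedby, valB, CONVERT(datetime, updatedon, 101) AS realDate
            FROM (
                VALUES
                    (20, N'david',   N'1/4/2024'),
                    (30, N'cameron', N'1/3/2024'),
                    (30, N'bob',     N'1/2/2024'),
                    (40, N'alice',   N'1/1/2024')
            ) t (valB, updatedby, updatedon)
        ) x
    ) x
) x
ORDER BY realDate;

-- updatedby  valB  flagB  grpB  rowB
-- alice      40    1      1     4
-- bob        30    1      2     2
-- cameron    30    0      2     3
-- david      20    1      3     1
```

The row with `rowB = 1` is david's, which is why the full query reports david as the last person to change `valB`. Cameron's row repeats bob's value, so it gets flag 0 and stays in bob's island.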
Running the query against the sample data returns a single row:
| updatedA | updatedB | updatedC | updatedD | updatedE | updatedOnA | updatedOnB | updatedOnC | updatedOnD | updatedOnE | ValA | ValB | ValC | ValD | ValE |
|----------|----------|----------|----------|----------|----------------|----------------|----------------|----------------|------------|------|------|------|------|------|
| cameron | david | bob | alice | NULL | 2024-01-03 | 2024-01-04 | 2024-01-02 | 2024-01-01 | NULL | 100 | 20 | 50 | 50 | NULL |
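
The sample data contains only one `primaryid`, and the query does not partition by it. If the table held history for several `primaryid` values, one possible adaptation (an untested sketch, shown for column A only) would be to add `PARTITION BY primaryid` to every window and group the outer query by `primaryid`. The style argument `101` in `CONVERT` is again an assumption about the date format.

```sql
-- Sketch: per-primaryid variant for column A only (assumes several primaryid values may exist).
SELECT
    primaryid,
    MAX(CASE WHEN rowA = 1 AND valA IS NOT NULL THEN updatedby END) AS updatedA,
    MAX(CASE WHEN rowA = 1 AND valA IS NOT NULL THEN realDate END) AS updatedOnA,
    MAX(CASE WHEN rowA = 1 THEN valA END) AS ValA
FROM (
    SELECT
        *,
        -- Row numbers restart at 1 for every primaryid
        ROW_NUMBER() OVER (PARTITION BY primaryid ORDER BY grpA DESC, realDate) AS rowA
    FROM (
        SELECT
            *,
            -- Islands are counted separately within each primaryid
            SUM(flagA) OVER (PARTITION BY primaryid ORDER BY realDate
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grpA
        FROM (
            SELECT
                *,
                -- Compare only against the previous row of the same primaryid
                CASE WHEN LAG(valA) OVER (PARTITION BY primaryid ORDER BY realDate) = valA
                          OR valA IS NULL THEN 0 ELSE 1 END AS flagA
            FROM (
                SELECT primaryid, valA, updatedby,
                       CONVERT(datetime, updatedon, 101) AS realDate
                FROM (
                    VALUES
                        (56, 100, N'david',   N'1/4/2024'),
                        (56, 100, N'cameron', N'1/3/2024'),
                        (56, 50,  N'bob',     N'1/2/2024'),
                        (56, 50,  N'alice',   N'1/1/2024')
                ) t (primaryid, valA, updatedby, updatedon)
            ) x
        ) x
    ) x
) x
GROUP BY primaryid;
```

With the sample rows this should return one row for `primaryid` 56 with the same `updatedA`, `updatedOnA`, and `ValA` values as in the table above.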