这道题来自字节跳动支付业务的数据分析岗面试。抖音电商需要识别"消费升级"的用户——连续几天购买金额在增加,说明用户对平台的信任度在提升,可以推送更高客单价的商品。
业务场景:普通连续登录问题考察"日期是否连续",这道题升级为"金额是否递增"。需要同时判断日期连续性和数值递增性,两个条件缺一不可。
现有一张用户交易记录表 t14_zj_transaction,记录了用户每天的消费总额。请找出连续 3 天以上(含)交易金额严格递增的用户。
t14_zj_transaction 表
+----------+-------------+---------+
| user_id | trade_date | amount |
+----------+-------------+---------+
| 1 | 2025-01-01 | 100 |
| 1 | 2025-01-02 | 150 |
| 1 | 2025-01-03 | 200 |
| 1 | 2025-01-04 | 180 |
| 2 | 2025-01-01 | 50 |
| 2 | 2025-01-02 | 80 |
| 2 | 2025-01-03 | 70 |
| 3 | 2025-01-01 | 100 |
| 3 | 2025-01-02 | 120 |
| 3 | 2025-01-03 | 150 |
| 3 | 2025-01-05 | 200 |
+----------+-------------+---------+
期望输出:满足条件的用户ID,以及他们连续递增的起始日期、结束日期、连续天数。
这是"连续问题"的变体——不是判断连续日期,而是判断连续日期 + 递增金额:
DATEDIFF = 1 AND amount > prev_amount维度 | 评分 |
|---|---|
题目难度 | ⭐️⭐️⭐️ |
题目清晰度 | ⭐️⭐️⭐️ |
业务常见度 | ⭐️⭐️⭐️ |
Spark SQL
SELECT
user_id,
trade_date,
amount,
LAG(trade_date) OVER (PARTITION BY user_id ORDER BY trade_date) AS prev_date,
LAG(amount) OVER (PARTITION BY user_id ORDER BY trade_date) AS prev_amount
FROM t14_zj_transaction;
执行结果
+----------+-------------+---------+-------------+--------------+
| user_id | trade_date | amount | prev_date | prev_amount |
+----------+-------------+---------+-------------+--------------+
| 1 | 2025-01-01 | 100 | NULL | NULL |
| 1 | 2025-01-02 | 150 | 2025-01-01 | 100 |
| 1 | 2025-01-03 | 200 | 2025-01-02 | 150 |
| 1 | 2025-01-04 | 180 | 2025-01-03 | 200 |
| 2 | 2025-01-01 | 50 | NULL | NULL |
| 2 | 2025-01-02 | 80 | 2025-01-01 | 50 |
| 2 | 2025-01-03 | 70 | 2025-01-02 | 80 |
| 3 | 2025-01-01 | 100 | NULL | NULL |
| 3 | 2025-01-02 | 120 | 2025-01-01 | 100 |
| 3 | 2025-01-03 | 150 | 2025-01-02 | 120 |
| 3 | 2025-01-05 | 200 | 2025-01-03 | 150 |
+----------+-------------+---------+-------------+--------------+
11 rows selected (0.347 seconds)(https://www.dwsql.com)
Spark SQL
SELECT
user_id, trade_date, amount,
CASE
WHENDATEDIFF(trade_date, LAG(trade_date) OVER (PARTITIONBY user_id ORDERBY trade_date)) = 1
AND amount > LAG(amount) OVER (PARTITIONBY user_id ORDERBY trade_date)
THEN0ELSE1
ENDAS break_flag
FROM t14_zj_transaction;
执行结果
+----------+-------------+---------+-------------+
| user_id | trade_date | amount | break_flag |
+----------+-------------+---------+-------------+
| 1 | 2025-01-01 | 100 | 1 |
| 1 | 2025-01-02 | 150 | 0 |
| 1 | 2025-01-03 | 200 | 0 |
| 1 | 2025-01-04 | 180 | 1 |
| 2 | 2025-01-01 | 50 | 1 |
| 2 | 2025-01-02 | 80 | 0 |
| 2 | 2025-01-03 | 70 | 1 |
| 3 | 2025-01-01 | 100 | 1 |
| 3 | 2025-01-02 | 120 | 0 |
| 3 | 2025-01-03 | 150 | 0 |
| 3 | 2025-01-05 | 200 | 1 |
+----------+-------------+---------+-------------+
11 rows selected (0.363 seconds)(https://www.dwsql.com)
break_flag=1 表示"不满足连续递增",作为分组断点。
Spark SQL
WITH labeled AS (
SELECT
user_id, trade_date, amount,
CASE
WHENDATEDIFF(trade_date, LAG(trade_date) OVER w) = 1
AND amount > LAG(amount) OVER w
THEN0ELSE1
ENDAS break_flag
FROM t14_zj_transaction
WINDOW w AS (PARTITIONBY user_id ORDERBY trade_date)
),
grouped AS (
SELECT *,
SUM(break_flag) OVER (PARTITIONBY user_id ORDERBY trade_date) AS grp
FROM labeled
)
SELECT
user_id,
MIN(trade_date) AS start_date,
MAX(trade_date) AS end_date,
COUNT(*) AS consecutive_days
FROM grouped
GROUPBY user_id, grp
HAVINGCOUNT(*) >= 3;
最终结果:
+----------+-------------+-------------+-------------------+
| user_id | start_date | end_date | consecutive_days |
+----------+-------------+-------------+-------------------+
| 1 | 2025-01-01 | 2025-01-03 | 3 |
| 3 | 2025-01-01 | 2025-01-03 | 3 |
+----------+-------------+-------------+-------------------+
2 rows selected (0.323 seconds)(https://www.dwsql.com)
用户1 在 01-01 到 01-03 连续 3 天递增(100→150→200),但 01-04 金额下降到 180,中断。
坑1:两个 LAG 产生两次窗口扫描
LAG(trade_date) 和 LAG(amount) 是同一个窗口的不同列。用 WINDOW w AS (...) 只扫描一次。
坑2:首行的 NULL 处理
LAG 的首行返回 NULL,DATEDIFF(NULL, NULL) = NULL,NULL = 1 为 False,所以 CASE 会走到 ELSE 分支,break_flag=1。这意味着每组的第一行必然是断点。正确✅。
amount > prev_amount 中的 >DATEDIFF = 1 改为 DATEDIFF <= 2考点 | 说明 |
|---|---|
LAG + DATEDIFF | 判断日期是否连续 |
LAG + 比较 | 判断数值是否递增 |
0/1标志 + SUM OVER | 连续问题分组(经典解法) |
WINDOW 子句 | 复用窗口定义,减少重复代码 |
CREATE TABLEIFNOTEXISTS t14_zj_transaction (
user_id INT, trade_date STRING, amount BIGINT
);
INSERTINTO t14_zj_transaction VALUES
(1, '2025-01-01', 100),
(1, '2025-01-02', 150),
(1, '2025-01-03', 200),
(1, '2025-01-04', 180),
(2, '2025-01-01', 50),
(2, '2025-01-02', 80),
(2, '2025-01-03', 70),
(3, '2025-01-01', 100),
(3, '2025-01-02', 120),
(3, '2025-01-03', 150),
(3, '2025-01-05', 200);