京东每个商品详情页下方都有用户评论,运营人员需要定期生成"商品口碑报表"——哪些商品好评率高可以加大推广,哪些商品差评集中需要联系供应商改进。
📝 评分体系 京东采用 1-5 星评分体系。通常约定:
这是电商数据分析中最基础的"情感分类"(Sentiment Classification)任务。
现有一张商品评论表 t4_jd_product_review,记录了用户对商品的评分(1-5分)和评论内容。请按商品统计好评、中评、差评的数量及占比。
t4_jd_product_review 表
+----------+------------+--------+-------------------+
| review_id| product_id | rating | review_text |
+----------+------------+--------+-------------------+
| 1 | P001 | 5 | 非常好用 |
| 2 | P001 | 4 | 不错 |
| 3 | P001 | 2 | 一般般 |
| 4 | P002 | 5 | 性价比高 |
| 5 | P002 | 1 | 质量太差了 |
| 6 | P001 | 3 | 还行 |
| 7 | P002 | 4 | 物流很快 |
| 8 | P001 | 1 | 收到就是坏的 |
| 9 | P003 | 5 | 完美 |
| 10 | P003 | 5 | 推荐购买 |
| 11 | P003 | 4 | 不错的产品 |
| 12 | P002 | 5 | 第二次购买了 |
+----------+------------+--------+-------------------+
分类规则:
期望输出:每个商品的 评论总数、好评数、中评数、差评数、好评率(%)。
本题核心是 CASE WHEN + SUM 条件聚合,是 SQL 面试中必考的基础题型。
SUM(CASE WHEN ... THEN 1 ELSE 0 END) 统计各类别数量SUM(CASE WHEN 好评 THEN 1 END) * 100.0 / COUNT(*) 得到好评率💡 COUNT vs SUM 条件计数有两种写法,推荐用 SUM:
-- 推荐(SUM 短路径求值)
SUM(CASE WHEN condition THEN 1 ELSE 0 END)
-- 备选(COUNT 忽略 NULL 的特性)
COUNT(CASE WHEN condition THEN 1 END)
两者结果相同,但 SUM 写法更直观——"满足条件加 1,否则加 0"。
维度 | 评分 |
|---|---|
题目难度 | ⭐️ |
题目清晰度 | ⭐️⭐️⭐️⭐️⭐️ |
业务常见度 | ⭐️⭐️⭐️⭐️⭐️ |
在写聚合 SQL 之前,先快速扫一眼各商品评论情况。
SELECT
product_id,
rating,
COUNT(*) AS cnt
FROM t4_jd_product_review
GROUP BY product_id, rating
ORDER BY product_id, rating;
执行结果:
+------------+--------+-----+
| product_id | rating | cnt |
+------------+--------+-----+
| P001 | 1 | 1 |
| P001 | 2 | 1 |
| P001 | 3 | 1 |
| P001 | 4 | 1 |
| P001 | 5 | 1 |
| P002 | 1 | 1 |
| P002 | 4 | 1 |
| P002 | 5 | 2 |
| P003 | 4 | 1 |
| P003 | 5 | 2 |
+------------+--------+-----+
直观感受:
先不聚合,给每条评论打上情感标签,验证分类逻辑是否正确。
SELECT
product_id,
rating,
CASE
WHEN rating >= 4 THEN '好评'
WHEN rating = 3 THEN '中评'
ELSE '差评'
END AS sentiment
FROM t4_jd_product_review;
执行结果:
+-------------+---------+------------+
| product_id | rating | sentiment |
+-------------+---------+------------+
| P001 | 5 | 好评 |
| P001 | 4 | 好评 |
| P001 | 2 | 差评 |
| P002 | 5 | 好评 |
| P002 | 1 | 差评 |
| P001 | 3 | 中评 |
| P002 | 4 | 好评 |
| P001 | 1 | 差评 |
| P003 | 5 | 好评 |
| P003 | 5 | 好评 |
| P003 | 4 | 好评 |
| P002 | 5 | 好评 |
+-------------+---------+------------+
12 rows selected (7.686 seconds)(dwsql.com)
把标签列"折叠"为计数列——每个 product_id 一行,每种情感一列。
SELECT
product_id,
COUNT(*) AS total_reviews,
SUM(CASEWHEN rating >= 4THEN1ELSE0END) AS good_reviews,
SUM(CASEWHEN rating = 3THEN1ELSE0END) AS neutral_reviews,
SUM(CASEWHEN rating <= 2THEN1ELSE0END) AS bad_reviews
FROM t4_jd_product_review
GROUPBY product_id
ORDERBY product_id;
执行结果:
+-------------+----------------+---------------+------------------+--------------+
| product_id | total_reviews | good_reviews | neutral_reviews | bad_reviews |
+-------------+----------------+---------------+------------------+--------------+
| P001 | 5 | 2 | 1 | 2 |
| P002 | 4 | 3 | 0 | 1 |
| P003 | 3 | 3 | 0 | 0 |
+-------------+----------------+---------------+------------------+--------------+
3 rows selected (1.518 seconds)(dwsql.com)
SELECT
product_id,
COUNT(*) AS total_reviews,
SUM(CASEWHEN rating >= 4THEN1ELSE0END) AS good_reviews,
SUM(CASEWHEN rating = 3THEN1ELSE0END) AS neutral_reviews,
SUM(CASEWHEN rating <= 2THEN1ELSE0END) AS bad_reviews,
ROUND(SUM(CASEWHEN rating >= 4THEN1ELSE0END) * 100.0 / COUNT(*), 1) AS good_rate
FROM t4_jd_product_review
GROUPBY product_id
ORDERBY product_id;
最终结果:
+-------------+----------------+---------------+------------------+--------------+------------+
| product_id | total_reviews | good_reviews | neutral_reviews | bad_reviews | good_rate |
+-------------+----------------+---------------+------------------+--------------+------------+
| P001 | 5 | 2 | 1 | 2 | 40.0 |
| P002 | 4 | 3 | 0 | 1 | 75.0 |
| P003 | 3 | 3 | 0 | 0 | 100.0 |
+-------------+----------------+---------------+------------------+--------------+------------+
3 rows selected (0.535 seconds)(dwsql.com)
分析洞察:
⚠️ SUM 的 ELSE 0 不能省略
-- ❌ 省略 ELSE → 条件不满足时 SUM 收到的不是 0 而是 NULL
SUM(CASE WHEN rating >= 4 THEN 1 END)
-- SUM 会跳过 NULL,结果看似正确,但语义上不够明确
-- ✅ 显式 ELSE 0
SUM(CASE WHEN rating >= 4 THEN 1 ELSE 0 END)
虽然 SUM 忽略 NULL 的行为和 SUM(0) 一样,但面试中显式写 ELSE 0 说明你理解边界情况。
⚠️ 整数除法陷阱
-- ❌ 整数除法 → 结果为 0
SELECT 2 / 5; -- 结果是 0,不是 0.4
-- ✅ 乘以 100.0 触发隐式转换
SELECT 2 * 100.0 / 5; -- 结果是 40.0
* 100.0 中的 .0 是关键——它把整数运算转为浮点运算。忘记写 .0 是面试中最常见的低级错误。
review_date 字段,加 GROUP BY DATE_FORMAT(review_date, 'yyyy-MM') 可以看到好评率的月度变化趋势SUM(CASE WHEN rating>=4 AND has_image=1 THEN 2 ELSE CASE WHEN rating>=4 THEN 1 ELSE 0 END)(SUM(CASE WHEN rating=5 THEN 1 END) - SUM(CASE WHEN rating<=2 THEN 1 END)) * 100.0 / COUNT(*)💡 条件聚合标准模板
SELECT
group_col,
COUNT(*) AS total,
SUM(CASEWHEN <条件1> THEN1ELSE0END) AS type1_cnt,
SUM(CASEWHEN <条件2> THEN1ELSE0END) AS type2_cnt,
ROUND(SUM(CASEWHEN <条件1> THEN1ELSE0END) * 100.0 / COUNT(*), 1) AS type1_pct
FROMtable
GROUPBY group_col
ORDERBY group_col;
条件聚合,将行级标签折叠为列级计数CREATE TABLEIFNOTEXISTS t4_jd_product_review (
review_id INT,
product_id STRING,
rating INT,
review_text STRING
);
INSERTINTO t4_jd_product_review VALUES
(1, 'P001', 5, '非常好用'),
(2, 'P001', 4, '不错'),
(3, 'P001', 2, '一般般'),
(4, 'P002', 5, '性价比高'),
(5, 'P002', 1, '质量太差了'),
(6, 'P001', 3, '还行'),
(7, 'P002', 4, '物流很快'),
(8, 'P001', 1, '收到就是坏的'),
(9, 'P003', 5, '完美'),
(10, 'P003', 5, '推荐购买'),
(11, 'P003', 4, '不错的产品'),
(12, 'P002', 5, '第二次购买了');