Storing the most recent related record groupwise

Database Administration user
Asked 2019-07-27 11:40:11

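I have two tables, customers and purchases. Each customer has many (thousands of) purchases. I usually need only the most recent purchase for each customer, which is why I have a "latest_purchase_id" column that I update whenever I add a purchase.

I would rather not maintain "latest_purchase_id", so I have been testing queries. They all end up much slower, and I don't know why.

customers: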

       Column        |  Type    |                       Modifiers                        | Storage  | Stats target | Description
---------------------+----------+--------------------------------------------------------+----------+--------------+-------------
 id                  | integer  | not null default nextval('customers_id_seq'::regclass) | plain    |              |
 latest_purchase_id  | integer  |                                                        | plain    |              |
Indexes:
    "customers_pkey" PRIMARY KEY, btree (id)
    "customers_latest_purchase_id" btree (latest_purchase_id)
Foreign-key constraints:
    "customers_latest_purchase_fk" FOREIGN KEY (latest_purchase_id) REFERENCES purchases(id) DEFERRABLE INITIALLY DEFERRED
Referenced by:
    TABLE "purchases" CONSTRAINT "purchases_customer_fk" FOREIGN KEY (customer_id) REFERENCES customers(id) DEFERRABLE INITIALLY DEFERRED
Has OIDs: no

purchases:

     Column   |  Type     |                        Modifiers                       | Storage  | Stats target | Description
--------------+-----------+--------------------------------------------------------+----------+--------------+-------------
 id           | integer   | not null default nextval('purchases_id_seq'::regclass) | plain    |              |
 customer_id  | integer   |                                                        | plain    |              |
Indexes:
    "purchases_pkey" PRIMARY KEY, btree (id)
    "purchases_id_customer_id" btree (id, customer_id)
    "purchases_customer_id" btree (customer_id)
Foreign-key constraints:
    "purchases_customer_fk" FOREIGN KEY (customer_id) REFERENCES customers(id) DEFERRABLE INITIALLY DEFERRED
Referenced by:
    TABLE "customers" CONSTRAINT "customers_latest_purchase_id" FOREIGN KEY (latest_purchase_id) REFERENCES purchases(id) DEFERRABLE INITIALLY DEFERRED
Has OIDs: no

SELECT customers.id, purchases.id
FROM customers 
   JOIN purchases ON customers.latest_purchase_id = purchases.id;

48 ms

SELECT DISTINCT ON (customer_id) id, customer_id
FROM purchases
ORDER BY customer_id, id DESC;

1040 ms

SELECT customers.id, p.id
FROM customers INNER JOIN (
    SELECT RANK()
    OVER (PARTITION BY customer_id ORDER BY id DESC) r, *
    FROM purchases
) p
ON customers.id = p.customer_id
WHERE p.r = 1;

836 ms

SELECT customers.id, p1.id
FROM customers
JOIN purchases p1 ON customers.id = p1.customer_id
LEFT OUTER JOIN purchases p2 ON (customers.id = p2.customer_id and p1.id < p2.id)
WHERE p2.id IS NULL;

1833 ms

SELECT customers.id, p.id
FROM customers CROSS JOIN LATERAL (
    SELECT purchases.id, purchases.customer_id
    FROM purchases
    WHERE purchases.customer_id = customers.id
    ORDER BY purchases.id DESC
    LIMIT 1
) p;

23442 ms

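As you can see, "latest_purchase_id" is much faster than anything else. The performance benefit is obviously a trade-off, since purchase inserts take about twice as long (I improved this significantly by using a trigger). The query is also limited to the most recent purchase: it cannot be changed dynamically to match, for example, the latest purchase over a particular transaction value.

Why are the other queries so slow even though I have indexes set up? All I really need is the largest purchase ID for each customer ID, and the "purchases_id_customer_id" index should be able to handle that easily.

Here is the EXPLAIN ANALYZE output for the first two queries: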

EXPLAIN ANALYZE SELECT customers.id, purchases.id FROM customers JOIN purchases ON customers.latest_purchase_id = purchases.id;
 Nested Loop  (cost=0.42..11643.46 rows=3422 width=8) (actual time=0.961..72.014 rows=340 loops=1)
   ->  Seq Scan on customers  (cost=0.00..93.22 rows=3422 width=8) (actual time=0.010..1.239 rows=3420 loops=1)
   ->  Index Only Scan using purchases_pkey on purchases  (cost=0.42..3.38 rows=1 width=4) (actual time=0.020..0.020 rows=0 loops=3420)
         Index Cond: (id = d.latest_purchase_id)
         Heap Fetches: 137
 Planning Time: 0.681 ms
 Execution Time: 72.134 ms

EXPLAIN ANALYZE SELECT DISTINCT ON (customer_id) id, customer_id FROM purchases ORDER BY customer_id, id DESC;
 Unique  (cost=78791.68..81715.56 rows=157 width=8) (actual time=1092.279..1434.771 rows=407 loops=1)
   ->  Sort  (cost=78791.68..80253.62 rows=584777 width=8) (actual time=1092.277..1291.642 rows=585790 loops=1)
         Sort Key: customer_id, id DESC
         Sort Method: external merge  Disk: 8304kB
         ->  Seq Scan on purchases  (cost=0.00..14779.77 rows=584777 width=8) (actual time=0.736..610.967 rows=585790 loops=1)
 Planning Time: 0.098 ms
 Execution Time: 1436.267 ms

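Edit: I corrected the index to (customer_id, id), but it is still slow. There is more data now, so the timings are not directly comparable, but it still comes nowhere near the trigger approach.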

EXPLAIN ANALYZE SELECT DISTINCT ON (customer_id) id, customer_id FROM purchases ORDER BY customer_id, id;
 Result  (cost=0.43..162525.52 rows=381 width=8) (actual time=0.513..1461.147 rows=823 loops=1)
   ->  Unique  (cost=0.43..162525.52 rows=381 width=8) (actual time=0.510..1460.719 rows=823 loops=1)
         ->  Index Only Scan using purchases_customer_id_id_idx on purchases  (cost=0.43..157859.86 rows=1866267 width=8) (actual time=0.508..981.186 rows=1866213 loops=1)
               Heap Fetches: 1363609
 Planning Time: 0.096 ms
 Execution Time: 1461.359 ms
(6 rows)

1 Answer

Database Administration user

Answered 2019-07-29 09:27:03

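An expression index cannot use subqueries or stable/volatile functions. An index cannot contain values that depend on values in other rows, because any single change in the table might then require changing an unbounded number of index entries.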

So you have to actually store the desired property somewhere: in foo, or in bar as a boolean (which would still work with a partial index), or in a separate table.
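
As a rough illustration of the first option, and assuming foo and bar here stand for the question's customers and purchases tables, a trigger could keep customers.latest_purchase_id current. The function and trigger names below are made up for this sketch, it is not the asker's actual trigger, and it only handles inserts (deleting or re-assigning a purchase would need extra handling):

```sql
-- Hypothetical sketch: keep customers.latest_purchase_id in sync on insert.
-- Assumes purchase ids only grow (they come from purchases_id_seq), so the
-- newest purchase of a customer is the one with the largest id.
CREATE OR REPLACE FUNCTION set_latest_purchase() RETURNS trigger AS $$
BEGIN
    UPDATE customers
    SET    latest_purchase_id = NEW.id
    WHERE  id = NEW.customer_id
    AND    (latest_purchase_id IS NULL OR latest_purchase_id < NEW.id);
    RETURN NULL;  -- the return value of an AFTER row trigger is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER purchases_set_latest
AFTER INSERT ON purchases
FOR EACH ROW
EXECUTE PROCEDURE set_latest_purchase();
```

With something like this in place, the join on customers.latest_purchase_id from the question keeps working without the application having to update the column itself.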

The best index for looking up a customer's latest purchase requires the purchase ID to come after the customer ID, i.e., (customer_id, id).
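
A minimal sketch of that index together with one query shape that can use it. The index name is the one that appears in the plan in the question's edit. Whether the planner actually does one index probe per customer depends on the PostgreSQL version, the statistics, and the data distribution, so this is something to verify with EXPLAIN ANALYZE rather than a guaranteed plan:

```sql
-- Composite index with customer_id first, then id
-- (skip this statement if the index already exists).
CREATE INDEX purchases_customer_id_id_idx ON purchases (customer_id, id);

-- For each customer, fetch the largest purchase id for that customer.
-- Each correlated subquery can be answered by one backward probe into
-- the (customer_id, id) index instead of a pass over all purchases.
-- Customers without purchases get NULL as latest_purchase_id.
SELECT c.id AS customer_id,
       (SELECT p.id
        FROM   purchases p
        WHERE  p.customer_id = c.id
        ORDER  BY p.id DESC
        LIMIT  1) AS latest_purchase_id
FROM   customers c;
```

The DISTINCT ON query in the question's edit, by contrast, reads every entry of the index (about 1.87 million rows in that plan) and keeps only one row per customer. Its large Heap Fetches count also suggests that many pages were not marked all-visible at the time, which a VACUUM of purchases may improve.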

Original page content provided by Database Administration.
Original link:

https://dba.stackexchange.com/questions/243942