我正在用python中的mysql-connector连接到mysql数据库。在正常的事件运行中,我不需要处理超过1000行,但有时我可能需要处理超过20k行的数据,这就是我的查询变得非常慢和超时的地方。我曾尝试在mysql-connector中使用迭代器来限制在给定时间处理的数量,但这似乎只针对py中的MySQLdb实现。
SELECT
ROUND(311.30004 - (20.110938 * temp.value)
- (2.012626 * hum.value) - (2.006346 * airspeed.value)
+ (0.4059 * POWER(temp.value, 2) + (0.006604 * POWER(hum.value, 2))
+ (3.151145 * POWER(airspeed.value, 2) + (0.05555 * temp.value * hum.value)
- (0.37037 * temp.value * airspeed.value) + (0.03968 * hum.value * airspeed.value))), 2)
AS value, temp.time, temp.x, temp.y, temp.z, temp.round_id, 'predicted_frequency', temp.round_number, temp.day_of_production
FROM round_data_temperature AS temp
INNER JOIN round_data_humidity AS hum
ON temp.round_number = hum.round_number
AND temp.x = hum.x
AND temp.y = hum.y AND DATE_FORMAT(temp.time, '%Y-%m-%d %H:%i:00') = DATE_FORMAT(hum.time, '%Y-%m-%d %H:%i:00')
INNER JOIN round_data_airspeed AS airspeed
ON temp.round_number = airspeed.round_number
AND temp.x = airspeed.x AND temp.y = airspeed.y
AND DATE_FORMAT(temp.time, '%Y-%m-%d %H:%i:00') = DATE_FORMAT(airspeed.time, '%Y-%m-%d %H:%i:00')
WHERE temp.round_id ='xxxxx'这段代码只在mysql上运行非常快,但是在python中的mysql-connector中运行速度非常慢,并且有20k以上的行数,所以我决定立即插入,以避免在python中选择非常大的数据包。我做了以下工作:
INSERT INTO round_data_pf (value, time, x, y, z, round_id, observable_name, round_number, day_of_production)
SELECT
ROUND(311.30004 - (20.110938 * temp.value)
- (2.012626 * hum.value) - (2.006346 * airspeed.value)
+ (0.4059 * POWER(temp.value, 2) + (0.006604 * POWER(hum.value, 2))
+ (3.151145 * POWER(airspeed.value, 2) + (0.05555 * temp.value * hum.value)
- (0.37037 * temp.value * airspeed.value) + (0.03968 * hum.value * airspeed.value))), 2)
AS value, temp.time, temp.x, temp.y, temp.z, temp.round_id, 'pf', temp.round_number, temp.day_of_production
FROM round_data_temperature AS temp
INNER JOIN round_data_humidity AS hum
ON temp.round_number = hum.round_number
AND temp.x = hum.x
AND temp.y = hum.y
AND DATE_FORMAT(temp.time, '%Y-%m-%d %H:%i:00') = DATE_FORMAT(hum.time, '%Y-%m-%d %H:%i:00')
INNER JOIN round_data_airspeed AS airspeed
ON temp.round_number = airspeed.round_number
AND temp.x = airspeed.x
AND temp.y = airspeed.y
AND DATE_FORMAT(temp.time, '%Y-%m-%d %H:%i:00') = DATE_FORMAT(airspeed.time, '%Y-%m-%d %H:%i:00')
WHERE temp.round_id ='xxxxx'当我有大约20k行数据时,这段代码在mysql和python中的mysql连接器中都超时了。
我正在寻找优化的代码,以便它选择和插入更快的时候,我说20k +行。
发布于 2019-04-04 20:40:16
确保你有正确的索引
table round_data_temperature composite index on columns (round_id, round_number, x, y, time)
table round_data_humidity composite index on columns (round_number, x, y time)
table round_data_airspeed comoosite index on clumns (round_number, x, y,time )您必须考虑到关系中涉及的每个表只能使用一个索引的事实。如果没有对表进行索引,则会对该表进行完全扫描。
单列索引和复合索引之间的区别在于,索引中存在的所有列(从左到右)都用于过滤查询,从而避免访问表数据,因此在某些情况下,使用复合索引(正确定义)可以显着改进查询。
https://stackoverflow.com/questions/55516134
复制相似问题