一个复杂子查询SQL优化
select * from test.vmark vk where id in (select v.id from usr_center.vmark_degree_update_log v, (select min(id) id from usr_center.vmark_degree_update_log where degree_update_cause = 0 and degree_update_type = 0 group by user_id) log where v.id = log.id and v.degree_update_type = 0 and v.degree_update_before between ‘2015-01-01 00:00:00‘ and‘2015-01-10 00:00:00‘);
+----+--------------------+-------------------------+--------+----------------+---------+---------+------+----------+----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+--------------------+-------------------------+--------+----------------+---------+---------+------+----------+----------------------------------------------+ | 1 | PRIMARY | vk | ALL | NULL | NULL | NULL | NULL | 66051213 | Using where || 2 | DEPENDENT SUBQUERY | <derived3> | ALL | NULL | NULL | NULL | NULL | 82947 | Using where | | 2 | DEPENDENT SUBQUERY | v | eq_ref | PRIMARY,idx_dd | PRIMARY | 4 | func | 1 | Using where || 3 | DERIVED | vmar_degree_update_log | ref | idx_dd | idx_dd | 1 | | 12508106 | Using where; Using temporary; Using filesort | +----+--------------------+-------------------------+--------+----------------+---------+---------+------+----------+----------------------------------------------+ 4 rows in set (8.10 sec)怎么解读这个执行计划呢?1.首先执行id=3的标识为DERIVED的步骤,这个关键词是衍生,出现在from后的子查询会有这个标识。索引idx_dd(degree_update_type,degree_update_after)将被用于这一步,ref表名这是非唯一索引扫描,预计扫描12287942行。2.执行id=2的第一条计划,也就是将第id=4的结果,group by成的82946行,对这个临时表做一个全表扫描。这里DEPENDENT SUBQUERY的意思是,这里要扫描的行数或执行次数,取决于其他步骤,即依赖于第4步。这里扫描82946行。3.得到log.id信息后,根据v.id=log.id,去驱动v表,走的是primary,eq_ref也反映这是唯一索引扫描,从rows可以看出,每执行一次返回1行,执行多少次呢,执行82946次,因为是"DEPENDENT SUBQUERY",所以执行次数或扫描行数依赖于第2步,也就是第2步每扫出一条,第3步就走一次索引。这里我们也看出来了,MySQL的执行计划并不是反映最终返回几行,不是反映这步骤总共扫描几行,也不告诉你执行多少次,而只是返回执行一次返回多少行。这里扫描82946行。4.全表扫描test.vmar表,66050840行,然后Nest loop join之前步骤返回的结果集(实际有8万多行)。这一步骤最耗时,读66050840*80000行。这里扫描行数成本是:12287942+82946*2+66050840*80000=7263409280000,7千亿行,我假设1亿行读20分钟,也需要2300多个小时,这个执行计划很恐怖。结论:放在from的子查询是非关联子查询,没关系。但是放在where后的,却要紧。问题出在第4步,test.vmark表实际上id上有主键索引,我们如果能用前面三步的结果集,获得id值,再去驱动test.vmark的id值,那么就很容易得到想要的结果。更改SQL如下,将in转变成join。select * from test.vmar vk join (select v.id from usr_center.vmar_degree_update_log v, (select min(id) id from usr_center.vmar_degree_update_log where degree_update_cause = 0 and degree_update_type = 0 group by user_id) log where v.id = log.id and v.degree_update_type = 0 and v.degree_update_before between ‘2015-01-01 00:00:00‘ and ‘2015-01-10 00:00:00‘) child where vk.id = child.id;这个执行计划的解释,前面几步,也就是对应这个执行计划的第3,4,5行都比较像,就不再累述。得到的结果集有id值,第一行的执行计划就读这个结果集,有82371行,读出的每一行取出id值去驱动test.vmark表的主键id。+----+-------------+-------------------------+--------+----------------+---------+---------+----------+----------+----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------------------------+--------+----------------+---------+---------+----------+----------+----------------------------------------------+ | 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 82371 | | | 1 | PRIMARY | vk | eq_ref | PRIMARY | PRIMARY | 4 | child.id | 1 | | | 2 | DERIVED | <derived3> | ALL | NULL | NULL | NULL | NULL | 82948 | | | 2 | DERIVED | v | eq_ref | PRIMARY,idx_dd | PRIMARY | 4 | log.id | 1 | Using where | | 3 | DERIVED | vmar_degree_update_log | ref | idx_dd | idx_dd | 1 | | 11901368 | Using where; Using temporary; Using filesort | +----+-------------+-------------------------+--------+----------------+---------+---------+----------+----------+----------------------------------------------+这里读取行数的成本是:11901368+82948*2+82371*2=48102006行。
郑重声明:本站内容如果来自互联网及其他传播媒体,其版权均属原媒体及文章作者所有。转载目的在于传递更多信息及用于网络分享,并不代表本站赞同其观点和对其真实性负责,也不构成任何其他建议。