mysql子查询的缺陷以及5.6的优化
分类
子查询分为from clause和where clause,在执行计划的select_type分别显示为derived和[dependent] subquery;
根据和外部查询的依赖关系又可分为两种,
相关子查询:子查询依赖外层连接的返回值;
非相关子查询:子查询不依赖外层连接的返回值;
缺陷
pre-5.6通常会将非相关子查询改为相关子查询,即先遍历outer table,对于其返回的每一条记录都执行一次subquery;
注:mysql目前只实现nested-loop join,所以dependent subquery才会如此消耗资源,如果是oracle则可进行semi/anti hash join
http://blog.itpub.net/15480802/viewspace-703260
SELECT * FROM t1 WHERE t1.a IN (SELECT t2.b FROM t2 WHERE where_condition);
有可能被重写为
SELECT * FROM t1 WHERE EXISTS (SELECT t2.b FROM t2 WHERE where_condition AND t1.a=t2.b);
Pre-5.6优化
Mysql可以将from clause的子查询进行物化,此时先执行inner query并将结果存于临时表
以下是一个5.0.3版本的优化案例 http://www.xaprb.com/blog/2006/04/30/how-to-optimize-subqueries-and-joins-in-mysql/
select * from subcategory
where id in (
select c.id
from subcategory as c
inner join item as i on i.subcategory = c.id
where c.category = 14
group by c.id
having count(*) > 2000
);
此时mysql optimizer自作聪明的将非相关子查询改写为相关子查询,执行计划如下:
因为subquery被转化为相关子查询,即先遍历subcategory表(outer table),对于每条记录都执行一次subquery(总计300783次)
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: subcategory
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 300783
Extra: Using where
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: c
type: ref
possible_keys: PRIMARY,category
key: category
key_len: 4
ref: const
rows: 100
Extra: Using where; Using index; Using temporary; Using filesort
*************************** 3. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: i
type: ref
possible_keys: subcategory
key: subcategory
key_len: 4
ref: c.id
rows: 28
Extra: Using index
优化:采用物化子查询,所做的就是将子查询改为from clause,即添加一对括号即可;
select * from subcategory
where id in (
select id from (
select c.id
from subcategory as c
inner join item as i on i.subcategory = c.id
where c.category = 14
group by c.id
having count(*) > 2000
) as x
);
5.6优化
1 引入物化子查询(针对where clause的subquery)
5.6.5引入此功能,在此之前,优化器有时会把非关联子查询重写为相关子查询,导致效率变差;
子查询物化将子查询结果存入临时表,确保子查询只执行一次,该表不记录重复数据且采用哈希索引查找;
Optimizer_switch需设置materialization=on
2 优化derived table
以前的版本在explain时就会对from clause的subquery进行物化 ,引发了部分执行,5.6消除了这个问题;
另外,优化器可能会为derived table添加索引以加速执行
SELECT * FROM t1 JOIN (SELECT * FROM t2) AS derived_t2 ON t1.f1=derived_t2.f1;
优化器可以为derived_t2的f1添加索引以采用ref
http://dev.mysql.com/doc/refman/5.6/en/subquery-materialization.html
郑重声明:本站内容如果来自互联网及其他传播媒体,其版权均属原媒体及文章作者所有。转载目的在于传递更多信息及用于网络分享,并不代表本站赞同其观点和对其真实性负责,也不构成任何其他建议。