sql 中in、exists在驱动表选择上的执行效率测试分析
都说exits快,怎么快?哪快?为啥快?
1 环境准备
环境为oracle 12c db
1.只查询需要的列,不用*
2.IN适合于外表大而内表小的情况;EXISTS适合于外表小而内表大的情况
create table a as select * from all_objects;
insert into a select * from a;
/
/
/
commit;
create table b as select * from all_objects whererownum < 500;
SQL> select count(*) from a;
COUNT(*)
----------
2850880
SQL> select count(*) from b;
COUNT(*)
----------
499
SQL>
2 大小表在in中驱动顺序不同执行时间差异
2.1 大表A表为驱动表(加hint结果)
SQL> select /*+ use_hash(a,b) leading(a) */ count(*) from a where a.object_name in (select object_name from b);
COUNT(*)
----------
15968
1 row selected.
Elapsed: 00:00:53.22
Execution Plan
----------------------------------------------------------
Plan hash value: 1819916167
------------------------------------------------------------------------------------
| Id |Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECTSTATEMENT | | 1 | 91 | | 576 (1)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 91 | | | |
|* 2 | HASH JOIN SEMI | | 2 | 182 | 3224K| 576 (1)| 00:00:01 |
| 3 | TABLE ACCESS FULL| A | 89090 | 2175K| | 417 (1)| 00:00:01 |
| 4 | TABLE ACCESS FULL| B | 1 | 66 | | 2 (0)| 00:00:01 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 -access("A"."OBJECT_NAME"="OBJECT_NAME")
--大表A作为驱动表,执行时间Elapsed:00:00:53.22
2.2 小表B作为驱动表(不加hint,默认计划)
SQL> select count(*) from a where a.object_name in (selectobject_name from b);
COUNT(*)
----------
15968
1 row selected.
Elapsed: 00:00:01.81
Execution Plan
----------------------------------------------------------
Plan hash value: 894329177
------------------------------------------------------------------------------
| Id |Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECTSTATEMENT | | 1 | 91 | 419 (1)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 91 | | |
|* 2 | HASH JOIN RIGHT SEMI| | 2 | 182 | 419 (1)| 00:00:01 |
| 3 | TABLE ACCESS FULL | B | 1 | 66 | 2 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL | A | 89090 | 2175K| 417 (1)| 00:00:01 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 -access("A"."OBJECT_NAME"="OBJECT_NAME")
--小表B作为驱动表,执行时间00:00:01.81
结论:可以看到在使用in的情况下,使用小表B作为驱动表,效果明显,执行时间为1.81s,而大表A作为驱动表,却执行53s,相差甚大
3 大小表在exists中驱动顺序不同执行时间差异
3.1 大表A驱动
SQL> select /*+ use_hash(a,b) leading(a) */count(*) from a where exists(select 1 from b where a.object_name=b.object_name);
COUNT(*)
----------
15968
1 row selected.
Elapsed: 00:00:02.47
Execution Plan
----------------------------------------------------------
Plan hash value: 1819916167
------------------------------------------------------------------------------------
| Id |Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECTSTATEMENT | | 1 | 39 | | 17998 (1)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 39 | | | |
|* 2 | HASH JOIN SEMI | | 27474 | 1046K| 100M| 17998 (1)| 00:00:01 |
| 3 | TABLE ACCESS FULL| A | 2850K| 67M| | 12992 (1)| 00:00:01 |
| 4 | TABLE ACCESS FULL| B | 499 | 6986 | | 5 (0)| 00:00:01 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 -access("A"."OBJECT_NAME"="B"."OBJECT_NAME")
3.2 小表B驱动
SQL> select count(*) from a where exists(select 1 from b wherea.object_name=b.object_name);
COUNT(*)
----------
15968
1 row selected.
Elapsed: 00:00:00.29
Execution Plan
----------------------------------------------------------
Plan hash value: 894329177
------------------------------------------------------------------------------
| Id |Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECTSTATEMENT | | 1 | 39 | 13006 (1)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 39 | | |
|* 2 | HASH JOIN RIGHT SEMI| | 27474 | 1046K| 13006 (1)| 00:00:01 |
| 3 | TABLE ACCESS FULL | B | 499 | 6986 | 5 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL | A | 2850K| 67M| 12992 (1)| 00:00:01 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("A"."OBJECT_NAME"="B"."OBJECT_NAME")
--还是发现小表作为驱动表快,小表为驱动29毫秒,大表A为驱动表,执行时间2.47秒
结论:
1 发现不管用大表还是小表还是大表作为驱动表,exists执行效率都比in要高。
2 不管使用exits还是in,oracle优化器默认都使用了hash join进行了连接处理。
3 不管使用exits还是in,经测试证明小表作为驱动表,效率更优
4 oracle优化器默认使用小表作为驱动表,说明优化器判断准确,符合预期
exists为什么效率比in高
1 可以看到测试中的结果都使用了hash join,相当于在连接键上创建了一个索引,加快了处理速度,in语句也同样获益。
2 可以发现in后的结果集要比exits后的结果集大,这是导致exists效率高的一个主要原因,本例例外(b表完全是a表的子集)。
3 in后面的结果集需要首先查询计算出来,然后进行散列连接,最后执行主查询,时间长
exists后面的结果集直接实现了散列连接,和主查询同步进行连接,存在就返回相应数据,处理步骤少,处理时间短,再结合第二条处理比in节省资源。
郑重声明:本站内容如果来自互联网及其他传播媒体,其版权均属原媒体及文章作者所有。转载目的在于传递更多信息及用于网络分享,并不代表本站赞同其观点和对其真实性负责,也不构成任何其他建议。