Oracle alter index rebuild 与 ORA-08104 说明

浏览数：21 / 时间：2015年06月12日

一. 官网说明

在MOS 上的一篇文章讲到了rebuild online 和offline的区别：

Index Rebuild Is Hanging Or Taking Too Long [ID 272762.1]

Symptoms:
=========
       Performance issues while rebuilding very large indexes. The offline rebuilds of their index is relatively quick -finishes in 15 minutes. Issuing index rebuild ONLINE statement => finishes in about an hour. This behavior of ONLINE index rebuilds makes it a non-option for large tables as it just takes too long to scan the table to rebuild the index. The offline may not be feasible due to due to the 24/7 nature of the database. This may be a loss of functionality for such situations. If we attempt to simultaneously ONLINE rebuild the same indexes we may encounter hanging behavior indefinitely (or more than 6 hours).

DIAGNOSTIC ANALYSIS:
--------------------
       We can trace the sessions rebuilding the indexes with 10046 level 12. Comparing the IO reads for the index-rebuild and the index-rebuild-online reveals the following:
       ONLINE index rebuilds： It scans the base table and it doesn‘t scan the blocks of the index.
       OFFLINE index rebuilds：It scans the index for the build operation.

This behaviour is across all versions.

Cause/Explanation
=============
When you rebuild index online, it will do a full table scan on the base table. At the same time it will maintain a journal table for DML data, which has changed during this index rebuilding operation. So it should take longer time, specially if you do lots of DML on the same table,while rebuilding index online.
-- rebuild index online的时候，会选择全表扫描，同时会维护一个中间日志表，用来记录在rebuild 期间的增量数据，原理类似于物化视图日志，日志表是一个索引组织表(IOT)，这张中间表只有插入，不会有删除和修改操作，而且只有主键条件查询，正是IOT最合适的场景。

On the other hand, while rebuilding the index without online option, Oracle will grab the index in X-mode and rebuild a new index segment by selecting the data from the old index. So here we are not allowing any DML on the table hence there is no journal table involved and it is doing an index scan. Hence it will be pretty fast.

--rebuild offline时，选择的6模式的X 锁，它根据old index 来rebuild。因此不允许进行DML，也就没有中间表。因此也比较块。

Solution/Conclusion:
===========
- The ONLINE index rebuild reads the base table, and this is by design.
- Rebuilding index ONLINE is pretty slow.
- Rebuilding index offline is very fast, but it prevents any DML on the base table.

二. rebuild index 说明

有关锁的模式信息如下：

锁模式	锁描述	解释	SQL操作
0	none
1	NULL	空	Select
2	SS(Row-S)	行级共享锁，其他对象只能查询这些数据行	Select for update、Lock for update、Lock row share
3	SX(Row-X)	行级排它锁，在提交前不允许做DML操作	Insert、Update、 Delete、Lock row share
4	S(Share)	共享锁: 阻止其他DML操作	Create index、Lock share
5	SSX(S/Row-X)	共享行级排它锁：阻止其他事务操作	Lock share row exclusive
6	X(Exclusive)	排它锁：独立访问使用	Alter table、Drop able、Drop index、Truncate table 、Lock exclusive

锁死锁阻塞 Latch 等待详解

http://blog.csdn.net/tianlesoftware/archive/2010/08/20/5822674.aspx

DML操作一般要加两个锁，一个是对表加模式为3的TM锁，一个是对数据行的模式为6的TX锁。只要操作的不是同一行数据，是互不阻塞的。

在rebuild index online 的开始和结束阶段时，需要短暂的对表持有模式为4的TM锁的，当获取到4级别的锁之后，才降为2级。如果rebuild online一直没获取到4级别的锁,那么相关的DML全部产生等待。在执行期间只持有模式2的TM锁，不会阻塞DML操作。在Oracle 11g之后,oracle做了特殊处理,后续的dml不会被rebuild online的4级别锁阻塞.

所以如果在执行rebuild index online前长事务，并且并发量比较大，则一旦执行alter index rebuild online，可能因为长事务阻塞，可能导致系统瞬间出现大量的锁，对于压力比较大的系统，这是一个不小的风险。这是需要迅速找出导致阻塞的会话kill掉，rebuild index online一旦执行，不可轻易中断，否则可能遇到ORA-08104。

MOS 的文档：

Session Was Killed During The Rebuild Of Index ORA-08104 [ID 375856.1]

While running an online index rebuild your session was killed or otherwise terminated abnormally. You are now attempting to run the index rebuild again and is throwing the error:
ORA-08104: this index object ##### is being online built or rebuilt

关于这个错误NiGoo 同学的blog 有说明，链接如下：

http://www.ningoo.net/html/2007/dba_memo_online_rebuild_index_encounter_ora-08104.html

根据以上说明，我们可以知道在进行online rebuild 的时候，Oracle 会修改如下信息：

（1）修改ind$中索引的flags，将该flags+512. 关于这个flags的含义，在下面的实验中进行说明。

（2）在该用户下创建一个journal table 来保存在rebuild期间的增量数据。该表明名称: sys_journal_<object_id>.

如果异常结束online rebuild操作，那么oracle就没及时清理journal table和ind$的flags标志位，系统会认为online rebuild还在操作。

当然SMON 进程会来处理这些临时段。在maclean 同学(10g,11g OCM)的Blog里提到了功能：

了解你所不知道的SMON功能(一):清理临时段

http://www.oracledatabase12g.com/archives/smon-cleanup-temporary-segment.html

对于永久表空间上的temporary segment，SMON会三分钟清理一次(前提是接到post)，如果SMON过于繁忙那么可能temporary segment长期不被清理。temporary segment长期不被清理可能造成一个典型的问题是:在rebuild index online失败后，后续执行的rebuild index命令要求之前产生的temporary segment已被cleanup，如果cleanup没有完成那么就需要一直等下去。

如果SMON 不能及时清理，在操作时就会报ORA-08104的错误。

在Oracle10gR2中可以使用dbms_repair.online_index_clean手工清理这些信息，在Oracle 9i下，需要打Bug 3805539 后才可以使用该工具。

手工处理的步骤如下：

（1）先查看ind$ flags 标志，如果不正确，就减去512.

sql>update ind$ set flags=flags-512 where obj#=<object id>;

（2）drop journal table，这个步骤可能会报资源忙，因为有大量的日志正在插入，可以反复重试一下。

sql>drop table <owner>.sys_journal_<object_id>;

注意：

这个步骤不能反，如果先删除sys_journal_<object_id>临时表，然后再修改index的flags状态，则会报出ora-600 [4610]号错误，即数据字典不一致的错误。

官网关于dbms_repair.online_index_clean 的说明：

ONLINE_INDEX_CLEAN Function

This function performs a manual cleanup of failed or interrupted online index builds or rebuilds. This action is also performed periodically by SMON, regardless of user-initiated cleanup.

This function returns TRUE if all indexes specified were cleaned up and FALSE if one or more indexes could not be cleaned up.

Syntax

DBMS_REPAIR.ONLINE_INDEX_CLEAN (

object_id IN BINARY_INTEGER DEFAULT ALL_INDEX_ID,

wait_for_lock IN BINARY_INTEGER DEFAULT LOCK_WAIT)

RETURN BOOLEAN;

Parameters

Parameter	Description
object_id	Object id of index to be cleaned up. The default cleans up all object ids that qualify.
wait_for_lock	This parameter specifies whether to try getting DML locks on underlying table [[sub]partition] object. The default retries up to an internal retry limit, after which the lock get will give up. If LOCK_NOWAIT is specified, then the lock get does not retry.