高并发压力下导致数据库bug

环境信息:  linux 6.1 + oracle11.2.0.3 RAC
 
 
问题现象:
学校晚上6点选课,人数大概有3000,7点时,数据库报错如下(数据库到6点多还是可以连接的),数据库hung住了。

Tue Dec 16 18:00:33 2014
Dumping diagnostic data in directory=[cdmp_20141216180033], requested by (instance=2, osid=24917 (M001)), summary=[incident=45211].
Tue Dec 16 18:02:23 2014
PMON failed to acquire latch, see PMON dump
Tue Dec 16 18:03:23 2014
PMON failed to acquire latch, see PMON dump
Tue Dec 16 18:04:54 2014
PMON failed to acquire latch, see PMON dump
Tue Dec 16 18:05:55 2014
PMON failed to acquire latch, see PMON dump
Tue Dec 16 18:07:26 2014
PMON failed to acquire latch, see PMON dump
Tue Dec 16 18:08:26 2014
PMON failed to acquire latch, see PMON dump
.........................................
.........................................
Tue Dec 16 19:31:32 2014
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x18] [PC:0x9379D18, kglic0()+1086] [flags: 0x0, count: 1]
Errors in file /u01/app/oracle/diag/rdbms/ywkdb/ywkdb2/trace/ywkdb2_m003_28984.trc  (incident=44523):
ORA-07445: exception encountered: core dump [kglic0()+1086] [SIGSEGV] [ADDR:0x18] [PC:0x9379D18] [Address not mapped to object] []
Incident details in: /u01/app/oracle/diag/rdbms/ywkdb/ywkdb2/incident/incdir_44523/ywkdb2_m003_28984_i44523.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.

Tue Dec 16 20:00:23 2014
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x18] [PC:0x9379D18, kglic0()+1086] [flags: 0x0, count: 1]
Errors in file /u01/app/oracle/diag/rdbms/ywkdb/ywkdb2/trace/ywkdb2_m000_30227.trc  (incident=44811):
ORA-07445: exception encountered: core dump [kglic0()+1086] [SIGSEGV] [ADDR:0x18] [PC:0x9379D18] [Address not mapped to object] []
Incident details in: /u01/app/oracle/diag/rdbms/ywkdb/ywkdb2/incident/incdir_44811/ywkdb2_m000_30227_i44811.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.


最后排查,是数据库该版本的一个bug。
 
详见:
 
 
 
排查过程:
 
 根据ORA-07445提报的错误,排查oracle官方文档,发现在官方文档中标明这是一个oracle bug;文档中说明,在以下条件下可触发该bug:

  这几点,目前数据库状态都符合:
   1)。目前数据库版本是11.2.0.3
   2)。报错提示是ORA-07445: [kglic0()]
   3)。在trc文件中,有kksIterCursorStat提示
           技术分享
    
   4)。_kghdsidx_count值>1
         技术分享

   5)。查询select count(*) from v$sql_bind_capture where name like ‘:SYS%‘;总数>0

         技术分享

   根据官方文档的处理意见,修改隐含参数"_kghdsidx_count"=1,重启数据库服务可以解决此问题。
      

 另外根据官方文档中的延伸说明:Bug 12340939  ORA-7445 [kglic0] can occur capturing cursor stats for V$SQLSTATS也可能会触发该问题,修改隐含参数
"_cursor_stats_enabled"=FALSE,重启数据库服务可以解决此问题。

 技术分享
 
修改后,学校第二天选课没有再报错了。
 
 
案例总结:
 
1.有问题先查MOS
 
2.具体到这个问题,之前没有出现这个问题,选课时出来了,说明一点:在大数据高并发情况下,这个bug显现了。
 
 
导致问题的原因:
 
During SQL statistics analysis for query execution, we load dependency information into the Library Cache.  These dependency memory pieces can be overwritten by other sessions without causing an error.  When the original session runs into the wrong dependency information it can lead to the internal error and performance issues during subsequent queries of these Library Cache objects.
 
---简而言之:Library Cache由于并发压力,历史信息找不到了
 
或者
 

During SQL statistics analysis for query execution, we load dependency information into the Library Cache.  These dependency memory pieces can be overwritten by other sessions without causing an error.  When the original session runs into the wrong dependency information it can lead to the internal error and performance issues during subsequent queries of these Library Cache objects.
 
---简而言之:表的统计信息有问题
 
 

郑重声明:本站内容如果来自互联网及其他传播媒体,其版权均属原媒体及文章作者所有。转载目的在于传递更多信息及用于网络分享,并不代表本站赞同其观点和对其真实性负责,也不构成任何其他建议。