Exadata new bug with 11.2.0.2.x

A new bug was discovered on Exadata 11.2.2.4.2 involving the OFA InfiniBand drivers. Details to come soon.

 

Trace file contents:

System state dump requested by (instance=1, osid=31982), summary=[SYSTEMSTATE_GLOBAL: global system state dump request (kjdgdss_g)].

[ DISKMON][30223] dskm_dump_all()+281 call kgdsdst() 000000000 ? 000000000 ?
[ DISKMON][30223] dskm_async_handler( call dskm_dump_all() 000000000 ? 000000000 ?
[ DISKMON][30223] __sighandler() call dskm_async_handler( 000000000 ? 000000000 ?
[ DISKMON][30223] __poll()+102 signal __sighandler() 7FFFF0F05598 ? 000000001 ?
[ DISKMON][30223] skgznp_accept()+120 call __poll() 7FFFF0F05598 ? 000000001 ?
[ DISKMON][30223] dskm_main()+3052 call skgznp_accept() 01DDBF320 ? 2AAAAC022910 ?
[ DISKMON][30223] __do_global_ctors_a call dskm_main() 00000DDEB ? 00000001D ?
[ DISKMON][30223] __libc_start_main() call __do_global_ctors_a 00000DDEB ? 00000001D ?

AWR automatic or manual snapshot hangs – EXADATA

On a cluster database, manual or automatic AWR snapshots hang and never return:

Bug info:

https://support.oracle.com/CSP/ui/flash.html#tab=KBHome%28page=KBHome&id=%28%29%29,%28page=KBNavigator&id=%28bmDocType=BUG&bmDocTitle=AWR%20SNAPSHOTS%20HANGING&from=BOOKMARK&bmDocDsrc=BUG&bmDocID=13372759&viewingMode=1143%29%29

Solution:

sqlplus / as sysdba

SQL> exec dbms_stats.gather_table_stats('SYS','X$KCCFN',no_invalidate=>false);

SQL> exec dbms_stats.gather_table_stats('SYS','X$KCCFE',no_invalidate=>false);

SQL> execute dbms_workload_repository.modify_snapshot_settings (interval => 15, retention => 1576800);
SQL> EXEC DBMS_WORKLOAD_REPOSITORY.create_snapshot;

The steps above should fix the issue.
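Both arguments to modify_snapshot_settings are given in minutes, so the retention value in the call above is easier to sanity-check once converted to days. Here is a minimal shell sketch of that arithmetic; the helper name minutes_to_days is my own and not part of any Oracle tool:

```shell
# Hypothetical helper (not an Oracle utility): convert minutes to whole days.
minutes_to_days() {
    echo $(( $1 / 60 / 24 ))
}

# Retention passed to modify_snapshot_settings above.
minutes_to_days 1576800   # prints 1095
```

So the call keeps roughly three years of snapshots, taken every 15 minutes. The settings actually in effect can be confirmed afterwards in DBA_HIST_WR_CONTROL.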

Exadata bug related to HAIP

Recently I have been engaged on a PeopleSoft Financials and HR Exadata gig. I ran into an issue related to HAIP, which is used on Exadata systems. Oracle has confirmed that it is a bug, and the solution is to set cluster_interconnects for each node.

The following error was encountered when starting the cluster database:

PRCR-1013 : Failed to start resource ora.orcl.db
PRCR-1064 : Failed to start resource ora.orcl.db on node nd01db1
CRS-5017: The resource action "ora.orcl.db start" encountered the following error:
ORA-03113: end-of-file on communication channel
Process ID: 0
Session ID: 0 Serial number: 0
 

Trace file:

Group reconfiguration cleanup

kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).

kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).

*** 2012-02-07 15:05:09.686

kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).

kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).

kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).

*** 2012-02-07 15:05:10.714

kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).

kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).

kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).

*** 2012-02-07 15:05:11.743

kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).

kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).

*** 2012-02-07 15:05:12.565

kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).

kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).

*** 2012-02-07 15:05:16.275

kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE])

I looked it up on Oracle Support, and they say it is a bug:

https://support.oracle.com/CSP/ui/flash.html#tab=KBHome(page=KBHome&id=()),(page=KBNavigator&id=(bmDocTitle=Exadata%20Rac%20Node%20Instance%20Crash%20with%20kjzdattdlm:%20Can%20not%20attach%20to%20DLM&from=BOOKMARK&bmDocType=PROBLEM&bmDocDsrc=KB&bmDocID=1386843.1&viewingMode=1143))

The solution is to shut down the database and recreate the spfile with a hard-coded cluster_interconnects value (the bondib0 address) for each instance:

SQL> create pfile='inittempORCL.ora' from spfile ;
SQL> !echo "ORCL1.cluster_interconnects='192.168.5.1' " >>inittempORCL.ora
SQL> !echo "ORCL2.cluster_interconnects='192.168.5.2' " >>inittempORCL.ora

SQL> create spfile='+DATA/ORCL/spfileORCL.ora' from pfile='inittempORCL.ora';

# srvctl start database -d ORCL

Repeat these steps for all cluster databases.

Hopefully, the steps above fix the issue with bondib0.
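When there are several databases or more than two instances, typing the echo lines by hand gets error-prone. A small shell sketch along these lines can generate the pfile entries instead. The function name interconnect_lines is hypothetical, and the instance names and IP addresses are just the ones from the example above:

```shell
# Hypothetical helper: print one cluster_interconnects entry per instance.
# Arguments are "SID IP" pairs, e.g. ORCL1 192.168.5.1 ORCL2 192.168.5.2
interconnect_lines() {
    while [ "$#" -ge 2 ]; do
        printf "%s.cluster_interconnects='%s'\n" "$1" "$2"
        shift 2
    done
}

# Print the entries for the two instances used in this post.
interconnect_lines ORCL1 192.168.5.1 ORCL2 192.168.5.2
```

Redirecting the output with >> inittempORCL.ora appends the same lines as the echo commands did. Review the generated pfile before creating the spfile from it.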