Wednesday, July 20, 2011

RMAN 10gR2 : Performing Incomplete Database Recovery (Full Disaster Scenario)


Scenario :
You have lost your Disks and You have only the last Backup and you do not use a Recovery Catalog.

Solution :
Perfrom Incomplete recovery until the time the last backup fiished
and roll forward to the last availiable archive log in the backup area.

Tips:

1) Always register your DBID

rman target /

Recovery Manager: Release 10.2.0.5.0 - Production on Wed Jul 20 08:36:48 2011

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

connected to target database: MYDB (DBID=3255826720)


2) Always keep in to a file the show all command of RMAN in order to know the the controlfile autobackup format.

$ rman target / msglog rman.out
  RMAN> show all;
  RMAN> exit
$ vi rman.out

3) If you use a Block Change Tracking File (BCF) , after you mount the DB disable the use.
RMAN> sql 'alter database disable block change tracking';

4) You may need to recreat your password file again (Specially in a RAC database if its in the DB Area)
For Signle Instance Database
orapwd file=orapwmydb password=mypass
For RAC Database
orapwd file=/common_location/orapwmydb password=mypass

Recovery (The test has been made on a RAC DB , but its ok for single instance too):

On one node :

$export ORACLE_HOME=/MY_ORA_HOME_DIR
$export ORACLE_SID=MYDB1
$rman target /
RMAN>startup force nomount;
RMAN>set dbid=3255826720;
RMAN>run {
set controlfile autobackup format for device type disk to '/orabackup/mydb/%F';
restore spfile from autobackup;
}
RMAN>startup force nomount;
RMAN>run {
set controlfile autobackup format for device type disk to '/orabackup/mydb/%F';
restore controlfile from autobackup;
alter database mount;
}
RMAN>restore database;
Starting restore at 20-JUL-11
using channel ORA_DISK_1
using channel ORA_DISK_2
using channel ORA_DISK_3

.......

Finished restore at 20-JUL-11

RMAN> recover database;

Starting recover at 20-JUL-11
using channel ORA_DISK_1
using channel ORA_DISK_2
using channel ORA_DISK_3

starting media recovery

.....

unable to find archive log
archive log thread=1 sequence=14
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 07/20/2011 08:18:26
RMAN-06054: media recovery requesting unknown log: thread 1 seq 14 lowscn 4955096

# Since you do incomplete recovery ignore the error

RMAN> alter database open resetlogs;

database opened

RMAN>exit;

Recovery Manager complete.

Restart Your database , start any service, dbconsole/agent and take an immediate full backup.

Monday, July 18, 2011

Restoring Voting Disk and OCR files from backup 10gR2


Use the info from the following post to identify or take a backup of the Voting Disk (CSS File) and OCR:

http://agstamy.blogspot.com/2009/05/backing-up-css-and-ocr-files.html

As oracle user from one node :
$export ORACLE_HOME=DB_HOME
$srvctl stop database -d db_name
$svrctl stop asm -n node_name
$srvctl stop nodeapps -n node_name

On each node:
$emctl stop dbconsole
$export ORACLE_HOME=ORA_CRS_HOME_DIR
$su
#crsctl stop crs

From one node :

For the voting disk oracle do:

Copy the backup to the voting diks locations and set the correct permissions
Use dd if your are in raw devices or simply cp if you are using cluster file system.

e.g

dd if=your_backup_file of=/dev/raw/css

cp -p my_css_backup_file /my_css_mount_point/my_css_backup_file


For the OCR File as root do :
If you are on cluster filesystem then you must create an empty file first with correct permissions
touch /ocr_location_mount/ocr_filename
chown root.oinstall /ocr_location_mount/ocr_filename

Then restore from last system backup or use your latest dumpfile
ocrconfig -restore backup00.ocr
backup00.ocr : Is the last auto backup on $ORA_CRS_HOME/cdata/[cluster_name] the default autobackup configuration location on local disks on all nodes

ocrconfig -import /your_ocr_last_dump_file

Start the crs :

crsctl start crs

Watch to see if all cluster servcies are up and functional !!!

Things to notice : 
If you try this for full cluster restore then you must remember the directories ownerhips
For the css file the owner is oracle and for the ocr is root.

References :
Oracle Clusterware and Oracle Real Application Clusters Administration and Deployment Guide
10g Release 2 (10.2), Chapter 3 : Administering Oracle Clusterware Components.

Metalink Note IDs: 1092293.1, 953914.1

Friday, July 15, 2011

OCFS2 Guide Lines for 10gR2 CRS and RAC on Linux x86_64


If you plan to install Oracle RAC on Linux using OCFS2 for CRS files and RAC DB files the follow the recommendations for the parameter mentioned , Recommendations are for Version 1.4.7

When Creating the OCFS2 Cluster

O2CB_HEARTBEAT_THRESHOLD

It should be set to the timeout value of the io layer. Most multipath solutions have a timeout ranging from 60 secs to 120 secs.
For 60 secs, set it to 31. For 120 secs, set it to 61 according to the follwoing formula : O2CB_HEARTBEAT_THRESHOLD = (((timeout in secs) / 2) + 1)
In general a value of 61 is suitable in most cases

Change the value in the vi /etc/sysconfig/o2cb on all nodes of the OCFS2 Cluster and restart the cluster on all nodes.

When formatting an OCFS2 filesystem.

Cluster Size:

The cluster size is the smallest unit of space allocated to a file to hold  file data.
Permissible cluster sizes include; 4, 8, 16, 32, 64, 128, 256, 512, and 1024Kb.
Oracle recommends a cluster size of 128 KB or higher for datafiles and a cluster size of 32 or 64Kb for Oracle Home volumes or for OCRF and CSS Files.

ATTENTION : The cluster size cannot be modified after OCFS2 volume format.

Block Size:

The smallest unit of space addressable by the filesystem.
Permissible block sizes include; 512 bytes (not recommended), 1Kb, 2Kb, or 4Kb (recommended for most volumes).
4Kb Block Size works fine and Oracle Database Block should be equal or multiple to this 8K , 16K etc.

ATTENTION : The block size cannot be modified after OCFS2 volumes format.


When mounting an OCFS2 filesystem

datavolume:

Use this OCFS2 mount option to mount volumes storing Oracle datafiles, control files, redologs, archivelogs, Oracle clusterware files (voting disk and cluster registry), etc.
The datavolume mount option ensures the Oracle database processes open datafiles with the o_direct flag, required by Oracle Real Application Clusters (RAC) to concurrently access shared files.

noatime:

Users should add the "noatime" mount option to the mounts that hold the Oracle datafiles, redologs, archivelogs, voting file, etc. This is because OCFS2 1.4 supports atime (access time) which makes it update inodes even on a read. Like ls -l. As tracking access time is not useful for the database, users are advised to disable it for such mounts.


References :
OCFS2 1.4.7-1 Release Notes (Metalink Note ID 1086231.1)
OCFS2 1.2 - FREQUENTLY ASKED QUESTIONS
Meatalink Note ID : 727866.1

Thursday, July 14, 2011

Oracle 10gR2 : Truncating a table causes "Enq: RO - FAST OBJECT REUSE" and DBW High CPU Usage

On Oracle 10gR2 (10.2.0.4) a table truncate is taking lots of minutes and your session si waiting on  "Enq: RO - FAST OBJECT REUSE" then you probably have hit bug8544896 introduced in 10.2.0.4 and it affects both Single instance and RAC environments, and a patch is available.

In order to confirm the bug as sys do :

select waiting_session,holding_session,lock_type,mode_held,mode_requested
from dba_waiters;


select sid, serial#,program from v$session where sid = 1095;

       SID    SERIAL# PROGRAM
---------- ---------- ------------------------------------------------
      1095          1 oracle@castor (CKPT)

If you see that the CKPT Process is then that's the case.


Workaround
1) From an other session as sys issue :
    alter system flush buffer_cache;
2) Set _db_fast_obj_truncate = FALSE to spfile/init.ora

References :
Metalink Note id : 888844.1
http://www.ora-solutions.net/web/2009/01/20/session-waiting-for-enq-ro-fast-object-reuse-dbwr-process-spinning-on-cpu/


Wednesday, July 6, 2011

Install 10gR2 RAC (CRS,DATABASE) on CentOS Enterprise / RHEL / Oracle Enterprise Linux x86_64 5.x


For the CRS and OCR files we will use OCFS2 and ASM for DATA and FRA

on node1

[root@node01 ~]# id nobody
uid=99(nobody) gid=99(nobody) groups=99(nobody)

[root@node02 ~]# groupadd -g 501 oinstall
[root@node02 ~]# groupadd -g 502 dba
[root@node02 ~]# useradd -u 501 -g oinstall -G dba oracle
[root@node02 ~]# passwd oracle

[root@node01 ~]# id oracle
uid=501(oracle) gid=501(oinstall) groups=501(oinstall),502(dba)

on node2

[root@node02 ~]# groupadd -g 501 oinstall
[root@node02 ~]# groupadd -g 502 dba
[root@node02 ~]# useradd -u 501 -g oinstall -G dba oracle
[root@node02 ~]# passwd oracle

You must use the same password also for convinience.

Check date on both nodes and guide you sysadmins to install ntp

Configure ssh equivelance

on both nodes
which scp if not in /usr/local/bin create link : cd /usr/local/bin ; ln -s /usr/bin/scp scp
which ssh if not in /usr/local/bin create link : cd /usr/local/bin ; ln -s /usr/bin/ssh ssh

On both nodes login as oracle
 mkdir .ssh
 chmod 700 .ssh
 ssh-keygen -t rsa
 ssh-keygen -t dsa

on node01 as oracle:
cd .ssh
ssh node01 cat /home/oracle/.ssh/id_rsa.pub >> authorized_keys
ssh node01 cat /home/oracle/.ssh/id_dsa.pub >> authorized_keys
ssh node02 cat /home/oracle/.ssh/id_rsa.pub >> authorized_keys
ssh node02 cat /home/oracle/.ssh/id_dsa.pub >> authorized_keys
scp authorized_keys node02:/home/oracle/.ssh/

on node02 as oracle:
cd .ssh
ssh node01 cat /home/oracle/.ssh/id_rsa.pub >> authorized_keys
ssh node01 cat /home/oracle/.ssh/id_dsa.pub >> authorized_keys
ssh node02 cat /home/oracle/.ssh/id_rsa.pub >> authorized_keys
ssh node02 cat /home/oracle/.ssh/id_dsa.pub >> authorized_keys
scp authorized_keys node01:/home/oracle/.ssh/

On each node
chmod 600 ~/.ssh/authorized_keys
exec /usr/bin/ssh-agent $SHELL
/usr/bin/ssh-add

vi /home/oracle/.ssh/config ... add
Host *
ForwardX11 no

vi .bashrc ... add
if [ -t 0 ]; then
stty intr ^C
fi

vi .bash_profile
umask 022
export DISPLAY=X.X.X.X:0.0

. ./.bash_profile


Check ssh equiveleance

From node01 as oracle
ssh node01 date
ssh node02 date
from node02 date
ssh node01 date
ssh node02 date

On both commands password must not be asked if all ok.

As root on each node
Check if SELinux is disabled otherwise disable it.
# /usr/sbin/getenforce
If returns Disabled you are ok !!!

Install required packages !!!
Check also : http://agstamy.blogspot.com/2010/11/installation-of-oracle-10gr2-database.html

rpm -qa | grep [PACKAGE NAME] to check
yum install [PACKAGE NAME] to install

Pakcage List according to Metalink note : 421308.1 Requirements For Installing Oracle10gR2 On RHEL 5/OEL 5 (x86_64)
binutils-2.17.50 (x86_64)
compat-db-4.2.52 (x86_64)
compat-libstdc++-296(i386)
compat-libstdc++-33-3.2.3 (x86_64)
compat-libstdc++-33-3.2.3(i386)
control-center-2.16.0 (x86_64)
gcc-4.1.1 (x86_64)
gcc-c++-4.1.1 (x86_64)
glibc-2.5-12 (x86_64)
glibc-2.5-12 (i386)
glibc-common-2.5-12 (x86_64)
glibc-devel-2.5-12 (x86_64)
glibc-devel-2.5-12(i386)
glibc-headers-2.5-12 (x86_64)
ksh-20060214-1.4 (x86_64)
libaio-0.3.96 (x86_64)
libaio-devel-0.3.106-3.2
libgcc-4.1.1(i386)
libgcc-4.1.1(x86_64)
libgnome-2.16.0 (x86_64)
libgnomeui-2.16.0 (x86_64)
libgomp-4.1.1 (x86_64)
libstdc++-4.1.1 (x86_64)
libstdc++-devel-4.1.1 (x86_64)
libXp-1.0.0-8 (i386)
make-3.81 (x86_64)
sysstat-7.0.0 (x86_64)

Add virtual and private ips on /etc/hosts on each node (The hosts file must have the same format as the example)

[root@node01 bin]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1       localhost localhost.localdomain  
10.9.135.120    node01 node01.internal.upstreamsystems.com  
10.9.135.121    node02 node02.internal.upstreamsystems.com  
10.9.135.122    node01-vip node01-vip.internal.upstreamsystems.com      
10.9.135.123    node02-vip node02-vip.internal.upstreamsystems.com      
10.9.222.120    node01-priv
10.9.222.121    node02-priv

Also hostname in /etc/sysconfig/network must be without the domain.
cat /etc/sysconfig/network
NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=node01
GATEWAY=10.9.135.8

vi /etc/hosts.equiv
node01 oracle
node02 oracle
node01-priv oracle
node02-priv oracle

Check if firewall is enabled and if is stop it
/etc/rc.d/init.d/iptables status
Firewall is stopped.
/etc/rc.d/init.d/iptables stop
chkconfig iptables off

Set the kernel parameters
Add the following the lines in the file /etc/sysctl.conf

kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
fs.file-max = 65536
net.ipv4.ip_local_port_range = 1024 65500
net.core.rmem_default = 262144
net.core.rmem_max = 262144
net.core.wmem_default = 262144
net.core.wmem_max = 262144
net.ipv4.tcp_wmem = 262144 262144 262144
net.ipv4.tcp_rmem = 4194304 4194304 4194304

and run sysctl -p

Set the session limits for Oracle user

Add the following lines to the /etc/security/limits.conf file

oracle soft nproc 2047
oracle hard nproc 16384
oracle soft nofile 1024
oracle hard nofile 65536

# As per Release Notes
oracle soft memlock 3145728
oracle hard memlock 3145728

Add the following line in the /etc/pam.d/login file
session required pam_limits.so

Add the following lines to the /etc/profile.
This assumes that oracle user is using bash shell which is given by default.
Change this accordingly if the oracle user has a different shell

if [ $USER = "oracle" ]; then
  ulimit -u 16384
  ulimit -n 65536
fi

Create Oracle User base directories and crs home directories

# mkdir -p /u01/app/oracle
# chown -R oracle:oinstall /u01/app/oracle
# chmod -R 775 /u01/app/oracle

# mkdir -p /u01/app/oracle/crs
# chown -R oracle:oinstall /u01/app/oracle/crs
# chmod -R 775 /u01/app/oracle/crs

Configure hangcheck timer
lsmod | grep hang
if not hangcheck_timer configured then
insmod /lib/modules/[Kernel Version]/kernel/drivers/char/hangcheck-timer.ko hangcheck_tick=30 hangcheck_margin=180
add the previous comand in /etc/rc.d/rc.local

Configure partitions on SAN LUNS to be used as crs,voting and ASM Disks
We have the above LUNs
/dev/sde (LUN3): 2G for CRS
/dev/sdf (LUN4): 2G for VOTING
/dev/sdg (LUN5): 60G for DATA
/dev/sdh (LUN6): 30G for FRA

The following procedure must be used for each device

[root@node01 app]# fdisk /dev/sde
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-1009, default 1): 1
Last cylinder or +size or +sizeM or +sizeK (1-1009, default 1009): 1009

Command (m for help): p

Disk /dev/sde: 2147 MB, 2147483648 bytes
67 heads, 62 sectors/track, 1009 cylinders
Units = cylinders of 4154 * 512 = 2126848 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1               1        1009     2095662   83  Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

At the end run partprobe on each node

Install & Configure Oracle Cluster File System (OCFS2)

Check if ocfs2 is allready installed
rpm -qa | grep ocfs2 | sort

go to http://oss.oracle.com/
and download
OCFS2 Kernel Driver,OCFS2 tools,OCFS2 console for your kernel version (You do not need the kernel-debug or xen module)
And install them on both nodes
rpm -Uhv ocfs2*

on both Oracle RAC nodes
export DISPLAY=10.9.13.252:0
ocfs2console
And do the cluster configuration to create /etc/ocfs2/cluster.conf
on both nodes must be the same and looks like :

node:
        ip_port = 7777
        ip_address = 10.9.135.120
        number = 0
        name = node01
        cluster = ocfs2

node:
        ip_port = 7777
        ip_address = 10.9.135.121
        number = 1
        name = node02
        cluster = ocfs2

cluster:
        node_count = 2
        name = ocfs2

Configure O2CB to Start on Boot and Adjust O2CB Heartbeat Threshold to 61 on both nodes
# /etc/init.d/o2cb offline ocfs2
# /etc/init.d/o2cb unload
# /etc/init.d/o2cb configure
# /etc/init.d/o2cb status  to check status !!!

Create directories for Oracle CRS and Votting file
On both nodes
# mkdir -p /u02/oradata/css
# chown -R oracle:oinstall /u02/oradata/css
# chmod -R 775 /u02/oradata/css
mkdir -p /u02/oradata/ocr
chown -R oracle:oinstall /u02/oradata/ocr
chmod -R 775 /u02/oradata/ocr

On first node create filesystems fromated with ocfs2
mkfs.ocfs2 -b 4K -C 32K -N 4 -L cssfiles /dev/sde1
mkfs.ocfs2 -b 4K -C 32K -N 4 -L ocrfiles /dev/sdf1

on each node
Mount the volumes
mount -t ocfs2 -o datavolume,nointr -L "cssfiles" /u02/oradata/css
mount -t ocfs2 -o datavolume,nointr -L "ocrfiles" /u02/oradata/ocr

Add entries to /etc/fstab for auto mount on reboot
LABEL=cssfiles          /u02/oradata/css        ocfs2   _netdev,datavolume,nointr  0 0
LABEL=ocrfiles          /u02/oradata/ocr        ocfs2   _netdev,datavolume,nointr  0 0

Check the permisions of the mount points
# ls -ld /u02/oradata/css
drwxr-xr-x 3 root root 4096 Feb 12 13:17 /u02/oradata/css
# chown oracle:oinstall /u02/oradata/css
# chmod 775 /u02/oradata/css
# ls -ld /u02/oradata/css
drwxrwxr-x 3 oracle oinstall 4096 Feb 12 13:17 /u02/oradata/css
# ls -ld /u02/oradata/ocr
drwxr-xr-x 3 root root 4096 Feb 12 13:18 /u02/oradata/ocr
# chown oracle:oinstall /u02/oradata/ocr
# chmod 775 /u02/oradata/ocr
# ls -ld /u02/oradata/ocr
drwxrwxr-x 3 oracle oinstall 4096 Feb 12 13:18 /u02/oradata/ocr

Check if the o2cb is configured to start on reboot 2,3,4,5 must be on

chkconfig --list o2cb
o2cb            0:off   1:off   2:on    3:on    4:on    5:on    6:off

REBOOT BOTH NODES !!!!!

Install & Configure Automatic Storage Management (ASMLib 2.0)
ASMLib Kernel Driver , Userspace Library , Driver Support Files
Go to http://www.oracle.com/technology/software/tech/linux/asmlib/index.html
Check if ASM Lib is allready installed

On both nodes
rpm -qa | grep oracleasm | sort
Install them
rpm -Uvh oracleasm*
Configure asmlib
/etc/init.d/oracleasm configure

Configuring the Oracle ASM library driver.

This will configure the on-boot properties of the Oracle ASM library
driver.  The following questions will determine whether the driver is
loaded on boot and what permissions it will have.  The current values
will be shown in brackets ('[]').  Hitting <ENTER> without typing an
answer will keep that current value.  Ctrl-C will abort.

Default user to own the driver interface []: oracle
Default group to own the driver interface []: oinstall
Start Oracle ASM library driver on boot (y/n) [n]: y
Scan for Oracle ASM disks on boot (y/n) [y]: y
Writing Oracle ASM library driver configuration: done
Initializing the Oracle ASMLib driver:                     [  OK  ]
Scanning the system for Oracle ASMLib disks:               [  OK  ]

From the first node
# /etc/init.d/oracleasm listdisks
If there are disk availiable delete them
# /etc/init.d/oracleasm createdisk VOL1 /dev/sdg1
Marking disk "VOL1" as an ASM disk:                        [  OK  ]
# /etc/init.d/oracleasm createdisk VOL2 /dev/sdh1
Marking disk "VOL2" as an ASM disk:                        [  OK  ]

On the second node
#/etc/init.d/oracleasm scandisks

Verify on both nodes that you see the disks

Now begin CRS Installation

Install cvuqdisk
as root on both nodes
on first node
cd /home/oracle/clusterware/rpm/
rpm -ivh cvuqdisk-1.0.1-1.rpm
Move rpm to second node
scp cvuqdisk-1.0.1-1.rpm node02:/root
rpm -ivh cvuqdisk-1.0.1-1.rpm

As oracle run
cd ~/clusterware/cluvfy
./runcluvfy.sh stage -pre crsinst -n node01,node02

./runcluvfy.sh stage -post hwos -n node01,node02
To check for shared storage !!!

Install crs
export ORACLE_BASE=/u01/app/oracle
export ORACLE_HOME=/u01/app/oracle/crs
cd clustware
./runInstaller -ignoreSysPrereqs

as root first on node1 and then on node2 do:
/u01/app/oracle/oraInventory/orainstRoot.sh

Before run root.sh
Apply workaround #2 for vipca errors as per Metalink Note:414163.1 (maybe not applying on ppc)
as root first on node1 and then on node2 do:
vi /u01/app/crs/bin/vipca
Find lines :
if [ "$arch" = "i686" -o "$arch" = "ia64" -o "$arch" = "x86_64" ]
then
  LD_ASSUME_KERNEL=2.4.19
  export LD_ASSUME_KERNEL
fi

unset LD_ASSUME_KERNEL         <<== Line to be added

run root.sh on first node and then on second node
and if you see
Error 0(Native: listNetInterfaces:[3])
[Error 0(Native: listNetInterfaces:[3])]
do
<CRS_HOME>/bin # ./oifcfg setif -global eth0/192.168.1.0:public
<CRS_HOME>/bin # ./oifcfg setif -global eth1/10.10.10.0:cluster_interconnect
<CRS_HOME>/bin # ./oifcfg getif
 eth0 192.168.1.0 global public
 eth1 10.10.10.0 global cluster_interconnect

The goal is to get the output of "oifcfg getif" to include both public and cluster_interconnect interfaces, of course you should exchange your own IP addresses and interface name from your environment. To get the proper IPs in your environment run this command:

<CRS_HOME>/bin # ./oifcfg iflist
eth0 192.168.1.0
eth1 10.10.10.0

Run vipca manually from failed node
/u01/app/oracle/oracle/product/10.2.0/racdb

WARNING !!!!
If you use vlans then avoid to have interfaces with names like : eth0.100
Try to use names like vlan100.
Use the following configuration for the ifcfg-[interface] script

VLAN=yes
VLAN_NAME_TYPE=VLAN_PLUS_VID_NO_PAD
DEVICE=vlan100
PHYSDEV=eth0
BOOTPROTO=static
ONBOOT=yes
TYPE=Ethernet
IPADDR=10.254.5.254
NETMASK=255.255.255.0

Do these changes or oifcfg getif will not return the interfaces and vipca never run !!!

Create a file for crs environment on both nodes
e.g .crsenv
unset ORACLE_BASE
unset ORACLE_HOME
unset ORACLE_SID
unset PATH
export ORACLE_BASE=/u01/app/oracle
export ORACLE_HOME=/u01/app/crs
export PATH=$ORACLE_HOME/bin:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

Patch crs to the latest patch set. CRS must always have the same or higher patch level than the oracle database home.
Then install Oracle Software using software only and selecting all Cluster Nodes.
Patch the new oracle home to the latest patch set and then
1) Create the ASM Instance and Configrue the Disk Groups
2) Create your RAC database and choose to manage it with dbconsole.

References

Metalink Notes : 414163.1,421308.1

Sunday, July 3, 2011

Oracle 10gR2 CRS diagwait to avoid OPROCD node evictions

If your CRS is from 10.2.0.3 and up is a must to apply the following on each node on our cluster.


  1. Execute as root
    #crsctl stop crs
    #<CRS_HOME>/bin/oprocd stop
  2. Ensure that Clusterware stack is down on all nodes by executing
    #ps -ef |egrep "crsd.bin|ocssd.bin|evmd.bin|oprocd"
    This should return no processes. If there are clusterware processes running and you proceed to the next step, you will corrupt your OCR. Do not continue until the clusterware processes are down on all the nodes of the cluster.
  3. From one node of the cluster, change the value of the "diagwait" parameter to 13 seconds by issuing the command as root:
    #crsctl set css diagwait 13 -force
  4. Check if diagwait is set successfully by executing. the following command. The command should return 13. If diagwait is not set, the following message will be returned "Configuration parameter diagwait is not defined"
    #crsctl get css diagwait
  5. Restart the Oracle Clusterware on all the nodes by executing:
    #crsctl start crs
  6. Validate that the node is running by executing:
    #crsctl check crs


The above procedure had set the diagwait parameter to 13. This parameter according to Oracle gives an additional time of 10 secs to flush the necessary CRS diagnostics on the disk during a node eviction. From my experience it resolves node evictions under heavy load (Specially in AIX) because changes the OPROCD default values that affect scheduling latencies. In most circumstances OPROCD default values can be overly very sensitive , especially under heavy loaded cluster and Oracle stated "To overcome these scheduling latencies, Oracle recommends that you set the Oracle Clusterware parameter DIAGWAIT to the value 13." on note id 567730.1

Personally i set this parameter in every Oracle Clusterware 10gR2 (10.2.0.3 and up) installation.
It does not apply in 11g since the logic has change.

References

Oracle Metalink Notes: 559365.1 ,567730.1