Tuesday, January 19, 2010

ORA-27061: waiting for async I/Os failed on AIX

When you see this error and the database crashes , hangs or does not shutdown/startup then there is a problem with Async I/O servers on AIX.

Syptomps

The following messages in the alert log
-------------------------------------------------------------------
.............
Mon Oct 8 22:57:05 2007
Errors in file /dwrac/BDUMP/dwhp_p009_1941690.trc:
ORA-27061: waiting for async I/Os failed <<<<
IBM AIX RISC System/6000 Error: 22: Invalid argument

Users affected:
* systems running aix 5.3 with the bos.rte.aio fileset at the 5.3.0.60 or 5.3.0.61 level.
* this problem is known to affect oracle.

Problem description:
When the maximum requests (maxreqs) is exceeded, lio_listio will update the control blocks, but may not return an error to indicate that some i/os were not started. this results in an unexpected einval from aio_nwait later when these i/os are waited for.

* errors similar to the following may be logged by oracle:
* ora-27061: waiting for async i/os failed
* ibm aix risc system/6000 error: 22: invalid argument



There are 3 actions to try :

1) Increase the number of aios and the maxreqs

Check aio the current settings:
aioo -a
or
lsattr –El aio0

Set maxreqs at least 8192.

chdev -P -l aio0 -a maxreqs=8192
or
aioo -p -o maxreqs=8192 (AIX 5.3 and up)

If this does not resolve the problem increase the number of min/max aio servers

chdev -P -l aio0 -a maxservers=[new value] -a minservers=[new value]

To make changes persistance on reboot

aioo -p -o maxservers=[new value] -o minservers=[new value]

To apply new values online (AIX 5.3 and up)

2) Decrease the parallel_min_servers and parallel_max_servers on the failing instance.

FOR RAC

ALTER SYSTEM SET parallel_min_servers = [NUM] SCOPE=BOTH SID='MYINSTANCE1';
ALTER SYSTEM SET parallel_max_servers = [NUM] SCOPE=BOTH SID='MYINSTANCE1';

FOR SINGLE DB
ALTER SYSTEM SET parallel_min_servers = [NUM] SCOPE=BOTH;
ALTER SYSTEM SET parallel_max_servers = [NUM] SCOPE=BOTH;

3) IBM FIX for AIX 5.3 ML6 and UP
Install fix for apar iz03260



REFERENCES
Metalink Note 464768.1
http://www-01.ibm.com/support/docview.wss?uid=isg1IZ03260

Friday, January 15, 2010

Oracle 10g REDO LOGS sizing

The size of the redo log files can influence performance, because the behavior of
the dbw and arc processes depend on their sizes.
Larger redo log files provide better performance but instance recovery time is increased. Sou you must find a good balance depending the type of your system and your instance recovery time needed.

Also be careful that undersized log files increase checkpoint activity and increase CPU usage.Checkpoint frequency is affected by several factors, including log file size and
the setting of the FAST_START_MTTR_TARGET.

If the FAST_START_MTTR_TARGET parameter is set to limit the instance recovery time,
Oracle automatically tries to checkpoint as frequently as necessary.
Under this condition, the size of the log files should be large enough to avoid
additional checkpointing due to under sized log files.

In Oracle 10g the redo logfile size can tuned as per recommendation specified by column optimal_logfile_size of v$instance_recovery or in the EM Dbconsole under Administration Page under Storage on the Redo Log Groups Link on the Actions List.

In order to enable this feature you must set the parameter "fast_start_mttr_target" in order to activate the advisory and the population of the column optimal_logfile_size.

The key columns in v$instance_recovery view are :

ACTUAL_REDO_BLKS
The current actual number of redo blocks required for recovery.

TARGET_MTTR
Effective MTTR (mean time to recover) target value in seconds. The TARGET_MTTR value is calculated based on the value of the FAST_START_MTTR_TARGET parameter and is usually an approximation of the parameter's value. However, if the FAST_START_MTTR_TARGET parameter value is very small (for example, one second), or very large (for example, 3600 seconds), the calculation will produce a target value dictated by system limitations. In such cases, the TARGET_MTTR value will be the shortest calculated time, or the longest calculated time that recovery is expected to take.

ESTIMATED_MTTR
The current estimated mean time to recover (MTTR) based on the number of dirty buffers and log blocks (0 if FAST_START_MTTR_TARGET is not specified). Basically, this value tells you how long you could expect recovery to take based on the work your system is doing right now.

OPTIMAL_LOGFILE_SIZE
The recommended optimal redolog file size for the current setting of FAST_START_MTTR_TARGET.

select ACTUAL_REDO_BLKS,TARGET_MTTR,ESTIMATED_MTTR,OPTIMAL_LOGFILE_SIZE
from v$instance_recovery;

The target is to reduce ACTUAL_REDO_BLKS by setting an appropriate value of FAST_START_MTTR_TARGET and finding the OPTIMAL_LOGFILE_SIZE for the work your system does.