|
|
94% availability of a
football player indicates that the player has missed one game of a season.
However, 99.97% availability of a system indicates a downtime of 2.5 hours in a
year. Today’s business requirements are to meet 99.99% or 99.999% availability,
which indicates .5 hours or no downtime. To meet these high availability
requirements with no downtime in a year, the factor critical for success is for
systems to provide for auto failover when one participating system fails. If
this does not occur, when a system or a participating node fails from a cluster,
then a considerable amount of time will be used in migrating the user from the
failed node to another, (reestablishing the database connection, re-executing
the query, user having to browse through the screen to his/her last view etc).
Oracle Real Application Cluster (RAC) provides multiple systems to participate
in a clustered configuration, providing continuous availability, and when one of
the participating systems fail, the users are migrated to the other system,
providing a failover mechanism. With the failover process using the transparent application failover (TAF) option,
database connections and processes that had lost connection are reconnected
and the failover is transparent
to the user.
ORACLE REAL APPLICATION
CLUSTERS Figure 1: Oracle Real Application Clusters RAC allows multiple Oracle Instances residing on
different nodes to access the same physical database. As illustrated in Figure
1, you can access the same database from any of the available participating
instances, look at the same set of data and share data between instances via the
cluster interconnect. The Global Cache Service (GCS) and Global Enqueue Service
(GES) maintain consistency across the caches of the different nodes. RAC
protects against either node failure or communication failure to a subset of the
nodes. With an RAC implementation, two different setups for availability are
possible: Failover and scalability are very important
factors in today’s Internet based economy and business. These two features of
the RAC support the reliability/availability factor of the database tier of the
enterprise system. HOW DOES THE FAILOVER
MECHANISM WORK?
RAC relies on the cluster manager (CM) of the clustered operating system for failure detection because the CM maintains the heartbeat
functions.
The CM is a distributed kernel component that monitors whether cluster members can communicate with each other, and enforces the rules of cluster membership. Among the various functions in the management of the
cluster, the CM also maintains the heartbeat functions. Using the heartbeat mechanism the CM allows the nodes to communicate with the other nodes that are available on a continuous basis at preset intervals, e.g. 2 seconds on Sun and Linux clusters. When a node, or the communication to a node, fails, the heartbeat from the node does not get through and after a time-out period (configurable on most operating systems), the remaining nodes detect the failure and attempt to reform the cluster. (The heartbeat timeout parameter, like the heartbeat interval, varies from operating system to operating system, while the default heartbeat timeout parameter on Sun clusters is 12 seconds and the default on Linux clusters is 10 seconds). If the remaining nodes form a quorum, the other nodes reorganize the cluster membership. The reorganization process regroups the nodes that are accessible and removes the nodes that have failed. For example, in a 4-node cluster, if one node fails, the CM will re-group among the remaining 3 nodes. The CM performs this step when a node is added to a cluster or a node is removed from a cluster. This information is exposed to the respective Oracle Instances by the LMON process running on each cluster node.
The next step in the failover process is the
database recovery operation. During the database recovery process, re-mastering
of GCS from the failed Instance, cache recovery (rolling forward all
transactions) from the redo logs of the failed Instances and transactions
recovery (rolling back all uncommitted transactions) are performed by the
remaining nodes. With the redo logs located on a shared disk system (in RAC
implementations), the recovery process is even smoother. The Instance that first
deducted the failure reads from the redo logs and applies them to the database.
After detection and during the remastering of GCS of the failed Instance and
cache recovery, most work in the surviving Instance is paused and while
transaction recovery takes place, work occurs at a slower pace. This point is
considered full database availability, as now all data is accessible,
including that which resided on the failed Instance. The application is now
responsible for reconnecting the users and repeating any uncommitted work they
have done. While RAC provides for high
availability of the systems, the database servers and the applications that
use it, there are various features and options under the RAC umbrella that
provides even more support towards achieving the 99.999% availability of
today’s Internet based business requirements. RAC allows for multiple systems
to participate in a clustered configuration providing continuous availability.
When one of the participating systems fails, the users are migrated to the
other system, thus providing a failover mechanism.
The best failover is the one that no one notices.
Unfortunately, even though Oracle has been architected to recover very quickly,
failures can severely disrupt users by dropping connections to the database.
Work in progress at the time of the failure is most likely lost. If the user
queried 1000 rows from the database and a failure on the node occurred while the
user was scrolling through these rows on his/her browser, the failure would
cause the user to re-execute the query and browse through these rows again. This
disruption could be eliminated for most situations by masking the failure with
the transparent application failover (TAF) option.
TRANSPARENT APPLICATION FAILOVER
RAC provides near-continuous availability by
hiding failures from end-user clients and application server clients. This
provides continuous, uninterrupted data access. TAF reroutes
application clients to an available database node in the cluster when the
connected node fails. Application clients do not see error messages describing
loss of service.
Figure 2: Oracle Transparent Application Failover
In the above configuration, if the users
connection to Node 1 dies, their transaction is rolled back but they can
continue work without having to manually reconnect. To get a good understanding
of how the TAF architecture works, it is helpful to walk through a failover
scenario using the earlier example, where a user is querying the database to
retrieve 1000 rows from the database. Assume that the user is connected to Node
1/ Instance 1. By following the steps identified in Figure 2:
In Figure 2 when Node 1 fails, any SELECT
statements that had partially executed on Instance 1 are migrated as part of the
failover process and are displayed through Instance 2 when the user process
fails over to Node 2. All this happens transparently without any interruption to
the user. It should be noted that along with the ‘SELECT’ statement, the
following are also failed over: The benefits that such a feature adds in meeting
the high availability requirements in today’s machine critical applications are
overwhelming and the first question that arises is this, why did Oracle not
introduce such a feature before or why is this feature not available among other
databases? Though the mechanism is very useful in meeting today’s high
availability requirements, implementing such a feature is complex, basically
because the database connections are not stateless. This means that during the
moment of failure, the database, the users and the transactions are in a
specific state of operation, such as: It should be noted from the scenarios above that
only SELECT statements are failed over from one node to another; transactional
statements are not failed over by configuration of TAF. However, transactional
statements can programmatically be transferred from Node 1 to Node 2 by proper
validation of Oracle returned error messages and taking appropriate actions.
This should be a preferred method to avoid any user interruptions as well as
keeping the database or system failures transparent to the user. Among the
transactional statements, the following do not failover or are not protected
when a node fails: While the failover is in process, it is user
friendly to inform the user via the application interface that the activity or
command issued may take some time. This information could be communicated by
validating the various error messages returned by Oracle as part of the node and
connection failure. Some of the common Oracle error codes that should be handled
by the application to track and transfer transactional statements include: TAF CONFIGURATION
Configuring the TAF option involves adding Oracle
Net parameters to the tnsnames.ora and, when one of the participating nodes
encounters failure, the use of parameter values to ascertain the next step in
the failover process. The parameter that drives the TAF option is the
FAILOVER_MODE under the CONNECT_DATA section of a connect descriptor. By using
one or more the following sub parameters, the full functionality of TAF can be
obtained: Another important parameter, or value, that
should not be configured manually is the GLOBAL_DBNAME parameter in the
SID_LIST_listener_name section of the listener.ora. Configuring this parameter
in listener.ora disables TAF. If the GLOBAL_DBNAME parameter has been defined,
the parameter should be deleted and the database should be allowed to
dynamically register the global database names automatically.
TAF IMPLEMENTATION
The TAF option can be implemented in one of three
ways: Through the examples below the various
implementations options are explained.
CONNECT-TIME FAILOVER AND CLIENT LOAD BALANCING
Oracle Net connects randomly to one of the listener addresses on
node ORADB3 or ORADB4. If the Instance fails after the connection,
Oracle Net fails over to the other node’s Instance, preserving any
SELECT statements in progress. With the RETRIES and DELAY parameters as part of the failover node
sub parameter, the connections to the nodes are automatically
retried to the number of times specified by the parameter. In this
scenario the connection is retried 20 times with a delay of 15
seconds between every retry. Unlike the other option where one
node in the cluster fails and the connection is reestablished on
one of the other surviving nodes, under this option, the
connection is retried on the same node and no backup node is
defined as part of the configuration. Similarly there is no
significance to the load-balancing parameter and has been set to
off. PRE-ESTABLISHING A CONNECTION Another implementation option available under the TAF
configuration is to setup a pre-established connection to the
backup or secondary Instance. One of the potential performance
issues is the time required to re-establish a connection after the
primary, or the first, node has failed to the backup or secondary
node. This could be resolved by pre-establishing connections,
which means that the initial and backup connections are explicitly
specified. While there is a great advantage in pre-establishing
the connection, this is not without any drawbacks. Pre-established
connections consume resources. During testing processes
experienced by the author, up to 2% of additional CPU usage when
using pre-established connections has been noted. This is because
the process always validates the connection throughout its
activity. In the following example, the Oracle Net connects to the
listener on ORADB3. If ORADB3 fails after the connection, Oracle
Net fails-over to ORADB4, preserving any SELECT statements in
progress. Having the backup connection already in place can reduce
the time needed for a failover in the event that one needs to take
place. However, if the connect to the failed Instance does not
succeed at connect time, then fail back to the original Instance
is not possible. This option increases resource overhead in that
the backup connection is always maintained. TAF VERIFICATION Implementation of TAF could be verified by querying the data
dictionary views. V$SESSION has three columns FAILOVER_MODE,
FAILOVER_TYPE and FAILED_OVER that provides information pertaining
to TAF implementation and verification of results when the node in
the cluster crashes and fails over to one of the available nodes. The above query provides the details and status of the failover
operation. If there were many sessions on Node 1 it would be good
to look at the details by grouping the results. The following
query gives a consolidated count on the operation. CONCLUSION With hardware level clustering and the proper usage of high
availability solutions such as Oracle Real Application Clusters,
users have a minimum impact when a system in the cluster is lost.
However, with the usage of TAF option, a higher availability of
the system is possible to meet today’s business requirements. This
option provides the smooth transition from the primary Instance to
the secondary Instance with minimal or no interruption to the user
when connections to the primary Instance is lost. ABOUT THE AUTHOR Murali Vallath has over 17 years of experience designing and
developing databases. He is an Oracle Certified Database
Administrator. He currently works as a Senior Oracle Technologist
for Quest Software, USA. His specialty area is with Oracle
Parallel Server/Oracle Real Application Clusters. He has presented
at various conferences (Oracle World, EOUG, UKOUG, IOUG) and has
published papers in leading publications such as SELECT and Oracle
Scene. He is the author of the book titled ‘Oracle Real Application
Clusters’ (Publisher: Digital Press). Murali is the President of
the new formed RAC –SIG and the Charlotte Oracle Users Group (www.cltoug.org).
Murali can be reached at murali.vallath@quest.com REFERENCES


INSERT,
UPDATE
and DELETE
operations
Parameter
Description
BACKUP
Specifies a different net service name to be used to establish
the backup connection. A backup should be specified when using
PRECONNECT to pre-establish connections. Specifying a BACKUP is
strongly recommended for BASIC methods; otherwise, reconnection
may first attempt the Instance that has just failed, adding
additional delay until the client reconnects.
TYPE
Specifies the type
of failover. Three types of Oracle Net failover functionality
are available by default to the Oracle Call Interface.
METHOD
Determines the speed of the failover from the primary to the
secondary or backup node.
RETRIES
Specifies the number of times to attempt to connect to the
BACKUP node after a failure before giving up.
DELAY
Specifies the amount of time in seconds to wait between attempts
to connect to the BACKUP node after a failure before giving up.
SUMSKYDB =
(DESCRIPTION =
(ADDRESS_LIST =
(LOAD_BALANCE=ON)
(FAILOVER=ON)
(ADDRESS=(PROTOCOL=TCP)(HOST=ORADB3)(PORT=1521))
(ADDRESS=(PROTOCOL=TCP)(HOST=ORADB4)(PORT=1521))
)
(CONNECT_DATA =
(SERVICE_NAME = SUMSKYDB.SUMMERSKYUS.COM)
(FAILOVER_MODE = (TYPE=SELECT)(METHOD=BASIC)
)
)
)
RETRYING A CONNECTIONSUMSKYDB =
(DESCRIPTION =
(ADDRESS_LIST =
(LOAD_BALANCE=OFF)
(FAILOVER=ON)
(ADDRESS=(PROTOCOL=TCP)(HOST=ORADB3)(PORT=1521))
)
(CONNECT_DATA =
(SERVICE_NAME = SUMSKYDB.SUMMERSKYUS.COM)
(FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC)(RETRIES=20)(DELAY=15)
)
)
)
SSRAC3 =
(DESCRIPTION =
(ADDRESS =(PROTOCOL=TCP)(HOST=ORADB3)(PORT=1521))
(CONNECT_DATA =
(SERVICE_NAME = SUMSKYDB.SUMMERSKYUS.COM)
(INSTANCE_NAME = RAC3)
(FAILOVER_MODE =
(BACKUP=SSRAC4)(TYPE=SELECT)(METHOD=PRECONNECT)
)
)
)
SSRAC4=
(DESCRIPTION =
(ADDRESS =(PROTOCOL=TCP)(HOST=ORADB4)(PORT=1521))
(CONNECT_DATA =
(SERVICE_NAME = SUMSKYDB.SUMMERSKYUS.COM)
(INSTANCE_NAME = RAC4)
(FAIOVER_MODE =
(BACKUP=SSRAC3)(TYPE=SELECT)(METHOD=PRECONNECT)
)
)
)
SELECT MACHINE, FAILOVER_TYPE, FAILOVER_METHOD, FAILED_OVER
FROM V$SESSION
SELECT MACHINE, FAILOVER_TYPE, FAILOVER_METHOD, FAILED_OVER, COUNT (*)
FROM V$SESSION
GROUP BY MACHINE, FAILOVER_TYPE, FAILOVER_METHOD, FAILED_OVER