Navigation Bar

Saturday, November 23, 2024

Dataguard - Switchover and Failover

 Switchover and Failover are activities in which the standby database takes over the production database. It also describes how the database roles are reverted once the problem is fixed.

This role transition activity is of two types

  • Switchover
  • Failover

Switchover

Switchover is the act of changing the standby database into the primary database in a controlled manner. This is a planned activity and is safe from a data loss because the primary must complete all redo generation on the production data before allowing the switchover to commence. Switchback is nothing but a switchover in reverse order. This planned activity is usually done in a quiet period. The reason for the switchover may be a DR testing, patching activity, hardware change etc.

Once the switchover is complete, the redo from the new primary will be sent to the remaining standby servers, including the old primary. If you are using grid control or dataguard broker, the steps involved in a switchover process will be done automatically. But if you are doing it through SQLPLUS in a regular dataguard setup there will be some manual steps involved.

The actual command for the switchover is as follows

ALTER DATABASE COMMIT TO SWITCHOVER TO STANDBY;
When the switchover command is executed redo generation is stopped, all DML related cursors are invalidated and users are prevented from executing transactions or terminated and the current redo log is archived. A special switchover marker called the EOR (end of reader) in then placed in the header of the next sequence and the online redo logs are archived for the second time, sending the final sequences to the standby databases. At this point the physical standby database is closed and the final log switch is done without allowing the primary database to advance the sequence numbers.  
After the EOR is sent to the standby databases, the original primary database is finalized as the standby and its control file is backed up to trace file and converted to the correct type of standby control file. In the case of a physical standby switchover, the MRP process is automatically started on the original primary to apply the final archive logs that contain the EOR so that all the redo generated is processed. The primary is then dismounted and restarted as a standby database in the mount state.
The standby database must receive the EOR redo, otherwise the switchover cannot occur. Once this redo has been received and applied to complete the switchover, you run the following command if you are using Grid Control or the Broker process.
ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY;
The physical primary switchover will wait for the MRP process to exit after processing the EOR redo and then convert the standby control file into a normal production control file. The final thing to do is to open the database for general production use.
alter database open;
A logical standby also has to wait for the EOR redo from the primary to be applied and dataSQL apply to shut down before the switchover. Once the EOR has been processed, the GUAURD can be turned off and be open for production processing. 

Failover

A failover is an unplanned event when something has happened to the hardware, networking etc. The difference from switchover is that a switchover is invoked from primary database. In a failover mechanism the primary itself is no longer available. Depending on the protection mode chosen, there may be some data loss. You start the failover process by telling dataguard to apply the remaining redo that is possible to apply. Once the redo has been applied you run the same command that you do with a physical standby to switchover the standby to a primary.
ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY;
Once the switchover is completed the protection mode will be maximum performance regardless of what is was before. To get it back to the original protection mode, you have to get the standby database up and running and run command to change to the appropriate protection mode.

Since the redo heartbeat is sent every 6 seconds, you may lose 6 seconds of redo during a failover. At failover the merging thread will look at the last log of the disconnected thread and use the last heartbeat in it to define the consistent point, throwing away all the redo that the surviving nodes had been sending all along.

Steps for a physical switchover

In this section we will perform a switchover using SQLPlus.
Doing this process with SQLPlus will give you a good understanding of how the switchover process works and what the broker does in the background.
To start the switchover perform the following checks
  • Make sure all the dataguard parameters have been configured.
  • Make sure the SRL files have been configured.
  • Make sure you have a temporary datafile defined.
  • Make sure flashback has been enabled.
The reason for the checks is if anything goes wrong, it will be easy to switch back to the primary.

Check that redo has been received. 
The sync status should be YES. If any discrepancy, these will be need to be resolved before going ahead.The synchronized column should be YES on the standby alpha database.
select db_unique_name, protection_mode, synchronization_status, synchronized from v$archive_dest_status;
select client_process, process, sequence#, status from v$managed_standby;
Check the log sequence# on the primary also.
select thread#, sequence#, status from v$log;
Once you have determined that the redo stream is current, you need to check that redo has been applied on the physical standby database.
Check that MRP (applying_log) matches the RFS process, if the MRP line is missing then you need to start the apply process, you also may see the status of wait_for_gap so wait until the gap have been resolved first
select client_process, process, sequence#, status from v$managed_standby;
Finally you must cancel any jobs and backups that may be running.
Use the following command to check the same.
select process, operation, r.status, mbytes_processed pct, s.status from v$rman_status r, v$session s where r.sid=s.sid;
Once the checks have been carried out, you are ready to perform the switchover.
You may want to tail the alert logs to get an idea of what is going on in the background and you can familiarize yourself with the messages that are displayed. You can increase the level of the messaging for higher diagnostics.
 
To increase the logging level
alter system set log_archive_trace=8129;
To switch off the logging level
alter system set log_archive_trace=0;
If you are using flashback, make sure to drop the GRP restore point on all the databases after the switchover is complete, otherwise you will be generating permanent flashback database logs forever.
select switchover_status from v$database;
Now we are ready to switchover. On the primary database orcl.
alter database commit to switchover to physical standby with session shutdown;
Keep an eye on your alert logs, then check that first part of the switchover has completed
select switchover_status from v$database;
To finish off the switchover. On the physical standby database alpha.
alter database commit to switchover to primary with session shutdown;
After this command the database will be in mount stat. Open the new primary database
alter database open;
Finish off the old primary and start it up as a standby
shutdown immediate;
startup mount;
alter database recover managed standby database using current logfile disconnect from session;

Performing a Failover to physical standby with SQLPlus


To do a failover in case of the primary database crashing or going down abnormally, you have to choose a database which is in maximum protection mode. 
If you have two databases with the same protection mode and using SYNC, you need to determine which database has lesser lag, which will then become your choice for being the primary as there is lesser chances of data loss.

Query to get the lag time
select name, value, time_computed from v$dataguard_stats where name like '%lag%';
Query to get the SCN
select thread#, sequence#, last_change#, last_time from v$standby_log;
Once you have chosen the target we are ready to failover.

Since the primary  is down, notify the standby that it will be the new primary and to apply all the redo.  
alter database recover managed standby database cancel;
alter database recover managed standby database finish;
Now issue the switchover command and open the database.
alter database commit to switchover to primary with session shutdown;
alter database open;
You can raise the protection mode if required.
set standby database to maximum protection;
To failover to logical standby 
alter database activate logical standby database finish apply;
Restarting the old primary
In case the old primary database has gone down due to an abnormal database crash or failure and the problem is resolved, you can bring it up as a standby database so that your dataguard is in place. If flashback database is enabled on your old primary we can easily bring it back as the new standby database as follows. 
Since redo is applied by SCN, we need the failover SCN from the new primary.
select to_char(standby_became_primary_scn) failover_scn from v$database;
FAILOVER_SCN 
------------ 
10345610 
Now start the old primary in mount mode, flashback the old primary to this SCN and restart the database in mount mode.
startup mount;
flashback database to SCN 10345610;
alter database convert to physical standby;
shutdown immediate;
startup mount;
The old primary will start to resolve any gap issues at the next log switch, which means we can start the MRP process to get this standby going to catchup as fast as possible
alter database recover managed standby database using current logfile disconnect from session;
Eventually the missing redos will be sent to the standby and applied, bring us back to synchronization again. Finally shutdown and startup the new standby database and start the MRP process so that the redo logs will start applying at runtime.
shutdown immediate;
startup mount;
alter database recover managed standby database using current logfile disconnect from session;

References 

Thought for the day
    In regards to the immoral man/woman 
And now my child, listen to me, and be attentive to the words of my mouth 
Do not let you hearts turn aside to her ways; do not stray into her paths. 
For many are those she has laid low, and numerous are her victims. 
Her way is the way to Sheol, going down to the chambers of death.
Proverbs 7:24-27

A few notes for those teaching the proverbs to Children: 
GENDER: 
This ancient book was first taught to sons, therefore, all the pronouns are aimed at boys. 
They are warned against immoral WOMEN, but daughters are not warned against immoral MEN.
However, you the parent, can (and should) easily be able to apply the reverse side of these lessons to your daughters as well.

Pastor Garrett Lahey

No comments:

Post a Comment