Friday, November 26, 2010

Exchange Recovery: Failed DAG Member in Exchange Server 2010

This article demonstrates a recovery scenario for a failed Mailbox server that is a member of an Exchange 2010 Database Availability Group (DAG).  In this scenario the DAG has two members, EX1 and EX2.  EX2 has suffered a serious hardware failure and needs to be recovered.
With server EX2 down, each mailbox database in the DAG is now mounted on EX1 and shows the following status information.
[PS] C:\>Get-MailboxDatabaseCopyStatus -Identity "Mailbox Database 01"

Name                                          Status          CopyQueue
                                                              Length
----                                          ------          ---------
Mailbox Database 01\EX1                       Mounted         0
Mailbox Database 01\EX2                       ServiceDown     0
The Exchange recovery process begins by installing Windows Server 2008 R2 on the replacement server.
Installing Windows Server 2008 R2
Because this Exchange recovery is for a member of an Exchange 2010 DAG, the server must be installed with the Enterprise edition of Windows Server 2008 R2.

Exchange 2010 DAG members require the Enterprise edition of Windows Server
After Windows Server 2008 R2 has finished installing, log on to the server and complete the following tasks:
  • Configure the time zone settings
  • Configure the Automatic Update settings
  • Configure the server with the same TCP/IP configuration as the previous server
  • Configure the server with the same name as the previous server (in this case EX2)
  • Join the server to the Active Directory domain
The next step is to install the Exchange 2010 prerequisites for the Mailbox server role.  From an elevated PowerShell prompt, run the following commands.
Import-Module ServerManager

Add-WindowsFeature NET-Framework,RSAT-ADDS,Web-Server,Web-Basic-Auth,Web-Windows-Auth,Web-Metabase,Web-Net-Ext,Web-Lgcy-Mgmt-Console,WAS-Process-Model,RSAT-Web-Server -Restart
After the server has restarted we also need to install the Exchange Server 2010 SP1 hotfixes for Windows Server 2008 R2. These updates require another restart of the server.
Before installing Exchange Server 2010 on the server being recovered, we first need to remove the failed server from the DAG. On another Exchange 2010 server, open the Exchange Management Shell and run the following commands.
First, determine which mailbox databases the server was hosting a copy of, the activation preferences, and any replay lag that was configured. In this example server EX2 hosted copies of Mailbox Database 01 and Mailbox Database 02.
[PS] C:\>Get-MailboxDatabase | fl name, servers, activ*, *lag*

Name                 : Mailbox Database 02
Servers              : {EX2, EX1}
ActivationPreference : {[EX2, 1], [EX1, 2]}
ReplayLagTimes       : {[EX2, 00:00:00], [EX1, 00:00:00]}
TruncationLagTimes   : {[EX2, 00:00:00], [EX1, 00:00:00]}

Name                 : Mailbox Database 01
Servers              : {EX1, EX2}
ActivationPreference : {[EX1, 1], [EX2, 2]}
ReplayLagTimes       : {[EX1, 00:00:00], [EX2, 00:00:00]}
TruncationLagTimes   : {[EX1, 00:00:00], [EX2, 00:00:00]}

Name                 : Archive Mailboxes
Servers              : {EX1}
ActivationPreference : {[EX1, 1]}
ReplayLagTimes       : {[EX1, 00:00:00]}
TruncationLagTimes   : {[EX1, 00:00:00]}
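Because these values are needed again when the copies are re-added, it may be worth saving them to a file before changing anything. The following is a minimal sketch only; the output path C:\Temp\dag-copy-settings.txt is a hypothetical placeholder and the folder is assumed to already exist.

```powershell
# Save the current copy layout for later reference.
# The output path is a placeholder (an assumption); the folder must already exist.
Get-MailboxDatabase |
    Format-List Name, Servers, ActivationPreference, ReplayLagTimes, TruncationLagTimes |
    Out-File -FilePath C:\Temp\dag-copy-settings.txt
```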
Next, remove the failed server from each of the mailbox databases that it held a copy of.
[PS] C:\>Remove-MailboxDatabaseCopy "Mailbox Database 01\EX2"

[PS] C:\>Remove-MailboxDatabaseCopy "Mailbox Database 02\EX2"
Warnings will appear because the failed Exchange server EX2 can't be contacted. However, the change can be confirmed by repeating the earlier command.
[PS] C:\>Get-MailboxDatabase | fl name, servers, activ*, *lag*

Name                 : Mailbox Database 02
Servers              : {EX1}
ActivationPreference : {[EX1, 1]}
ReplayLagTimes       : {[EX1, 00:00:00]}
TruncationLagTimes   : {[EX1, 00:00:00]}

Name                 : Mailbox Database 01
Servers              : {EX1}
ActivationPreference : {[EX1, 1]}
ReplayLagTimes       : {[EX1, 00:00:00]}
TruncationLagTimes   : {[EX1, 00:00:00]}

Name                 : Archive Mailboxes
Servers              : {EX1}
ActivationPreference : {[EX1, 1]}
ReplayLagTimes       : {[EX1, 00:00:00]}
TruncationLagTimes   : {[EX1, 00:00:00]}
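With only two databases the copies are easy to remove one at a time. In a larger environment the same removal could be scripted. The following is a sketch rather than a tested procedure: the $failedServer variable is introduced here for illustration, and -WhatIf is left in place so that nothing is changed until the result has been reviewed. A database whose only copy lives on the failed server would also match the filter and needs to be handled separately.

```powershell
# Sketch only: remove every database copy hosted on the failed server.
# $failedServer is an illustrative variable, not part of the original procedure.
# Servers holds directory object IDs, so the filter compares on their Name property.
# -WhatIf previews each removal; delete it to actually perform the change.
$failedServer = "EX2"

Get-MailboxDatabase |
    Where-Object { ($_.Servers | ForEach-Object { $_.Name }) -contains $failedServer } |
    ForEach-Object {
        Remove-MailboxDatabaseCopy -Identity "$($_.Name)\$failedServer" -WhatIf
    }
```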
Next, remove the failed server from the Database Availability Group. Run the following command in the Exchange Management Shell.
[PS] C:\>Remove-DatabaseAvailabilityGroupServer -Identity DAG -MailboxServer EX2
Note: in some DAG topologies this action will fail with the error “A quorum of cluster nodes was not present to form a cluster”. If that error occurs, use the solution quoted below.
"When an Exchange Server 2010 Mailbox server that is a member of a Database Availability Group has failed, part of the recovery process is to remove it from DAG membership.
In some scenarios this process may result in an error “A quorum of cluster nodes was not present to form a cluster”.  The full error text is below.
[PS] C:\>Remove-DatabaseAvailabilityGroupServer -Identity DAG -MailboxServer EX2
Confirm
Are you sure you want to perform this action?
Removing Mailbox server "EX2" from database availability group "dag".
[Y] Yes  [A] Yes to All  [N] No  [L] No to All  [?] Help (default is "Y"): y
WARNING: The operation wasn't successful because an error was encountered. You may find more details in log file
"C:\ExchangeSetupLogs\DagTasks\dagtask_2010-11-25_03-48-09.814_remove-databaseavailabiltygroupserver.log".
There was a problem changing the quorum model for database availability group dag. Error: An Active Manager operation f
ailed. Error: An error occurred while attempting a cluster operation. Error: Cluster API '"SetClusterQuorumResource() f
ailed with 0x1725. Error: A quorum of cluster nodes was not present to form a cluster"' failed..
    + CategoryInfo          : InvalidArgument: (:) [Remove-DatabaseAvailabilityGroupServer], DagTaskProblemChangingQuo
   rumException
    + FullyQualifiedErrorId : 80D96894,Microsoft.Exchange.Management.SystemConfigurationTasks.RemoveDatabaseAvailabili
   tyGroupServer
To resolve this issue, use the -ConfigurationOnly switch to remove the failed Mailbox server from the Exchange 2010 DAG.  In this example server EX2 is being removed.
[PS] C:\>Remove-DatabaseAvailabilityGroupServer -Identity DAG -MailboxServer EX2 -ConfigurationOnly
Next, evict the failed node from the Windows Failover Cluster.
Note: you need to import the Failover Cluster module into your Exchange Management Shell session to perform this task.
[PS] C:\>Import-Module FailoverClusters
[PS] C:\>Get-ClusterNode EX2 | Remove-ClusterNode -Force
The failed server has now been removed from the Exchange 2010 DAG and the Windows Failover Cluster."

When you are ready to proceed with the Exchange 2010 install, open a command prompt, change to the directory that contains the Exchange setup files, and run the following command.
setup /m:recoverserver
When setup has completed and the server has been rebooted, add the recovered server back into the Database Availability Group.
[PS] C:\>Add-DatabaseAvailabilityGroupServer -Identity DAG -MailboxServer EX2
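Before moving on, DAG membership can be confirmed from the Exchange Management Shell. A minimal sketch, assuming the DAG is named DAG as in the commands above:

```powershell
# List the configured DAG members and, via -Status, the members currently operational.
Get-DatabaseAvailabilityGroup -Identity DAG -Status |
    Format-List Name, Servers, OperationalServers
```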
Then, taking note of the replay and truncation lag times and the activation preferences that were identified earlier, re-add the mailbox database copies to the recovered server. This process can take a long time, depending on the size of the mailbox databases that need to be reseeded.
[PS] C:\>Add-MailboxDatabaseCopy -Identity "Mailbox Database 01" -MailboxServer EX2
[PS] C:\>Add-MailboxDatabaseCopy -Identity "Mailbox Database 02" -MailboxServer EX2 -ActivationPreference 1
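In this example every lag time is zero, so no lag parameters are needed. If a copy had previously been configured with a lag, the settings could be reapplied once the copy exists. The sketch below uses hypothetical values (1 day of replay lag and 2 days of truncation lag) purely for illustration; they do not come from this environment.

```powershell
# Hypothetical values for illustration: 1 day replay lag, 2 days truncation lag.
# Time spans use the dd.hh:mm:ss format.
Set-MailboxDatabaseCopy -Identity "Mailbox Database 01\EX2" `
    -ActivationPreference 2 `
    -ReplayLagTime 1.00:00:00 `
    -TruncationLagTime 2.00:00:00
```

The same -ReplayLagTime and -TruncationLagTime parameters are also accepted by Add-MailboxDatabaseCopy, so they can be supplied when the copy is first added instead.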
You can now verify that the databases have the same settings that were identified earlier.
[PS] C:\>Get-MailboxDatabase | fl name, servers, activ*, *lag*

Name                 : Mailbox Database 02
Servers              : {EX2, EX1}
ActivationPreference : {[EX2, 1], [EX1, 2]}
ReplayLagTimes       : {[EX2, 00:00:00], [EX1, 00:00:00]}
TruncationLagTimes   : {[EX2, 00:00:00], [EX1, 00:00:00]}

Name                 : Mailbox Database 01
Servers              : {EX1, EX2}
ActivationPreference : {[EX1, 1], [EX2, 2]}
ReplayLagTimes       : {[EX1, 00:00:00], [EX2, 00:00:00]}
TruncationLagTimes   : {[EX1, 00:00:00], [EX2, 00:00:00]}

Name                 : Archive Mailboxes
Servers              : {EX1}
ActivationPreference : {[EX1, 1]}
ReplayLagTimes       : {[EX1, 00:00:00]}
TruncationLagTimes   : {[EX1, 00:00:00]}
The failed DAG member has now been recovered and the Exchange 2010 Database Availability Group is back to normal operation.
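As a final check, it may be useful to confirm that the reseeded copies on EX2 are healthy and that replication is working. A minimal sketch using the server name from this scenario:

```powershell
# Show the state of every database copy hosted on the recovered server.
Get-MailboxDatabaseCopyStatus -Server EX2 |
    Format-Table Name, Status, CopyQueueLength, ReplayQueueLength, ContentIndexState -AutoSize

# Run the built-in replication health checks against the recovered server.
Test-ReplicationHealth -Identity EX2
```

Note that Mailbox Database 02 prefers EX2 but will typically still be mounted on EX1 at this point. If it should be active on EX2 again, it can be moved with Move-ActiveMailboxDatabase "Mailbox Database 02" -ActivateOnServer EX2 once its copy on EX2 is healthy.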

1 comment:

  1. Nice Post....Thanks a lot for posting.

    Can you help us to recover of Cas Server & hub server if they get failed.


    Regards
    Anand S