The recent EHLO blog post informing Exchange administrators about a recommended hotfix for Windows Server 2008 R2 came as a welcome reminder that the Database Availability Group (DAG), the headline feature of Exchange 2010, is layered on top of Windows Failover Clustering (WFC) and is therefore exposed to any problem in Windows 2008 R2 or WFC.
The dependency on WFC is the reason why you need to run Exchange 2010 on Windows 2008 R2 enterprise edition if you want to deploy a DAG. Exchange 2010 standard edition can be used for a DAG, but only if you want to host five or fewer online (mounted) databases per node. The enterprise edition of Exchange 2010 is better suited to large DAGs because it allows a node to support up to one hundred online databases.
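To make the edition limits concrete, here is a minimal Python sketch that checks a planned number of mounted databases per node against the figures quoted above. It is not an Exchange API; the function and constant names are made up for illustration.

```python
# Hypothetical planning helper, not part of Exchange.
# The limits are the per-node figures quoted in the text.
MOUNTED_DB_LIMIT = {"standard": 5, "enterprise": 100}


def fits_edition(edition: str, mounted_databases: int) -> bool:
    """Return True if a node running the given Exchange 2010 edition
    can host the planned number of online (mounted) databases."""
    return 0 <= mounted_databases <= MOUNTED_DB_LIMIT[edition]


def minimum_edition(mounted_databases: int) -> str:
    """Return the smallest edition whose limit covers the plan."""
    for edition in ("standard", "enterprise"):
        if fits_edition(edition, mounted_databases):
            return edition
    raise ValueError("plan exceeds the per-node limit of every edition")
```

For example, minimum_edition(5) returns "standard", while a plan for twelve mounted databases per node returns "enterprise".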
WFC is also the reason why DAGs are limited to sixteen nodes. In most cases this is not a problem as sixteen Exchange 2010 mailbox servers deployed in a DAG are capable of providing a highly reliable service to multiple tens of thousands of users, even if a number of servers are down for maintenance or another reason and their load has to be transferred to the remaining nodes. It would be nice if DAGs supported more than sixteen nodes but that isn’t going to happen until WFC increases the limit.
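The arithmetic behind moving load to the remaining nodes can be sketched in a few lines of Python. This is a rough illustration only: it assumes users are spread evenly across the surviving nodes, it ignores quorum and database copy placement, and the user count in the example is invented rather than taken from any real deployment.

```python
import math

MAX_DAG_NODES = 16  # node limit quoted in the text (inherited from WFC)


def users_per_surviving_node(total_users: int, nodes: int, nodes_down: int) -> int:
    """Estimate the users each remaining node carries when some nodes are down.

    Assumes an even spread of users, which is a simplification.
    """
    if not 1 <= nodes <= MAX_DAG_NODES:
        raise ValueError("a DAG supports between 1 and 16 nodes")
    if not 0 <= nodes_down < nodes:
        raise ValueError("at least one node must remain online")
    return math.ceil(total_users / (nodes - nodes_down))
```

With a hypothetical 48,000 users on sixteen nodes, each node carries 3,000 users when everything is healthy and 4,000 when four nodes are offline, so each server needs the headroom to absorb that extra third.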
Microsoft accurately represents the DAG as native data protection for Exchange. They have done a good job of hiding the complexities that often surround clustering behind the user interface and wizards presented in the Exchange Management Console (EMC) that administrators use to configure and manage DAGs. And although there are times when an administrator will need to get up close and personal with the finer details of a DAG, you should still rarely need to become acquainted with the underlying cluster.
Of course, as with any other technology, cluster errors do occur and you might then have to manipulate some aspect of the cluster underneath a DAG. For example, I once had to eject a failed DAG member from a cluster after the Exchange tools wouldn’t let me remove it from the DAG. In my case, the problem occurred in a lab cluster running on virtual machines, and I suspect that problems with the host computer caused the error. However, these situations are rare and you should definitely not resort to any messing around with the cluster unless you’re instructed to do so by Microsoft support or some other competent authority. The golden rule is always to attempt to fix the problem using Exchange commands first.
The upshot of the close relationship between WFC and the DAG is that Exchange 2010 administrators need to have an awareness and appreciation of WFC without needing to understand Windows clusters to the nth degree. Some might say that this creates a situation where administrators know just enough about clusters to be dangerous. I’ll admit that this could be the case, so there’s a balance to be struck. You can spend tons of time becoming a WFC expert, but I doubt that you’ll ever use that expertise in an Exchange environment unless you specialize in fixing bust DAGs.
On the other hand, you can ignore WFC totally and hope that you never need the knowledge. That would probably be an overreaction to the way that Exchange 2010 has wrapped the DAG around WFC, as it’s important to at least appreciate WFC concepts and how the DAG uses them (the File Share Witness and the quorum models are good examples of concepts that any DAG administrator needs to understand).
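As a small illustration of the quorum concepts mentioned above, the Python sketch below models how a DAG picks its quorum model: Node Majority when the member count is odd, and Node and File Share Majority (where the File Share Witness supplies an extra vote) when it is even. It is a simplified teaching model with made-up function names, not a cluster API, and it ignores everything except the vote count.

```python
def quorum_model(members: int) -> str:
    """Quorum model a DAG uses for a given number of members."""
    if members % 2 == 0:
        return "Node and File Share Majority"
    return "Node Majority"


def total_votes(members: int) -> int:
    """Each member has one vote; the witness adds one when the count is even."""
    return members + (1 if members % 2 == 0 else 0)


def votes_needed(members: int) -> int:
    """A strict majority of all possible votes."""
    return total_votes(members) // 2 + 1


def has_quorum(members: int, members_up: int, witness_up: bool = True) -> bool:
    """Return True if the online voters still form a majority."""
    votes = members_up
    if members % 2 == 0 and witness_up:
        votes += 1  # the File Share Witness only votes in even-sized DAGs
    return votes >= votes_needed(members)
```

In this model, a four-node DAG that loses two members keeps quorum only while the File Share Witness is reachable, which is exactly why the witness matters: has_quorum(4, 2, True) is True and has_quorum(4, 2, False) is False.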