Tuesday, May 4, 2010

Active Manager in Exchange 2010

Microsoft Exchange Server 2010 includes a new component called Active Manager, which replaces the resource model and failover management that previous versions of Exchange obtained through integration with the Cluster service. Exchange no longer uses the cluster resource model for high availability: there are no cluster groups for Exchange, and there are no storage resources in the cluster. If you examine the cluster using cluster management tools, you’ll see only the core cluster resources (IP Address, Network Name and, if needed, the quorum resource). Cluster nodes and networks also exist, but they are managed by Exchange rather than by the cluster or the cluster tools.
Active Manager runs on all Mailbox servers that are members of a database availability group (DAG).
There are two Active Manager roles:
1. Primary Active Manager (PAM)
2. Standby Active Manager (SAM)
The PAM is the Active Manager in a DAG that decides which copies will be active and which will be passive. The PAM is responsible for getting topology change notifications and reacting to server failures. The DAG member that holds the PAM role is always the member that currently owns the cluster quorum resource (default cluster group). If the server that owns the cluster quorum resource fails, the PAM role automatically moves to a surviving server that takes ownership of the cluster quorum resource. If you need to take the server that hosts the cluster quorum resource offline for maintenance or an upgrade, you must first move the PAM to another server in the DAG.
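
The relationship between quorum ownership and the PAM role can be modelled in a few lines of Python. This is only an illustrative sketch, not Exchange code: the function names `assign_roles` and `fail_server` and the server names are hypothetical, and the choice of which survivor takes ownership of the quorum resource is made by the cluster, so picking the first survivor here is purely an assumption for the example.

```python
def assign_roles(dag_members, quorum_owner):
    """Return {server: role}; the quorum owner holds the PAM role, all others are SAMs."""
    if quorum_owner not in dag_members:
        raise ValueError("the quorum owner must be a DAG member")
    return {s: ("PAM" if s == quorum_owner else "SAM") for s in dag_members}


def fail_server(dag_members, quorum_owner, failed):
    """Model a server failure and return (survivors, new_quorum_owner)."""
    survivors = [s for s in dag_members if s != failed]
    if not survivors:
        raise RuntimeError("no surviving DAG members")
    # Assumption: the cluster decides which survivor takes over the quorum
    # resource; taking the first survivor is for illustration only.
    new_owner = quorum_owner if quorum_owner != failed else survivors[0]
    return survivors, new_owner


members = ["MBX1", "MBX2", "MBX3"]          # hypothetical server names
roles = assign_roles(members, "MBX1")        # MBX1 owns the quorum, so it is the PAM
survivors, owner = fail_server(members, "MBX1", failed="MBX1")
roles_after = assign_roles(survivors, owner) # the PAM role follows the quorum resource
```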

The SAM provides information on which server hosts the active copy of a mailbox database to other components of Exchange that are running an Active Manager client component (for example, the RPC Client Access service or a Hub Transport server). The SAM detects failures of local databases and the local Information Store. A SAM doesn't determine the target of a failover, nor does it update a database’s location state in the PAM. It reads the active database copy location state to answer the queries it receives for the active copy of a database.
In Exchange 2010, the Microsoft Exchange Replication service periodically monitors the health of all mounted databases. It also monitors the Extensible Storage Engine (ESE) for any I/O errors or failures. When the service detects a failure, it notifies Active Manager. Active Manager then determines which database copy should be mounted and what is required to mount that database.

Active Manager Best Copy Selection:
When a failure occurs that affects a replicated mailbox database, Active Manager takes several steps to recover from the failure by selecting the best possible copy of the failed database to activate. Below is the process:
• Active Manager detects the failure.
• The PAM runs an internal algorithm called best copy selection (BCS).
• A process called attempt copy last logs (ACLL) occurs, which tries to copy any missing log files from the server that hosted the active database copy prior to the failover.
• Once the ACLL process has completed, the PAM issues a mount request to the Microsoft Exchange Information Store via remote procedure call (RPC). At this point, either:
1. The database mounts and is made available to clients; or
2. The database does not mount, and the PAM repeats the BCS, ACLL, and mount steps on the next best copy (if one is available).
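
The recovery sequence above can be sketched as a retry loop. This is a simplified model under stated assumptions, not the actual Active Manager implementation: `Copy`, `fail_over`, `rank`, `acll`, and `mount` are hypothetical names, and ranking copies by copy queue length alone is a stand-in for BCS, which evaluates several sets of criteria.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Copy:
    server: str             # hypothetical server name hosting this passive copy
    copy_queue_length: int  # log files not yet copied from the active copy


def fail_over(
    copies: List[Copy],
    rank: Callable[[Copy], int],
    acll: Callable[[Copy], None],
    mount: Callable[[Copy], bool],
) -> Optional[str]:
    """Try candidate copies from best to worst; return the server that mounted, or None."""
    for candidate in sorted(copies, key=rank):  # stand-in for best copy selection
        acll(candidate)                         # attempt to copy any missing log files
        if mount(candidate):                    # mount request to the Information Store
            return candidate.server
        # The mount failed: fall through to the next best copy, if any.
    return None


tried = []
copies = [Copy("MBX2", 20), Copy("MBX3", 2)]
winner = fail_over(
    copies,
    rank=lambda c: c.copy_queue_length,          # assumption: fewer missing logs is better
    acll=lambda c: tried.append(c.server),       # stub: records the attempt, copies nothing
    mount=lambda c: c.copy_queue_length <= 6,    # stub mount rule for the example
)
```

In this example `MBX3` is tried first because it has the shorter copy queue, and the loop stops as soon as a mount succeeds.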

Lossless failover:

When searching for the best possible copy, the PAM uses up to ten separate sets of criteria to determine the best copy to activate. After locating the best possible copy, ACLL runs. After the ACLL process has completed, if all missing log files were copied from the previous active copy, the database mounts without any data loss. This is known as a lossless failover.

If the ACLL process is unsuccessful, the configured value for AutoDatabaseMountDial is consulted. If the number of lost logs is within the configured value for AutoDatabaseMountDial, the database is mounted. If the number of lost logs is outside the configured value, the database isn't mounted until either the missing log files are recovered or an administrator explicitly mounts the database and accepts the larger data loss. If the database doesn't mount automatically, the PAM will select the next best copy (if one is available). There are at least three reasons why the initially selected database copy might not mount automatically:
1. The number of lost log files is greater than the configured value for AutoDatabaseMountDial.
2. The server on which the mount attempt was made is configured with a soft maximum for the active number of databases, and the maximum number of active database copies has been reached on the server.
3. The database copy is suspended for activation.
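
The three conditions above can be expressed as a single check. The sketch below is an illustration only: `MountDial` and `can_auto_mount` are hypothetical names, and the thresholds (0, 6, and 12 lost logs for the Lossless, GoodAvailability, and BestAvailability settings) are the values commonly documented for Exchange 2010, which you should verify against your own version.

```python
from enum import Enum
from typing import Optional, Tuple


class MountDial(Enum):
    """Assumed lost-log thresholds for the AutoDatabaseMountDial settings."""
    LOSSLESS = 0
    GOOD_AVAILABILITY = 6
    BEST_AVAILABILITY = 12


def can_auto_mount(
    lost_logs: int,
    dial: MountDial,
    active_count: int,
    max_active: Optional[int],
    activation_suspended: bool,
) -> Tuple[bool, str]:
    """Return (mounts_automatically, reason) for one candidate copy."""
    if activation_suspended:
        return False, "copy is suspended for activation"
    if max_active is not None and active_count >= max_active:
        return False, "maximum number of active databases reached on the server"
    if lost_logs > dial.value:
        return False, "lost logs exceed AutoDatabaseMountDial"
    return True, "mount"


ok, reason = can_auto_mount(
    lost_logs=3,
    dial=MountDial.GOOD_AVAILABILITY,
    active_count=2,
    max_active=5,          # None models a server with no configured maximum
    activation_suspended=False,
)
```

When the check fails for the first candidate, the PAM moves on to the next best copy, as described above.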
