Wednesday, December 08, 2004

MSDTC and Exchange clusters

Over the years, the best-practice recommendations on where to place the MS Distributed Transaction Coordinator (MSDTC) resource in an Exchange cluster have bounced between two differing views. One camp thought it was inadvisable to place this resource in the cluster group occupied by the quorum resource, while others noted that Exchange doesn't really use DTC and it's wasteful to dedicate a whole group (with IP and disk) to this resource.

Let's address this from the bottom up:

Why does Clustered Exchange 200x need an MSDTC resource anyway?

Exchange provides support for a workflow function, and this capability is installed on all Exchange 200x servers by Exchange setup. This functionality is encapsulated in the CDOWFEVT.DLL binary, which requires various components to be registered with COM+ in order to function. COM+ requires DTC to be running. And on Clustered Windows, DTC won't initialize unless it's configured for the cluster, that is, as an MSDTC resource. Perhaps this domino-effect failure is familiar, particularly if you've run into the problem described in KB.312316!

So, does Exchange 200x workflow use MSDTC extensively?

Nope. Essentially just for the setup/upgrade steps, particularly if you've not actively implemented workflow applications.

Why, then, do some folks recommend not putting MSDTC into the default cluster group?

KB.168948 says pretty clearly not to put anything in the default cluster group that doesn't need to be there. This is a very important group, and anything placed in this group can contribute to lowered availability of the cluster.

But doesn't the old COMCLUST.EXE program put it there automatically?

Yes. Typically. Unless you follow the wacky steps in KB.290624. This is a big part of what leads to the confusion: in Windows 2003 you don't use Comclust anymore and instead follow the steps in KB.301600, so we see lots of variation among customers based on which method they followed.

Ok, that makes sense... so why do others say DO put it in that group for Exchange 200x clusters?

Here's where it gets interesting. Consider the characteristics of a typical, real-world Exchange 200x cluster server: heavily loaded, never enough disk spindles, etc. If you follow the recommendation to keep MSDTC out of the default cluster group, all of a sudden you need to add AT LEAST one extra IP, one extra network name, and one extra physical disk. Just to support this MSDTC resource that 99% of Exchange clusters only need for setup/upgrade purposes. If you have to choose between allocating this extra disk spindle to an underutilized MSDTC resource or to an underprovisioned database store LUN, most Exchange design architects will choose the second!

Now, here's the meat of this posting: Microsoft has long advocated the first option (a dedicated group, disk, etc. for MSDTC) as a best practice, since it leaves the cluster group containing the quorum disk untouched. Pretty much all of the documentation and KBs currently state this position. However, after a bunch of internal discussion and debate, this recommendation is about to change and the KB/documentation will be updated shortly.

So, if you have long ignored this recommendation and placed the MSDTC resource in the cluster group containing the quorum... you're fine. Exchange uses DTC so minimally that the performance hit and the impact on cluster availability are negligible, and generally not worth allocating the additional resources. Put that extra disk toward your Exchange data storage instead (see previous postings here, here, and here for more). Additionally, it is recommended that you uncheck the "Affect the Group" option on this MSDTC resource, so that if it does ever fail it will not impact the cluster group containing the quorum resource.

Note that this recommendation change does NOT apply if you are running SQL or some other application that uses MSDTC extensively. If your MSDTC is being heavily used, you should definitely still keep this resource out of the cluster group containing the quorum to avoid any possible negative effect on cluster availability.
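
For anyone who wants the decision in one place, here is a rough Python sketch of the placement rule described in this post. It is only an illustration: the names (Placement, recommend_msdtc_placement, heavy_dtc_use) are made up for this example and do not correspond to any Microsoft tool or API. The post does not say how "Affect the Group" should be set when MSDTC lives in a dedicated group, so the sketch leaves that value unset.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Placement:
    """Hypothetical summary of where the MSDTC resource should live."""
    group: str                     # "quorum" or "dedicated"
    affect_group: Optional[bool]   # "Affect the Group" checkbox; None = not covered by the post
    extra_resources: Tuple[str, ...]  # resources that must be provisioned just for MSDTC


def recommend_msdtc_placement(heavy_dtc_use: bool) -> Placement:
    """Encode the rule of thumb from this post.

    heavy_dtc_use is True when SQL or another application on the cluster
    uses MSDTC extensively, and False for a typical Exchange-only cluster.
    """
    if heavy_dtc_use:
        # Keep MSDTC out of the quorum group; this costs at least
        # one extra IP, one network name, and one physical disk.
        return Placement(
            group="dedicated",
            affect_group=None,
            extra_resources=("IP address", "network name", "physical disk"),
        )
    # Exchange barely touches DTC: share the quorum group, and uncheck
    # "Affect the Group" so an MSDTC failure cannot take the group down.
    return Placement(group="quorum", affect_group=False, extra_resources=())


if __name__ == "__main__":
    print(recommend_msdtc_placement(heavy_dtc_use=False))
    print(recommend_msdtc_placement(heavy_dtc_use=True))
```

The single boolean input reflects the only distinction the recommendation actually turns on: whether anything on the cluster uses MSDTC extensively.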
