More on MSDTC

Last time I wrote a post on MSDTC under Vista. Here I’d like to talk about a general MSDTC issue that is not Vista-specific.

Internally MSDTC uses NetBIOS names and passes them to RPC to talk to remote machines. This requires both server and client to be able to resolve each other’s machine name.

Sometimes name resolution gets messed up by DHCP or DNS caching. An easy workaround is to add entries with machine names and IP addresses to your hosts file. You probably need to do this on both the server and the client if neither can resolve the other’s name.
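Before editing the hosts file, it can help to confirm which side actually fails to resolve the other. Below is a minimal Python sketch, not anything MSDTC itself runs: `can_resolve` asks the local resolver about a name, and `hosts_line` formats one hosts-file entry. The machine name and address in the usage note are made up for illustration.

```python
import socket


def can_resolve(hostname):
    """Return True if the local resolver can map hostname to an address."""
    try:
        socket.gethostbyname(hostname)
        return True
    except OSError:
        # socket.gaierror (resolution failure) is a subclass of OSError.
        return False


def hosts_line(ip_address, machine_name):
    """Format one hosts-file entry: the address, a tab, then the name."""
    return f"{ip_address}\t{machine_name}"
```

For example, if `can_resolve("SQLBOX01")` returns False on the client, `hosts_line("192.0.2.10", "SQLBOX01")` gives the line to append to the client’s hosts file (SQLBOX01 and 192.0.2.10 are hypothetical). Run the same check on the server with the client’s name.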

Using MSDTC between Vista clients and Windows 2000 servers

Background

Consider the following two scenarios:

1. You have implemented a .NET application that accesses a COM+ service with automatic transaction processing. You derived a class from ServicedComponent, set the Transaction attribute on the class, and applied the AutoComplete attribute to a method that opens a SqlConnection.

2. Inside a TransactionScope, you open a SqlConnection that enlists in a distributed transaction.

In both scenarios, since you’ve used the services of the Microsoft Distributed Transaction Coordinator (MSDTC), you need to configure and start the MSDTC service. Assuming SQL Server is running on a Windows 2000 machine, everything works fine until you upgrade your client to Windows Vista. You’re going to see the following exception:

Unhandled Exception: System.Transactions.TransactionException: The transaction has already been implicitly or explicitly committed or aborted. ---> System.Runtime.InteropServices.COMException (0x8004D00E): The transaction has already been implicitly or explicitly committed or aborted (Exception from HRESULT: 0x8004D00E)

Solution

The solution to the problem is simpler than you might imagine: make a simple change to the MSDTC settings in your Vista client. The tricky part, however, is how to set it. I’d like to share the steps:

a. Run the dcomcnfg command (don’t tell me you couldn’t find the Run command :)

b. Expand the “Component Services” node, then the “My Computer” node, then the “Distributed Transaction Coordinator” node

c. Right-click the “Local DTC” node and select “Properties”

d. In the “Local DTC Properties” dialog, select the “Security” tab

e. Select Network DTC Access, Allow Inbound, Allow Outbound, No Authentication Required, and Enable XA Transactions

So here is the difference. In Windows XP SP2 or Windows Server 2003 SP1, you only need to choose Incoming Caller Authentication Required. In Vista, however, when you’re trying to talk to a Windows 2000 server through the MSDTC service, you have to lower the security level of Transaction Manager Communication to No Authentication Required.

Now you’ve solved your problem. If you want to know why, or have concerns about the security (it’s supposed to be more secure with Vista, isn’t it?), please read on.

Explanation

Since I work on ADO.NET rather than MSDTC, I’d like to quote the explanation from Jim Carley, a developer on the MSDTC team:

In Vista, we tightened down the MSDTC security even more than in W2k3SP1. We added code that, after doing an RpcImpersonateClient, obtained the token of the caller from the thread and checked to see if that token was in the Authenticated Users group.  The tokens that come in from W2k systems are “anonymous”, so aren’t considered in the Authenticated Users group.

On W2k3SP1, we only do a RpcImpersonateClient. We don’t check for Authenticated Users membership.

That is why “Incoming Caller Authentication Required” works on W2k3 SP1 and not on Vista.

W2k does not have the code in MSDTC to establish the right security settings expected by MSDTC in later releases, thus the need to lower the RPC security level.

If XPSP2 or WS03 SP1 systems are able to exchange transactions with a W2k system at the “incoming authentication required” level of security, they are getting lucky. There are situations where this configuration needs to be lowered to “no authentication required” also.

You probably don’t understand every technical detail (it’s very hard for people outside the MSDTC team to completely understand how everything works), but the important fact is that MSDTC security is tighter in Vista, and you have to lower the security level to talk to a Windows 2000 server.

Regarding the security concern, according to what Jim said, lowering the security level does open up opportunities for anonymous attacks on the RPC interfaces. Fortunately, Jim also mentioned that in Vista, there have been other changes that make the RPC-facing code more robust, thus strengthening the security even if the caller gets through the RPC security stuff.

Bluehat message recall incident: flashback to Bedlam DL3

If you don’t know what Bedlam DL3 means, I encourage you to read this blog entry: You Had Me At EHLO… : Me Too!

Yesterday a similar event happened, though on a smaller scale.

Someone sent an announcement email to the Bluehat Alert DL with more than 1000 members at the company. The email was actually sent from the DL itself.

Not long afterwards, somebody replied saying that the URL in the announcement message was incorrect.

The sender then recalled the message and asked for recall notifications to see whether the recall was successful. What was the consequence of this recall? Every member of the DL (or more precisely, every Outlook client) sent a recall notification to the sender, which was the DL itself! Assuming the DL has 1500 members (the actual number should be bigger than this), the total number of messages generated was 1500 * 1500 = 2.25M.

The entertainment value of this message recall incident is no less than that of the Bedlam event. Here are some of the funny emails from people who got flooded with the recall messages:

– Whoever is testing denial of service attacks using Outlook 2007 was successful. Congratulations, we’re prepared to hear your talk at BlueHat.

– The most noteworthy that happened since the BEDLAM days is that we learned how to delete mails faster.

– By Design.

General Network Error in .NET Framework 1.1

Under some circumstances (such as with certain network bandwidth), you might get a general network error in .NET Framework 1.1 when executing a query that takes longer than the SqlCommand.CommandTimeout value.

This has to do with how SqlClient 1.1 deals with timeouts. Basically, we track the timeout value by decrementing it after each net library read() call, and it’s possible for the value to go negative. Since we always cast the timeout value to a UInt16 before passing it to the net library read() call, a value of -1 is passed in as UInt16.MaxValue. In the net library read() call, a timeout of UInt16.MaxValue has a special meaning: return immediately. If no packet has been received at that point, a general network error (GNE) is thrown.
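The wrap-around is easy to reproduce outside .NET. The Python sketch below is only an illustration, not the SqlClient source: `to_uint16` mimics an unchecked cast to an unsigned 16-bit integer, and `timeouts_passed_to_read` decrements a timeout after each simulated read. The function names and the per-read costs are assumptions made for the example.

```python
UINT16_MAX = 0xFFFF


def to_uint16(value):
    """Mimic an unchecked cast of a signed integer to an unsigned 16-bit one."""
    return value & UINT16_MAX


def timeouts_passed_to_read(timeout, read_costs):
    """Return the UInt16 value each simulated read() call would receive.

    The timeout is decremented by the cost of each read after that read,
    with no check for going below zero.
    """
    passed = []
    for cost in read_costs:
        passed.append(to_uint16(timeout))
        timeout -= cost
    # Value that the next read() after the last decrement would receive.
    passed.append(to_uint16(timeout))
    return passed
```

With a starting timeout of 3 and two reads that each cost 2, `timeouts_passed_to_read(3, [2, 2])` returns `[3, 1, 65535]`: the third call receives UInt16.MaxValue, which the net library treats as “return immediately”.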

Fixing the problem would have unintended effects on other customers: queries that succeed today because of the timeout logic above might begin failing.

The temporary workaround is to use a smaller packet size so that the read() call has enough time to read a packet before hitting the return-immediately case. The ultimate solution is to move to .NET Framework 2.0, where we adopted a completely different approach to implementing the timeout logic.

Internal connection fatal error when trying to run a UNION ALL query or to call the DeriveParameters method

http://support.microsoft.com/kb/913764/

The KB article doesn’t give many details about this bug. Usually the following conditions have to be met to trigger it:

1. The query must run against SQL Server 2000 (SQL Server 2005 works fine)
2. The table name has to include 3 or more dots, e.g. [A.B.C.D.E]
3. There has to be an image, text or ntext column in the table

It has to do with the way we parse table names in SqlClient, and the KB article above provides a fix.
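If you want to check whether any of your tables match conditions 2 and 3, something like the following Python sketch could do it. It is a hypothetical helper, not part of SqlClient or the KB fix, and it assumes you already have each table’s name and column type names from your own schema query. Condition 1 (the server version) is left for you to check.

```python
LOB_TYPES = {"image", "text", "ntext"}


def may_trigger_bug(table_name, column_types):
    """Return True when a table matches conditions 2 and 3 above.

    table_name   -- the name as written in the query, e.g. "[A.B.C.D.E]"
    column_types -- iterable of SQL type names for the table's columns
    """
    # Count dots in the name itself, ignoring the enclosing brackets.
    dot_count = table_name.strip("[]").count(".")
    has_lob_column = any(t.lower() in LOB_TYPES for t in column_types)
    return dot_count >= 3 and has_lob_column
```

For instance, `may_trigger_bug("[A.B.C.D.E]", ["int", "ntext"])` is True, while the same table with only int and varchar columns is not flagged.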

High CPU use of SqlClient

There was a bug in SqlClient that could cause high CPU use.

The scenario: a SqlClient application sends an attention signal to SQL Server but receives no acknowledgement from the server because the connection is broken.

The bug lies in the logic that reads the server’s response from the wire. We buffer the incoming data. In the scenario above, no more data arrives because the connection has been shut down, so we read from the beginning of the buffer over and over. Basically, this is an infinite loop.
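The general shape of this kind of bug, and of the guard that avoids it, can be shown with a few lines of Python. This is only an illustration of the pattern, not the actual SqlClient code or its fix: `read_until_ack` is a made-up function, and an in-memory stream stands in for the network connection.

```python
import io


def read_until_ack(wire, ack, chunk_size=4):
    """Read chunks from wire until ack appears in the buffered data.

    Returns True if the acknowledgement was seen and False if the wire
    ran dry first.
    """
    buffered = b""
    while ack not in buffered:
        chunk = wire.read(chunk_size)
        if not chunk:
            # Zero bytes means the peer is gone. Without this check the loop
            # would keep re-scanning the same buffer forever and spin the CPU.
            return False
        buffered += chunk
    return True
```

`read_until_ack(io.BytesIO(b"dataACK"), b"ACK")` returns True, while `read_until_ack(io.BytesIO(b"data"), b"ACK")` returns False instead of spinning, because the empty read is treated as the end of the connection.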

The fix can be downloaded from:

http://support.microsoft.com/?kbid=912731

System.Data.SqlClient.SqlException 3988

Exception Type: System.Data.SqlClient.SqlException
Number: 3988
Message: New transaction is not allowed because there are other threads running in the session

If you have ever seen this exception when talking to the RTM version of SQL Server 2005 (but never saw it with the Beta version of the server), it’s likely that you have an open data reader associated with the connection when starting a new transaction.

The change (from Beta to RTM) is by design. To avoid an indeterminate transaction state, the request to start a transaction must be the only active request on the connection. In other words, you have to close the reader associated with the connection before starting a new transaction.

Resources

There are a lot of resources online about ADO.NET:

1. Data Access and Storage Developer Center: the official Data website at microsoft.com.

2. Data Access Blog: the team’s blog site. You may find interesting information from different team members.

3. .NET Framework Data Access and Storage – MSDN Forums: a place where lots of developers ask questions about ADO.NET and the System.Data namespace. Some teammates (including me) answer questions there. It’s good to get feedback from our customers (developers) and help them. However, please do read the FAQ and announcements before posting a question.

4. Channel 9: MSDN Channel 9 on ADO.NET. You might find some interesting videos there.

5. A search engine is always helpful :)