BlueHat message recall incident: flashback to Bedlam DL3

If you don’t know what Bedlam DL3 means, I encourage you to read the blog entry “You Had Me At EHLO… : Me Too!”

Yesterday a similar event happened, though on a smaller scale.

Someone sent an announcement email to the BlueHat Alert DL, which has more than 1000 members at the company. The email was actually sent from the DL itself.

Not long afterwards, somebody replied saying that the URL in the announcement message was incorrect.

The sender then recalled the message and requested recall notifications to see whether the recall succeeded. What was the consequence of this recall? Every member of the DL (or, more precisely, every Outlook client) sent a recall notification to the sender, which was the DL itself, so each notification was in turn delivered to every member! Assuming the DL has 1500 members (the actual number is probably larger), the total number of messages generated was 1500 * 1500 = 2.25M.
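To make the arithmetic concrete, here is a minimal Python sketch of the fan-out. It assumes every member's client sends exactly one notification and that the DL delivers each notification to every member; the function name is made up for illustration.

```python
def recall_storm_messages(members):
    """Estimate messages delivered when each member's client sends one
    recall notification to a DL that expands to all members.

    Assumption: one notification per member, no suppression or throttling.
    """
    notifications_sent = members          # one per Outlook client
    deliveries_per_notification = members # the DL fans each one out
    return notifications_sent * deliveries_per_notification


print(f"{recall_storm_messages(1500):,}")  # prints 2,250,000
```

The growth is quadratic, so a DL only twice as large would produce four times as many messages.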

The entertainment value of this message recall incident is no less than that of the Bedlam event. Here are some of the funny emails from people who got flooded with the recall messages:

– Whoever is testing denial of service attacks using Outlook 2007 was successful. Congratulations, we’re prepared to hear your talk at BlueHat.

– The most noteworthy that happened since the BEDLAM days is that we learned how to delete mails faster.

– By Design.

General Network Error in .NET Framework 1.1

Under some circumstances (for example, with certain network bandwidth), you might get a general network error in .NET Framework 1.1 when executing a query that takes longer than the SqlCommand.CommandTimeout value.

This has to do with how SqlClient 1.1 handles timeouts. Basically, we track the remaining timeout by decrementing it after each net library read() call, and it is possible for the value to drop below zero. Because we always cast the timeout value to a UInt16 before passing it to the net library read() call, a value of -1 is passed in as UInt16.MaxValue. In the net library read() call, a timeout of UInt16.MaxValue has a special meaning: return immediately. If no packet has been received at that point, a general network error (GNE) is thrown.
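The wraparound can be illustrated with a small Python simulation. This is only a sketch of the behavior described above, not the actual SqlClient code: the helper names, the timeout units, and the amount subtracted after each read are all assumptions made for the example.

```python
from typing import List

UINT16_MAX = 0xFFFF  # 65535, the "return immediately" sentinel described above


def to_uint16(value: int) -> int:
    """Mimic an unchecked cast of a signed integer to UInt16."""
    return value & UINT16_MAX


def timeouts_passed_to_read(timeout: int, read_costs: List[int]) -> List[int]:
    """Return the UInt16 timeout handed to each hypothetical read() call.

    The remaining timeout is decremented *after* each read, so a slow
    read can push it below zero before the next call.
    """
    passed = []
    for cost in read_costs:
        passed.append(to_uint16(timeout))  # cast happens just before read()
        timeout -= cost                    # decrement after the read returns
    return passed


# Remaining timeout of 2; the second read costs 2 and drives it to -1.
print(timeouts_passed_to_read(2, [1, 2, 1]))  # prints [2, 1, 65535]
```

The third call receives 65535 instead of a small or zero timeout, which is exactly the value the read() call interprets as "return immediately".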

Fixing the problem would have unintended effects on other customers: queries that succeed today because of the timeout logic above might begin failing.

The temporary workaround is to use a smaller packet size so that the read() call has enough time to read a packet before hitting the return-immediately case. The ultimate solution is to move to .NET 2.0, which takes a completely different approach to implementing the timeout logic.