A Tale Of The Unhandled Workflow Exception

Workflow Foundation (WF) catches the unhandled exceptions of any workflow instance that it’s charged with running. Upon catching an unhandled exception from a workflow instance, WF terminates that instance and raises a WorkflowTerminated event, generously including the exception in the event arguments. At first glance, this seemed like a reasonable approach. After all, one doesn’t want a sloppy and poorly crafted workflow taking down an entire service and jettisoning hundreds of other smoothly executing workflows. <tears> Unfortunately, my initial enthusiasm was soon dampened by a series of conversations with Jeffrey Richter. Jeffrey convinced me that my initial thoughts were dead wrong, and that the design chosen by the WF team was inherently deficient in this area. </tears>
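For anyone who hasn’t seen this behavior first-hand, here is a minimal sketch of it, assuming the WF 3.x hosting API in System.Workflow.Runtime. The FaultingWorkflow type is hypothetical and exists only to throw; the rest is ordinary hosting code. Note that the host process keeps running: the exception surfaces only as data on the event arguments.

```csharp
using System;
using System.Threading;
using System.Workflow.ComponentModel;
using System.Workflow.Runtime;

// Hypothetical workflow whose only job is to throw an exception it never handles.
public sealed class FaultingWorkflow : Activity
{
    protected override ActivityExecutionStatus Execute(ActivityExecutionContext executionContext)
    {
        throw new InvalidOperationException("Simulated bug inside a workflow.");
    }
}

public static class Program
{
    public static void Main()
    {
        using (WorkflowRuntime runtime = new WorkflowRuntime())
        using (AutoResetEvent done = new AutoResetEvent(false))
        {
            // WF catches the workflow's unhandled exception, terminates the
            // instance, and hands the exception to us in the event arguments.
            runtime.WorkflowTerminated += delegate(object sender, WorkflowTerminatedEventArgs e)
            {
                Console.WriteLine("Workflow {0} terminated: {1}",
                    e.WorkflowInstance.InstanceId, e.Exception.Message);
                done.Set();
            };

            // Not expected to fire for this workflow; subscribed so the host
            // never waits forever if the instance somehow completes.
            runtime.WorkflowCompleted += delegate(object sender, WorkflowCompletedEventArgs e)
            {
                done.Set();
            };

            WorkflowInstance instance = runtime.CreateWorkflow(typeof(FaultingWorkflow));
            instance.Start();

            // The workflow runs on a runtime-owned thread, so block until an event arrives.
            done.WaitOne();
        }
    }
}
```

The sketch targets the .NET Framework and needs references to System.Workflow.Runtime, System.Workflow.ComponentModel and System.Workflow.Activities. What it cannot show is the part that matters for the rest of this tale: whatever state the faulting workflow touched before it threw is still sitting in the same AppDomain as every other instance.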

The basic problem boils down to this: WF isn’t impervious to the perils of unhandled exceptions any more than any other managed code that we write. The fact that we’ve placed the WF runtime in charge of managing multiple workflows doesn’t exempt it from this rule; on the contrary, it would seem to exacerbate the problem. The fact that the runtime runs all of its workflow instances in its own AppDomain seems to seal the deal. At the end of the day, workflows are compiled and executed as machine instructions under the management and control of the CLR. An unhandled exception in a workflow is exactly the same as an unhandled exception in a C# program. The perils of catching all unhandled exceptions are well known and can be found in Jeffrey’s CLR via C# book and many other sources, so they aren’t repeated here.

It would seem, then, that the only way the WF team could have written a more resilient workflow runtime would have been to create a separate AppDomain for each workflow instance, so that an AppDomain with an unhandled exception could be terminated without impacting other workflows or the WF runtime. This leaves me feeling slightly nauseous, because the performance implications of adding this level of overhead would undoubtedly have been (dare I say) very noticeable; however, I can think of no other option with managed code to deal with unhandled exceptions and still play by the rules. While I was no fly on the wall when WF was being designed, I can well imagine that the workflow team was forced to choose performance over the other harder-to-measure and easier-to-brush-under-the-carpet objectives. And that is the end of tonight’s tale of the unhandled workflow exception.