When I was a developer (I mean employed as a developer; even now I develop applications, don’t get me wrong :) ) before joining Microsoft, I used to get stuck with application issues, whether it was a process crashing, a process not responding, or high memory usage. I did not have many options at that time other than to review the code for that page or form and figure out the cause myself.
If there had been a tool I could use to tear the process open and see what was going on inside it, I would've saved a lot of hours (or maybe days). Although Windows Debugging Tools (WinDBG) was available, it was not easy to learn the commands or understand how all that stuff works when you have development timelines/deadlines (or is it death lines?) to meet.
Not many developers are aware of the kind of debugging I’m going to talk about. Whenever we talk about debugging, people assume it’s live debugging: using Visual Studio or similar to set breakpoints and walk through the code. Huh! It would be a blessing if we could do the same thing on production servers, but with production server applications it’s a completely different ball game.
Think of a situation where a customer has an issue on a production box with the IIS process.
(The IIS process is used here just for illustration; the issues explained below are true of any other multi-threaded process or service.)
My options (being a little bit sarcastic)
- “Send the Windows source code to the customer and we will install Visual Studio to walk through the code and find what the issue is.” You know I would lose my job. :)
This kind of debugging works mostly for client applications.
- What happens if I don’t really know when the issue happens?
I can employ a person who will sit in front of the server 24x7 watching for the issue to happen. :)
- What if the issue happens only for some time and then vanishes?
By the time my monitoring person yawns, the issue would have vanished. :)
- What happens if the issue only happens when a specific user sends a POST request containing some specific string, which causes the IIS process to get stuck and block all requests?
Start writing “Debug.Print” or “Response.Write” kind of tracing to find out where it gets stuck. You might not finish your project any time in the near future. :)
- The customer calls my mobile and is screaming because my website is not responding.
I need to run to the server (maybe even drive, since the server is in a remote place) to take memory dumps.
All the above options are listed to show how tools like DebugDiag help us automate this work and make our lives far better.
DebugDiag, or the Debug Diagnostic Tool, is not the first tool of its kind; as far as I know, it is the fifth generation of tools for post-mortem memory dump analysis. Most of the previous tools were used exclusively by PSS and were not available on Microsoft Downloads.
Post-mortem debugging simply means that we take a snapshot of the process memory when the issue happens and use either DebugDiag or WinDBG to figure out what was going on inside the process at that moment and find the cause. Since this is technically challenging, it takes a lot of time. Some of our customers think it's like looking into the IIS logs to find the request. Let me tell you that it’s mostly digging deep into thread stacks, the heap and other memory areas to find out what might’ve led to the issue.
What is Debug Diagnostic Tool?
DebugDiag is a post-mortem debugging tool with analysis capabilities. In simple words, the tool has three parts:
- Captures memory dumps for different types of issues (Hang/Crash/Memory)
- Runs basic analysis on the captured dumps and generates a report to help you understand the results. It also provides very good pointers to issues, mostly for expert eyes.
- Exposes an object model which can be used to easily access the information available inside the memory dump file (memory dump file extensions are usually DMP / PDMP / MDMP)
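Before you analyze anything, it helps to know which dumps you actually have. Here is a minimal sketch in Python (not part of DebugDiag, and not its object model) that lists the dump files in a folder by the extensions mentioned above, newest first. The function name `find_dump_files` and the folder `C:\dumps` are my own placeholders; use whatever dump location you configured.

```python
from pathlib import Path

# Extensions the article lists for memory dump files.
DUMP_EXTENSIONS = {".dmp", ".pdmp", ".mdmp"}


def find_dump_files(folder):
    """Return dump files found under `folder` (recursively), newest first."""
    root = Path(folder)
    if not root.is_dir():
        return []
    dumps = [
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in DUMP_EXTENSIONS
    ]
    # Sort by last-modified time so the most recent capture comes first.
    return sorted(dumps, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    # Hypothetical dump folder; replace with your own dump location.
    for dump in find_dump_files(r"C:\dumps"):
        size_mb = dump.stat().st_size / (1024 * 1024)
        print(f"{dump.name}  {size_mb:.1f} MB")
```

The extension check is case-insensitive because dump files often show up with upper-case extensions on Windows.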
What are the main components?
- Debug Diagnostic Service (dbgsvc.exe)
- DebugDiag UI (debugdiag.exe)
- DebugDiag Host (dbghost.exe)
Debug Diagnostic Service
This service is the heart of DebugDiag. Why should it be a service? In the past we used AD+ (Auto Dump+) for troubleshooting most of these debug scenarios. AD+ is executed from the command line and is a client-side program. This simply means that it runs in the context of the logged-in interactive user.
Let’s take an example. Assume we are trying to track an issue that happens intermittently, say a process crash that happens once a week or once a month.
So we set up AD+ (KB 286350) from the command prompt. Since this tool runs from the command line, AD+ stops monitoring if you log out from the console. So if your organization has multiple administrators looking after the server, they all need to be told not to log out from the console until we have tracked the issue and captured a good set of dumps. This becomes extremely difficult, especially because we find out that someone logged out only after the next occurrence of the issue, and by then it's too late. Then we start monitoring again and sleep till we get another repro. Keep in mind that in some cases a repro might take seconds, minutes, hours, days, weeks or even months.
Another issue with tools like AD+ is that you cannot use them through Terminal Services sessions, which most administrators are very used to. :)
AD+ provides a lot of customization options and it's powerful in that way; it was “the” tool we used in the past (and I see people using it even now).
To get around the issues mentioned above, DebugDiag runs as a service under the “Local System” account, so it does not depend on the logged-on interactive user session.
So how do we configure this service, since Windows services cannot have a UI?
The DebugDiag user interface is used to create rules for capturing different types of issues, and it is also the interface for running the analysis portion of the tool.
DebugDiag, like I mentioned before (did I mention?), has a built-in scripting host which we can use to customize and extend its features according to our requirements. The main script file, called “DbgSVC.vbs” (we call it the controller script), is in the scripts folder inside the installation folder. This script gets modified when you make changes in the UI related to Hang or Memory Leak rules. The script file contains (or exposes) some events which you can use to extend and customize how DebugDiag works.
Open the controller script (“DbgSVC.vbs”) in Notepad and see for yourself.
Rules are nothing but a simple way of configuring DebugDiag to work according to your requirements for specific scenarios. A rule contains information such as the location where you want the memory dump files to be stored, and also contains events you can use further. For example, if you create a Crash Rule, DebugDiag creates a script file called “CrashRule_IIS.vbs” in the scripts folder.
Open the Crash Rule script (“CrashRule_IIS.vbs”) in Notepad and see for yourself.
Now, with DebugDiag you lose the ability we had in AD+ to run it from the command line. Do we? Not really! Continue reading…
So how do I know what’s available under the hood?
Go to the command prompt and type:
C:\>dbghost /? (Obviously you should try this from the installation folder :) )
I can analyze the dumps myself? Oh really?
DebugDiag provides an analysis feature which you can use from the DebugDiag UI tab called “Advanced Analysis”. By default, right now we have scripts available for “Crash/Hang Analyzers” and “Memory Pressure Analysis”.
Analysis scripts are nothing but .ASP pages inside the “Scripts” folder. They use an ASP-like scripting style and VBScript to iterate through the structures inside the dumps (which are nicely exposed through an object model) and try to find known or easily identifiable issues, so that for simple issues you can do the troubleshooting yourself without calling MS PSS.
More to come, including script customization and specific steps to take for scenarios like Hang/Crash/Memory-related issues.