Debugging Windows 2000 COM Applications - The Problem (Page 2 of 4 )
Server “II” showed several interesting behaviors when it failed: a) Servers “II” and “I” had identical code bases, yet Server “I” appeared to have no problems; b) no Application or System messages appeared in the logs; c) Performance Monitor displayed low CPU and RAM usage; and d) the desktop interface was completely normal (no lockups, system messages or Dr. Watson messages).
To review the application configuration: Servers “II” and “I” were configured to serve website pages in a coordinated effort. The servers balance the delivery of web pages to users under Windows 2000 Advanced Server Load Balancing (WLB). Server “I” is the primary server in WLB and delivers 100% uptime. Server “II”, a secondary server in WLB, delivers 60% uptime and requires a staff administrator to resolve “Server Errors” manually on a daily basis.
Servers “II” and “I” are equal in hardware, OS configuration, IIS configuration and application code.
To get to a resolution, we started from the bottom and moved up: hardware diagnostics “ran good”, OS comparison “checked”, IIS configuration and application code “duplicated on all machines”.
Still, there were problems, and we worked through several key hurdles on the way to a resolution:
Server “I” had no apparent problems, so it was thought that the issue was within the WLB configuration. We uninstalled “Compaq Network Teaming and Configuration” on all servers and tested. (This seriously misdirected our resolution efforts.)
After uninstalling “Compaq Teaming” had no effect, we found that Server “I” was handling most “external traffic”. Server “II” handled most “internal traffic” and had very low levels of usage during nights and weekends. Server “I” did have an error rate, though a much lower one, and we had assumed (incorrectly) that there were no errors on Server “I”. Cost in time to figure out that WLB was not the problem = 3 weeks.
The admin staff was directed to navigate to URLs with different extensions (.htm, .html, .gif, .jpg, .asp) on Server “II” when it was not responding. It was found that Server “II” responded to all extensions (.html, .jpg, etc.) except .asp.
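The extension check described above was done by hand in a browser. A minimal sketch of how the same check could be scripted is shown here in Python. The function names, the example host and paths, and the rule that any status other than 200 counts as a failure are illustrative assumptions, not part of the original procedure.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if there is no response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 500.
        return err.code
    except OSError:
        # Connection refused, DNS failure, timeout and similar.
        return None


def probe_by_extension(base_url, paths, fetch=http_status):
    """Request one test page per extension and record the status of each."""
    results = {}
    for path in paths:
        ext = path.rsplit(".", 1)[-1].lower()
        url = base_url.rstrip("/") + "/" + path.lstrip("/")
        results[ext] = fetch(url)
    return results


def failing_extensions(results):
    """List the extensions whose test page did not return 200 (assumed rule)."""
    return sorted(ext for ext, status in results.items() if status != 200)


# Hypothetical usage (host and paths are placeholders):
# results = probe_by_extension("http://server2.example",
#                              ["test.htm", "test.gif", "test.asp"])
# print(failing_extensions(results))
```

The `fetch` parameter is there so the probe can be exercised without a live server. A result that lists only `asp` would match what the admin staff saw: static content was still served while the ASP engine was down.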
Originally the problem was “fixed” by rebooting the server. We researched and found that manually stopping and starting the Out-of-Process components in COM+ resurrected the server.
The DBA imported the web logs into a SQL table and ran queries to ferret out the .asp scripts that correlated with the times of server errors. He found the errors “Out-of-process+ISAPI+extension+request+failed”, “|-|ASP_0113|Script_timed_out” and “|-|ASP_0148|Server_Too_Busy” in the cs-uri-query column.
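The DBA did this work in SQL, and the actual queries are not reproduced here. As a rough illustration of the same filtering idea, the Python sketch below scans W3C extended log lines for .asp requests whose cs-uri-query contains one of the error strings quoted above. The function name and the sample log lines (dates, paths, field list) are made up for the example.

```python
# Error strings taken from the cs-uri-query values quoted in the article.
ERROR_MARKERS = (
    "Out-of-process+ISAPI+extension+request+failed",
    "ASP_0113",
    "ASP_0148",
)


def find_asp_errors(lines):
    """Return (date, time, stem, query) for .asp requests that logged an error."""
    fields = []
    hits = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("#Fields:"):
            # The W3C extended format names its columns in a #Fields directive.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # skip malformed rows
        row = dict(zip(fields, values))
        stem = row.get("cs-uri-stem", "").lower()
        query = row.get("cs-uri-query", "")
        if stem.endswith(".asp") and any(m in query for m in ERROR_MARKERS):
            hits.append((row.get("date", ""), row.get("time", ""), stem, query))
    return hits


# Hypothetical sample data, not taken from the real server logs.
SAMPLE_LOG = [
    "#Fields: date time cs-method cs-uri-stem cs-uri-query sc-status",
    "2001-01-01 02:15:00 GET /jobs/list.asp |-|ASP_0113|Script_timed_out 500",
    "2001-01-01 02:15:05 GET /images/logo.gif - 200",
    "2001-01-01 02:16:10 GET /jobs/view.asp id=7 200",
]

for hit in find_asp_errors(SAMPLE_LOG):
    print(hit)
```

Running the same filter over the logs of both servers is one way to compare their error rates directly rather than assuming that one of them is clean.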
The error messages and server characteristics were interesting, as they were similar to what a server would display when hit by the “Code Red” worm. We explored this and found that the GET parameters in a Code Red request URL are different. They look like “/default.ida?NNNNNNNNNNNNNNNNNNNNNNNN%u9090%u6858%ucbd3%u7801%u9090%u6858%ucbd3%u7801%u9090%u6858%ucbd3%u7801%u9090%u9090%u8190%u00c3%u0003%u8b00%u531b%u53ff%u0078%u0000%u0”
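To rule out Code Red quickly, a logged request can be compared against the URL shape quoted above. The Python sketch below is only a heuristic built from that one example: the function name and the threshold of eight or more “N” characters before “%u9090” are assumptions, not an official signature.

```python
import re

# Assumption: a long run of "N" padding followed by "%u9090",
# as in the example request quoted in the article.
CODE_RED_QUERY = re.compile(r"^N{8,}%u9090")


def looks_like_code_red(uri_stem, uri_query):
    """Return True if the request resembles the quoted Code Red URL."""
    is_ida = uri_stem.lower().endswith("/default.ida")
    return is_ida and bool(CODE_RED_QUERY.match(uri_query))


# A query shaped like the article's example matches ...
print(looks_like_code_red("/default.ida", "N" * 24 + "%u9090%u6858%ucbd3%u7801"))
# ... while the ASP error seen in our logs does not.
print(looks_like_code_red("/jobs/list.asp", "|-|ASP_0113|Script_timed_out"))
```

None of the failing .asp requests in our logs had this shape, which is why the worm was ruled out.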
The admins downloaded iistools.zip and ran <<vbchkw2k.zip>>, checking the “Retain In Memory” and “Unattended Execution” properties of each custom DLL loaded in Server and Library processes. Some DLLs were reported as not configured correctly: “ProgID: UIO01JOB.GetJobSQL / Retained In Memory :OFF / Unattended Execution :OFF / K:\Server Components\UIO01JOB.DLL <<< Please Recompile!”
Here is the resolution: recompile all custom DLLs reported by <<vbchkw2k.zip>> with the 'Retain In Memory' property set ON. See Q264957: “Visual Basic DLL Has Memory Leaks and Crashes in COM+ If 'Retain In Memory' Is Not Set”. Quoted from Microsoft: “When these options are not selected, the Visual Basic runtime unloads custom and runtime DLLs unexpectedly, which causes the computer to stop responding (crash or hang) under some multithreaded scenarios. Typical scenarios include placing the ActiveX DLL in COM+ or Microsoft Transaction Server (MTS), or calling the ActiveX DLL from ASP pages.”
This answer was hard to find because Server “I” was handling most “external traffic”. Its custom DLLs were called regularly throughout each 24-hour period, so IIS rarely had a chance to unload them. Server “II” handled most “internal traffic” and had very low levels of usage during nights and weekends. There, seemingly at random intervals, IIS unloaded the custom DLLs and the ASP engine died. Only a detailed review of the Server “I” web log allowed us to see the error occurring on both servers and move beyond the “just one server is malfunctioning” mindset.
Next, I have listed two documents we used to resolve this issue: 1) Server II Not Responding, which formally documents server information before we attempted any solutions, and 2) Server II Debug Process, which documents the debug steps. These document outlines were valuable in following the issue accurately and logically to its resolution.