Skip to main content

Focussed data collection leads to root cause

Company Background

An American Company contracting in the aviation industry. They have contractors, located globally in different time zones and use a vendors software to manage project requirements.

Issue:

Following an upgrade of an application server, the vendor’s client finds that when viewing a requirements management project through a browser, the browser consumes 100% CPU and crashes. Because of the time zone differences, the issue was not well understood and lots of irrelevant data was being collected. The issue had been ongoing for 30+ days and the customer escalated it, demanding that someone in their local time zone work with them to resolve the issue.

Solution:

Technical support engineers engaged in a meeting with the client and used Situation Appraisal to better understand the concerns. Situation appraisal led to a clear problem statement and problem specification data was collected. The IS/IS NOT data collection revealed that only the most recent project created crashes. The client felt that the cause was the most recent upgrade, however while analysing the “when data” it was determined that there was a 2-week gap between the most recent upgrade and the first occurrence of the problem. I.e. They experienced no such problems in the first 2 weeks after the upgrade. The technical support engineers then asked to see the developer tools of the browser when it was working as it should. This allowed them to identify that a script was being run and called only once. When looking at the same information for the problematic project, they noticed this same script was being called endlessly, which caused the browser to eventually use 100% CPU and crash. The cause of the high CPU usage was found, however in order to correct the issue, they asked the client to look closely at the script and immediately they identified the issue with it. The script was modified, which resolved the issue. Beyond that, in future, any updates to the script, will follow a process of validation prior to production to prevent recurrence.

Impact:

For 30 days the organisation experienced delays in starting new projects, leading to much frustration and confusion for team members and loss of confidence in the application.