Wednesday, 2 February 2011

Letting yourself go

This week I dealt with a support issue from a customer running a live implementation. The problem was that they were seeing a sporadic change in the system's behaviour when querying certain international characters. On examining the problem I realised that it was an occurrence of a bug I had identified previously, during the release of a custom option we had created for the customer to support their data validation utility.

My notes on the bug detailed my findings and an assessment of the scope of the problem. Based on the testing that I had performed the problem appeared to be limited to a very specific scenario relating to the use of the new option. At the time I had added this limitation to the release note and made the customer aware. They accepted this as a limitation and I felt that this was sufficient to allow releasing with the issue in place.

On discussing the recent occurrence of the bug with the customer it became apparent that they had not used the custom option in the implementation in question and were still hitting the problem. I carried out some more extensive tests around the area and recreated the problem in a scenario where the option had not been used. It occurred less frequently than if the option had been applied, but was definitely present.

Looking back at the time when we released the option, and the work climate at the time, I can understand why this situation happened, and I think that it provides some valuable lessons:-

Testing doesn't give absolute information on scope of issues

Just because a problem can be recreated consistently under certain circumstances and not under others does not mean that the issue is limited to those circumstances. In this case the issue was recreatable quickly once the custom option had been used, but it still existed, albeit less frequently, in sessions where the option had not been applied. It was the same issue. Granted, this is an unusual situation, but I should still have assessed the risk of the problem on the basis that it could occur anywhere, not just in the scenario in which I had managed to recreate it.

Don't submit to confirmation bias because you are under pressure

In this case the customer was putting pressure on us to deliver a solution. I had identified the problem in question, but I too quickly submitted to the idea that the use of the option caused the problem, and my subsequent tests were biased towards confirming this rather than disproving it. I felt that documenting the limitations on using the option was sufficient. In hindsight I doubt that I would have come to this conclusion had the bug been found while testing a major release rather than a custom patch which we were under pressure to deliver.

It is never possible to identify all of the bugs present in a software release. It is rarely possible to fix all of the issues that we do identify. Our ability to assess the risks posed by known issues in the system and prioritise fixes is limited by the information that we have on the nature and scope of those issues. Whatever the circumstances of releasing a piece of software, the standards of information we gather and assessments we perform should be consistently high. No matter how much pressure is on, the customer will be thankful for it in the long run.

Copyright (c) Adam Knight 2011
