Plastering over the cracks - Why fixing bugs is missing the point

A few years ago I was fortunate enough to travel through China and take a tour up the Yangtze river, passing the new Hydro-electric dam that was in the process of flooding the famous "Three Gorges". As our (government operated) tour boat navigated the giant locks up the side of the dam our guide informed us that "the cracks that have been reported on the BBC news have been fastidiously fixed". My gut reaction this statement was a feeling of mild panic, yet I knew that the cracks had been repaired - what was my problem? My confidence in the integrity of the dam was massively impacted by the fact that I knew that the apparent fault had been fixed, yet I had no confidence that the underlying problem had been understood. I was travelling on a large body of water which was only being prevented from rushing down the valley below was a lump of concrete that a few weeks ago had cracks in and had no evidence that the government had any idea why those cracks had occurred in the first place. What was to stop the fault that had caused those cracks cropping up in another part of the dam?

Planned Failure

More recently I was reading a discussion group on the subject of what testers felt were the most useful/useless statistics to a testing process. One of the figures suggested was that of actual versus predicted bug rates, the idea being that developers predicted the likely bug rates for a development and then progress was measured on how many bugs were being detected and fixed compared to this "defect potential". I dislike this concept for many reasons, but the foremost of these is exactly the same reason that my subconscious was nagging me on the Yangtze river:-

Just fixing defects is missing the point.

A bug is more than just a flaw in code. It is a problem whereby the behaviour of the system contradicts our understanding of a stakeholder's expectation of what the system should do. The key to an effective resolution is understanding that problem. Only then can the appropriate fix be implemented. I believe that the benefit that can be obtained from the resolution of a bug depends hugely on the understanding of the problem domain held by the individuals implementing and testing the fix.

Factors such as when and how the issue is resolved, and who implements and retests the fix, can impact this understanding. Even the same bug fix applied at a different time by a different person can have an impact on the overall quality of the software through the identification of further issues in one of the feature domains. If the identification or resolution of issues are delayed, either in a staged testing approach, or through the accumulation of bugs in a bug tracking system in an iterative process, then the chances of related bugs going undetected or even being introduced are higher.

While the Iron is Hot

Many factors that can influence the successful understanding of the underlying cause of a bug are impacted hugely by the timing and context in which the bug is tackled.

Understanding of the Problem Domain
Understanding of Solution Domain
Knowledge of related features
Understanding of Testing Domain

this post

By operating with fast feedback and reslution cycles we take advantage of the increased levels of understanding of the problem, solution and testing domains affected, thereby maximising our chances of a successful resolution and identification of related issues. If a software development process embraces the prediction, acceptance and delayed identification and resolution of issues then many of the collateral benefits that can be gained from tackling issues in the context in which they are introduced are lost.

Copyright (c) Adam Knight 2011 a-sisyphean-task.blogspot.com Twitter: adampknight

MaikNog said...: Nice read. I like stories, which connect "real" things (like the dam story) with the more "un-real" (not sooo haptic) world of testing.
I was intrigued by the mentioning of Sysiphos; wrote an article (http://hanseatictester.info/?p=23) with that content; more on a meta layer though.

Cheers,
MaikNog; 6 September 2011 at 14:13
Adam Knight said...: Thanks for the comment. Software testing is in many ways the interface between software development and the "real world" so I think relating it to other more tangible situations that elicit an emotional response can be very effective. In the case of the dam, a more extreme case of a common situation to us demonstrates the discomfort that we have with this approach, something which may be less apparent when working with the more familiar context of software bugs.; 6 September 2011 at 23:13