Sunday, 8 December 2013

Elementary, my dear customer

One of my personal traditions as winter approaches in England is sitting in front of the fire and watching one of the many excellent dramatisations of the classic story, The Hound of the Baskervilles. Having read the complete Sherlock Holmes repeatedly when I was younger, I find the characters and plots have a comforting familiarity when the weather outside turns spiteful. Holmes, one of the most famous literary characters of all time, applies, as I'm sure you are aware, logical reasoning to the investigation of crime.

I find that exactly the same processes of logical and deductive reasoning are also invaluable to the software testers and technical support agents that I work with when they tackle the more challenging aspects of those roles, for example when trying to establish what might be causing bugs that have been observed in our software and systems.

More information than you know

"'Pon my word, Watson, you are coming along wonderfully. You have really done very well indeed. It is true that you have missed everything of importance, but you have hit upon the method, and you have a quick eye for colour. Never trust to general impressions, my boy, but concentrate yourself upon details" (A Case of Identity)

One of the characteristics of Holmes's deductions is that he does not make great inspirational leaps. What appear to be fantastic feats of deduction are made possible simply by observing details that others miss, details which provide a wealth of information once a process of logical reasoning is applied to them. It is my experience that those who achieve the best results in investigations are the ones who take the time to really understand the information available and what they can deduce from it.

One of my team recently asked me for some help in knowing where to start when trying to investigate a software problem. My answer was, on the face of it, quite trite:

"Write down everything that you know".

Whilst this seems childishly obvious, the point that I was leading to was that if we examine and document all of the facts at our disposal, "what we know" can actually be a much larger set of information, and get us a lot closer to an explanation, than might first appear possible. Collating all of the information that we can identify from the various sources available to us into a central set of notes helps to organise our knowledge of a situation and ensures that we don't overlook simple details or commit too early to conclusions without looking at all the available data. I, and very capable individuals that I have worked with, have come to erroneous conclusions on problems simply because we committed to a diagnosis before examining the available information in sufficient detail.

"It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts." (A Scandal In Bohemia)

Story 1

A simple example presented itself recently. We had a failure that had occurred repeatedly across a cluster of servers over a period of continuous running, and one of my team was investigating the occurrences. I'd asked him to examine the occurrences of the failure in the logs to see if he could identify any relationships between them.

They were:-

  • 10/21 2:01
  • 10/21 17:39
  • 10/21 18:39
  • 10/22 8:41
  • 10/22 9:41
  • 10/22 13:42

His first thought was that there was no relationship between the times, yet if we exclude the anomaly of the first reading we can see that all of the occurrences are at around 20 minutes to the hour. The pattern may look obvious when presented as above, but when lost in a large set of log files it is easy to miss. Only by taking the information out and looking at the relevant values in isolation does a clear pattern emerge. When I highlighted the relationship to him he re-examined the logs in light of this pattern and established more information on the first value, identifying that it was due to an unrelated and explainable human error. The remaining entries that fitted the pattern were all caused by the problem we were investigating.
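Pulling the timestamps out of the logs and looking at the minute offsets is the kind of step that makes the pattern jump out. Below is a minimal sketch of that idea in Python; the log line format here is invented for illustration, the real logs were far less tidy.

import re
from collections import Counter

# Hypothetical log excerpt; in practice these lines would be read from the real log files.
log_lines = [
    "10/21 02:01 ERROR cluster node failure ...",
    "10/21 17:39 ERROR cluster node failure ...",
    "10/21 18:39 ERROR cluster node failure ...",
    "10/22 08:41 ERROR cluster node failure ...",
    "10/22 09:41 ERROR cluster node failure ...",
    "10/22 13:42 ERROR cluster node failure ...",
]

# Pull out the timestamp of each failure and look at how many minutes remain before the next hour.
timestamps = []
for line in log_lines:
    match = re.match(r"(\d{2}/\d{2}) (\d{2}):(\d{2})", line)
    if match:
        date, hour, minute = match.groups()
        timestamps.append((date, int(hour), int(minute)))

minutes_to_hour = Counter(60 - minute for _, _, minute in timestamps)
print(minutes_to_hour)
# Counter({21: 2, 19: 2, 59: 1, 18: 1}) - five of the six failures fall 18-21 minutes
# before the hour, flagging the 02:01 entry as the anomaly to investigate separately.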

Write down everything that you know:-

"All of the failures relating specifically to the problem at hand occurred around 20 minutes to the hour. The time of the failures gets approximately 1 minute closer to the hour with every 12-14 hours. One other failure occurred during the period which the logs show to be due to human syntax error."

Having established that there was a temporal element to the occurrence of this issue, we could deduce that it must be affected by a process with a time component, or a cyclical process, most likely operating on an hourly loop. Based on this information we were able to correlate the occurrence of the error with the hourly refresh of a caching service.

Story 2

In a similar vein, when investigating a query failure that occurred over a weekend, one of the team pulled the logs for the whole weekend for the log message in question. On looking at when the problems occurred, it was immediately apparent that all of the failures occurred just before twenty past one in the morning.

Server:192.168.0.1|2013-10-21 01:18:01 -0600|3|Task:12-2013102301180103210231|A query error occurred ...
Server:192.168.0.2|2013-10-22 01:19:21 -0600|3|Task:12-2013102301180103210231|A query error occurred ...
Server:192.168.0.1|2013-10-22 01:19:23 -0600|3|Task:12-2013102301180103210231|A query error occurred ...
Server:192.168.0.3|2013-10-23 01:19:52 -0600|3|Task:12-2013102301180103210231|A query error occurred ...

The immediate conclusion was that all had been impacted by a single event. On re-examining the timings and looking at the dates of the failures in addition to the times, it was clear that the failures had actually occurred at a similar time, but on different days. Not only that, they had affected different servers. This may seem like an obvious detail, but if you are looking at a list of consolidated error log entries and see multiple events with matching times, it is very easy to unconsciously assume that the dates will also be the same and not actually confirm that detail.
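A quick way to avoid that trap is to group the consolidated entries by date as well as time. The sketch below parses pipe-delimited lines like the ones shown above (trimmed here, with the task ids elided); the parsing is illustrative rather than a record of how we actually processed the logs.

from collections import defaultdict

# Consolidated error entries in the pipe-delimited layout shown above (trimmed).
entries = [
    "Server:192.168.0.1|2013-10-21 01:18:01 -0600|3|Task:...|A query error occurred ...",
    "Server:192.168.0.2|2013-10-22 01:19:21 -0600|3|Task:...|A query error occurred ...",
    "Server:192.168.0.1|2013-10-22 01:19:23 -0600|3|Task:...|A query error occurred ...",
    "Server:192.168.0.3|2013-10-23 01:19:52 -0600|3|Task:...|A query error occurred ...",
]

# Group failures by date rather than just eyeballing the times.
by_date = defaultdict(list)
for entry in entries:
    server, timestamp = entry.split("|")[:2]
    date, time, _offset = timestamp.split()
    by_date[date].append((server, time))

for date, failures in sorted(by_date.items()):
    print(date, failures)
# 2013-10-21 [('Server:192.168.0.1', '01:18:01')]
# 2013-10-22 [('Server:192.168.0.2', '01:19:21'), ('Server:192.168.0.1', '01:19:23')]
# 2013-10-23 [('Server:192.168.0.3', '01:19:52')]
# Similar times, but three different days and three different servers - not one event.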

Write down what you know

All of the failures occurred between 1:18am and 1:20am server local time, which is UTC - 6 hours. The failures occurred on separate days on separate servers in the cluster.

Based on this information we could infer that the problem was being caused either by a process common to all machines, or by a process external to, yet shared by, all of the machines. Either way there was clearly some timing element to the issue which made the problem more likely around 1:20am, and which recurred on all three days. We were able to provide this information back to the customer, who were then able to investigate the virtual environment with their infrastructure team on this basis.

The power of Contraposition

"when you have eliminated the impossible, whatever remains, however improbable, must be the truth". (The Sign of Four)

This is perhaps the most famous Holmes quote of all, and refers to the indirect deductive approach of contrapositive reasoning. This is the process of deducing that an event has not occurred by establishing a contradiction to an antecedent of that event, i.e. establishing that a logically necessary condition for that event has not taken place. When faced with a failure, the typical behaviour (in my company at least) for all of the individuals concerned is to offer theories and ideas about what may be causing that failure. This can result in a number of plausible theories, which can make narrowing down to the likely cause very difficult. By examining all of the information available, both to identify what we know is the case and to see what that information tells us isn't happening, we can make far more accurate inferences about the nature of a problem than would be apparent by taking the information at face value.

Story 3

A couple of years ago I was investigating a customer problem and struggling. I'd spent over a week collecting information and could still not identify the cause of the issue. Some of the key points were:-

  • A problem occurred running a job through a specific tool, however a different tool running the same job via the same connection ran successfully, and other smaller operations through the original tool were also OK.
  • The problem appeared to result in files in our processing queue being truncated or corrupted in some way

Three of the other team members and I sat in a room and literally mind-mapped everything that we knew about this process. Over the next hour, as a team, we established the facts and made a series of new deductions about the behaviour:-

  • One suggested another process corrupting ('gobbling up') some of the files - we established that the files were missing entire written sections. If something had been affecting the files post-write then an expected outcome would be that the files were truncated at random points rather than at these clean section endings.
  • We discussed a common failure in each of the parallel processes creating the files. Each file had the same modified date yet a different size. If the write process had been failing on each write separately then we'd expect different modified dates and probably more consistent sizes, so we deduced that one event was affecting all of the files.
  • We discussed an event that could have occurred at the point of failure. On examination, the last write time of the files was well before the problem was reported. If the problem had been caused by an event at the time of failure then we'd expect to see matching write times, therefore the actual problem was taking place earlier and the files were taking time to work their way through the processing queue before causing the failure event to be reported.

Through this group session of logical reasoning we eliminated other possibilities and established that the problem was related to running out of space in a processing queue. This seemed unlikely given the configuration, however on the basis of our deductions we probed the other information that we had on the configuration and identified that the customer installation had been copied from a much larger server without being reconfigured appropriately for the new environment. The disk space provisioned for the queue on the new machine, combined with the slow ingestion of results by the client application, was causing the exact behaviour that we had established must be happening.

None of these deductions required any real-time tracing or extra debugging. All that we had was a recursive listing of the files in the queue and a couple of example files from the failing process. By taking each hypothesis on what could have caused the problem and using the information available to prove the absence of a logical outcome of that cause, we could disprove the hypothesis and narrow down to what remained, which had to be the truth.
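For reference, that kind of recursive listing is cheap to produce. Here is a minimal sketch in Python, assuming a hypothetical queue directory path:

import os
import datetime

# Walk the processing queue directory and list every file with its size and last-modified
# time - the same kind of raw listing that the deductions above were made from.
QUEUE_DIR = "/var/app/processing_queue"   # hypothetical path

for root, _dirs, files in os.walk(QUEUE_DIR):
    for name in files:
        path = os.path.join(root, name)
        stat = os.stat(path)
        modified = datetime.datetime.fromtimestamp(stat.st_mtime)
        print(f"{modified:%Y-%m-%d %H:%M:%S}  {stat.st_size:>12}  {path}")

# Identical modified times with differing sizes point towards one event affecting all of the
# files at once; differing modified times would suggest independent write failures.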

A role model for Testers

Holmes is one of my great fictional heroes and it is hugely rewarding that exactly the same processes of logical and deductive reasoning that are made so famous in those novels are also invaluable to the software testers and technical support agents that I work with in performing their work. In fact, when providing some mentoring to one of my team recently I actually recommended that he read some Sherlock Holmes novels to gain inspiration in the art of deduction to help him track down bugs.

I'm aware that I'm not the first to make such a comparison, yet all too often I am on the receiving end of requests for information, or erroneous deductions, from other organisations because the individual in question has not fully examined the information available to them. In too many testing cultures that I have encountered it is rare for the individual raising an issue to have made any attempt to apply a process of logical reasoning to establish what that information is telling them. With the cheapness and immediacy of communication, some find it easier to fire off the immediate 'end behaviour' of issues to others rather than taking the time to establish the facts themselves.

One of the things that I love about my organisation is that I, and the people that I work with, will always strive to fully understand each new situation and use the information available, and our own powers of logical reasoning, to best advantage to achieve this. Sadly we'll never have Sherlock Holmes working for us as a software tester, but I believe that having a culture of attempting to use the same skills that make the fictional detective so famous is the next best thing.

Sunday, 1 December 2013

Potential and Kinetic Brokenness - Why Testers Do Break Software

I'm writing this to expand on an idea that I put forward in response to a twitter conversation last week. Richard Bradshaw (@friendlytester) stated that he disliked saying that testers "break software", as the software is already broken. His comments echo a recent short blog post by Michael Bolton, "The Software is already broken". I know exactly what Richard and Michael are saying. Testers don't put problems into software, we raise awareness of behaviour that is already there. It sometimes feels that the perception of others is that the software is problem free until the testers get involved and suddenly start tearing it apart like the stereotypical beach bully jumping on the developers' carefully constructed sandcastles.

I disagree with this statement in principle, however, as I believe that breaking software is exactly what we do...

Potential and Kinetic Failure

I'm not overly keen on using the term 'broken' in relation to software as it implies only two states - in all but the simplest programs, 'broken' is not a bit. I'm going to resort here to one of my personal dislikes and present a dictionary definition - what I believe to be the relevant definition of the word 'break' from the Oxford Dictionary:-

Break: - make or become inoperative: [With subject] he’s broken the video

The key element that stands out for me in this definition is the "make or become" - the definition implies that a transition of state is involved when something breaks. The software becomes broken at the point when that transition occurs. I'd argue that software presented to testers is usually not inoperative to the extent that I'd describe it as broken when we receive it for testing. I believe that a more representative scenario is that the basic functionality is likely to work, at least in a happy path scenario and environment. The various software features may be rendered inoperative through the appropriate combination of environment changes, actions and inputs. In the twitter conversation I likened this to energy:-

It's like energy. A system may have high 'potential brokenness', testers convert to 'kinetic brokenness'

What we do in this case is search for the potential for the system to break according to a relevant stakeholder's expectations of, and relationship with, the product. In order to demonstrate that the potential exists we may need to force the system into that broken state, thereby turning this potential into what could be described as kinetic failure. Sometimes this is not necessary, as simply highlighting the potential for a problem to occur can be sufficient to demonstrate the need for rework or redesign, but in most cases forcing the failure is required to identify and demonstrate the exact characteristics of the problem.

Anything can be broken

In the same conversation I suggested that:-

Any system can be broken, I see #testing role to demonstrate how easy/likely it is for that to happen.

With a sufficient combination of events and inputs, pretty much any system can be broken in the definitive sense of being 'rendered inoperative', for example if we take the operating factors to extremes of temperature, resource limits, hardware failure or file corruption. I suggest that the presence or absence of bugs/faults depends not on the existence of factors by which the system can be broken, but on whether that combination falls within the range of operation that the stakeholders want or expect it to support. As I've written about before in this post, bugs are subjective and depend on the expectations of the user. Pete Walen (@PeteWalen) made this same point during the same twitter conversation:-

It may also describe a relationship. "Broken" for 1 may be "works fine" for another; Context wins

The state of being broken is something that is achieved through transition, and is relative to the expectations of the user and the operating environment.

An example might be useful here.

A few years ago I had my first MP3 player. When I received it, it worked fine - it uploaded and played my songs and I was really happy with it. One day I put the player in my pocket with my car keys and got into my car. When I took the player out of my pocket the screen had broken. On returning it to the shop I discovered that the same thing had happened to enough people that they'd run out of spare screens. I searched the internet and found many similar examples where the screen had broken in bags or pockets. It seems reasonable that if you treat an item carelessly and it breaks then that is your responsibility, so why had this particular model prompted such feedback? The expectation amongst the experiences that I read was that the player would be as robust as other mobile electronic devices such as mobile phones or watches. This was clearly not the case, which is why its breaking in this way constituted a fault. I've subsequently had a very similar MP3 player which has behaved as I would expect and stood up to the rigours of my pockets.

  • So was the first player broken when I got it? No. It worked fine and I was happy with it.
  • Who broke the first mp3 player? I did.
  • Was the first player broken for everyone who bought it? - No. My model broke due to the activity that I subjected it to. I'm sure that many more careful users had a good experience with the product.
  • Was the second player immune to breaking in this way? - No. I'm pretty sure that if I smacked the one I have now with a hammer the screen would break. But I'm not planning to do that.

The difference was that the first player had a weaker screen and thereby a much higher potential for breaking, such that it was prone to breaking well within the bounds of most users' expected use. This constituted a fault from the perspective of many people, and could have been detected through appropriate testing.

Any technology system will have a range of operating constraints, outside the limits of which it will break. It will also have a sphere of operation within which it is expected to function correctly, defined by the person using it and the operating environment. If the touch screen on the ticket machine in this post had failed at -50 degrees Celsius I wouldn't have been surprised and would certainly not have written about it. The fact that it ceased working at temperatures between -5 and 0 degrees is what constituted a breakage for me, given the environment in which it was working. It wasn't broken until it got cold. It possessed the potential to break given the appropriate environmental inputs, which manifested itself in 'kinetic' form when it was re-installed outside and winter came along. Interestingly it had operated just fine inside the station for years, and would not have broken in this way if it had not been moved outside.

Taking a software example, a common problem for applications is where a specific value is entered into the system data. An application which starts to fail when a surname containing an apostrophe is entered into the database becomes 'broken' at the point that such a name is entered. If we never need to enter such a name then it is not a problem. At the point that we enter such data into the system we change the state and realise the potential for that breakage to occur. Testers entering such data and demonstrating this problem are intentionally 'breaking' the software in order to demonstrate that the potential exists in live use, so that the decision can be made to remove that potential before it hits the customers.
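The usual mechanism behind this class of bug is query text assembled by string concatenation. The sketch below, using an in-memory SQLite table with invented table and column names, illustrates the failure mode rather than the specific system described above: the apostrophe breaks the naive version while a parameterised query handles the same name cleanly.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (surname TEXT)")

surname = "O'Brien"

# Naive string concatenation: the apostrophe terminates the SQL string literal early
# and the statement no longer parses.
try:
    conn.execute("INSERT INTO staff (surname) VALUES ('" + surname + "')")
except sqlite3.OperationalError as error:
    print("broken:", error)   # e.g. near "Brien": syntax error

# Parameterised query: the driver handles the quoting, so the name is stored intact.
conn.execute("INSERT INTO staff (surname) VALUES (?)", (surname,))
print(conn.execute("SELECT surname FROM staff").fetchall())   # [("O'Brien",)]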

State change can occur outside the software

You could argue that not all bugs involve a change of state in the software as I describe above. What about the situation, for example, where a system will simply not accept a piece of data, such as a name with an apostrophe, and rejects it with no change of state in the software before or after this action? Surely then the software itself was already broken?

In this situation I'd argue that the change of state occurred not in the software itself but in its operating environment. A company could be using such an application internally for years without any issues until they hire an "O'Brien" or an "N'jai". It is at the point at which this person joins the company and someone attempts to enter that employee's details that the state of the software changes, from "can accept all staff names" to "cannot accept all staff names", and it breaks. Given that testers are creating models to replicate possible real world events in order to exercise the application and obtain information on how it behaves, the point at which we add names containing apostrophes to our test data and expose the application to them is the point at which we 'break' the software and realise the potential for this problem to occur.

As well as event-based changes such as that, breakages can also occur over time through a changing market or user environment, and our lack of response to it. Using the above Hangul example, the potential for breaking increases dramatically the moment that our software starts being used in eastern markets. I won't expand on this subject here as I covered it previously in this post.

So Testers Do Break Software

I can understand why we'd want to suggest that the software was already broken. From a political standpoint we don't want to be seen as being the point at which it broke. I think that saying it was already broken can have political ramifications in other ways too, such as with the development team. I'd argue that when we receive software to test it is usually not 'broken' in the sense that it has been rendered inoperative, and suggesting that it was may affect our relationships with the people coding it. Instead I think a more appropriate way of looking at it is that it possesses the potential to break in a variety of ways and that it is our job to come up with ways to identify and expose that potential. If we need to actually 'break' the software to do so then so be it. We need to find the limits of the system and establish whether these sit within or outside the expected scope of use.

If we have a level of potential breakability looming in our software as precariously as the rock in the picture above then it is the tester's job to ensure that we 'push it over' and find out what happens, because if we don't exercise that potential then someone else will.

Sunday, 10 November 2013

The Implicit Waterfall

http://commons.wikimedia.org/wiki/File:Smoo_Cave_Waterfall,_Scotland_.jpg

There is more than one term for processes whereby teams embed a staged or "waterfall" process within agile and lean approaches. For example, 'ScrummerFall' and 'Mini-waterfall' are both expressions associated with Agile scrum anti-patterns which imply that teams are basically operating an 'over-the-wall' approach to handing work to testers during the sprint. The general consensus of the community talks and testing discussions that I have been involved in is that testing should occur early, and throughout, the development of any new feature. Certainly in my organisation this is a critical aspect of our successful development process. From experience I know that there is a natural tendency to hold on to the trappings of processes that are familiar to us, and the result can be that teams implement the tools and rituals that support a new methodology in a manner which preserves the trappings of their former processes. For example, whilst the visualisation boards that support agile and lean methodologies are very effective when used well, there are some fundamental structures implicit within common board layouts that I believe incline unwary teams towards these anti-patterns.

If we examine a common scrum board structure used by agile teams, for example the boards presented by Agile guru Mike Cohn in his description of scrum boards, we see a series of columns representing statuses for each story/task:-

  • Todo
  • In Progress
  • To Verify/To Test
  • Done

The presence of separate columns for "In Progress" and "To Verify" seems to contradict the idea of concurrent testing activities. The reason that Mike Cohn gives for this column is for cases where a task is sufficiently small that it does not merit a separate test card; for most stories a separate task card or cards would exist to cover testing activities. From conversations I've had with some testers, this distinction does not seem to have translated well to some teams. Another similar board format, the Kanban board, has the "In Progress" and "In Test" separation as a fundamental element of the board. The following example is from Wikimedia, and most example boards I've found in a simple Google image search on "Kanban board" have a similar structure.

http://commons.wikimedia.org/wiki/File:Kanban_board_example.jpg

If we are suggesting that testing in agile/lean teams should be implicit throughout each user story, then why is the verification of tasks treated as separate from, and mutually exclusive with, their development? I've had a couple of conversations recently with folks who were having trouble due to this separation. Their Kanban style boards incorporated this 'In Progress' and 'In Test' distinction, which was introducing an implicit waterfall mentality into their approach. The developers were not handing over to testing until the coding was 'Done', and the teams were therefore missing some of the benefits of collaboration and putting pressure on the testing activities through delayed exposure.

When introducing scrum boards to our team, I structured our boards slightly differently, in a way that I thought would better complement our way of working than the examples I'd seen. I hadn't considered writing about this before as I felt it was very specific to us, however I was chatting with Dan Ashby (a great, enthusiastic tester) at Agile Testing Days and showed him our approach, and he suggested I write it up, so here we are.

The Symbol Based Board

When creating a scrum board for our team I felt that operating at the story level rather than the task level was going to be most appropriate for the team and the nature of work that we were tackling. One of the main things that I wanted to represent on the board was the concurrent test and programming status of each story. In most boards the status of any activity is represented by the position of a card representing that activity; I'll describe this as a "Position Based" layout. The approach that I tried was instead to use a "Symbol Based" approach to represent status. I created a column structure, with the first columns describing the story and the people working on it. The next 4 columns could contain a status symbol for the 4 key activities around that story, these being Elaboration, Coding, Testing and Documentation.

Empty Board

I then created a colour- and symbol-coded set of statuses for each column, these being:-

Board Symbols

Each story was then added to a row on the board and we could represent the status of each activity on the story using the symbols. In this way we have been able to ensure that testing activity is started concurrently with coding activity, and can easily identify issues if any of the columns or individuals have too much in progress at any one time. We can also identify anti-patterns such as developers starting to code before we've elaborated acceptance criteria (this has been known to happen on occasion) and apply the appropriate symbol ("Do Now" symbol in the elaboration column plus a target for elaboration immediately).

Rich Text

The final columns on the board are for free-form notes on any specific targets, blocks or follow-ups for that story.

  • The first is for targets, e.g. external targets such as the story having to go on a specific branch or be demonstrable by a certain date, or internal targets on when we need specific activities completed in order to finish the others, such as elaboration or first exposure of working code to testing.
  • The second is for writing up any blocks, status notes or actions that need addressing. These notes provide a record of previous conversations and a richer basis for discussion in stand-up meetings.
  • The final one is for any items that need to be followed up. This acts as a reminder to prioritise further stories we've added to the backlog arising from the work on that one, or any bugs that may have been discovered late in the sprint testing.

Ad-hoc tasks

Being a responsive organisation in a competitive market, there are always ad-hoc activities that come in and need to be prioritised. To handle these we maintain a small Kanban style area at the bottom of the board for ad-hoc tasks, with colour-coded cards for:

  • Bugs discovered that aren't related to stories in progress
  • Customer support investigations
  • POC requests

Kanban Area

If we have too many high priority ad-hoc tasks appearing on the board then we review the lower priority stories in the main grid and discuss removing some of these with the Product Owner to focus on the ad-hoc work that is coming in.

Examples

So a typical board in progress will look something like this (albeit with more stories - drawing these boards is surprisingly time consuming!):-

Example In Progress Board

and here is a real example from a previous sprint (note the absence of the calendar - this has been added since the photo was taken; the board is constantly evolving)

Real Example Board

Limitations

There are clearly limitations to this approach when compared to a position based task board:-

  • Not working at the task level - this means we lose the finer granularity of tracking individual tasks, however we find that working at the story level gives us a good level at which to discuss our sprint activities in the scrums.
  • Burndown - we don't generate a burndown from the board. To be honest we've not found that we need to - we operate at the story level and have a good inherent idea of our velocity as a team. The statuses allow us to identify any bottlenecks or items that are stalling. The Blocked and Paused statuses allow us to see the stories that are struggling and we can add an "AT RISK" comment in the notes area if we feel that a story may not be completed in that iteration. Additionally I have a good idea of testing activity required based on our charter estimates as I discuss in this post. If a burndown was necessary then a simple completed story count (or completed story points, if you are that way inclined) would be easy to implement.
  • Moving stories - as we write the stories on a whiteboard we can't move them as easily as cards, however as we work in sprints the story content would ideally be reasonably stable, and we do have the ability to add and remove items if necessary. If working with a continuous flow based approach rather than sprint cycles, the horizontal slots could be restricted and used to limit work in progress at a story level, adding new items only when one is completed and accepted. Again this would maintain focus on testing and development as concurrent rather than separate activities.

We love our Board

So there you have it, my take on the scrum board, the Symbol Based Board. I find that it works very well for my organisation. I have previously attempted to address some of the limitations mentioned with the introduction of a more task-card based board, however the overwhelming response from the team in the retrospective was that they preferred the approach above. When we split into multiple scrum teams the newly formed team based their board on the exact same structure and have kept it since, with a couple of minor modifications.

My sister recently had her first baby. In discussing with her what to expect from others, I explained that folks will give you advice in one of two ways: some will tell you what you should be doing, others will offer suggestions that have worked for them in certain situations. Please accept this post as intended, a form of the latter. I'm not suggesting that everyone use this approach - I'm aware of its limitations compared to other approaches when they are used well. I'm posting this to provide alternative ideas for those who are struggling with their current boards and the implied practices that come with those structures. If your team is facing problems with an 'implicit waterfall' built into your team boards then I'd recommend trying out a "Symbol Based Board" approach such as this; it is certainly working well for us.

Monday, 21 October 2013

Variety is the Spice of Test Automation

When I shared our 'Set of Principles for Automated Testing' a few years ago, one of the key principles included was the separation of tests from test code. This principle is used widely in test tooling, and there are a number of test automation/living documentation approaches for which this notion of separation is an integral part of their operation:-

  • Keyword driven testing structures drive a test harness through parameter files which contain a bespoke set of commands to apply certain actions to the system under test (a minimal sketch of this idea follows this list).
  • Fitnesse drives automation 'fixtures' through a wiki documentation structure, often with parameter/result combinations written in tabular form containing definitions for individual checks.
  • ATDD tools such as Cucumber take this a step further and interpret natural language based commands into executable steps, again through test fixtures and centralised code.
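To illustrate the first of these, here is a minimal keyword-driven sketch in Python. The keywords, parameter lines and the functions standing in for driver code are all invented for illustration; a real harness would drive the system under test rather than print statements.

# Each line of a parameter file names a keyword plus its arguments; the harness maps
# keywords onto the test code that drives the (hypothetical) system under test.

def login(user, password):
    print(f"logging in as {user}")                              # placeholder for real driver code

def run_search(term, expected_hits):
    print(f"searching for {term}, expecting {expected_hits} hits")

KEYWORDS = {
    "login": login,
    "search": run_search,
}

# In practice these lines would come from a parameter file maintained separately from
# the harness code - the separation of tests from test code.
test_lines = [
    "login  alice  secret",
    "search  widgets  3",
]

for line in test_lines:
    keyword, *args = line.split()
    KEYWORDS[keyword](*args)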

These approaches are designed, amongst other things, to simplify test creation and maintenance by avoiding the writing of new test code for each test. One could argue that any syntax that we use to define the tests, whether keywords or natural language, is a form of interpreted code. I'm inclined to agree, but for the purposes of clarity I'll only refer to the language that is used to write the test harness or fixtures interfacing with the system under test as 'test code' in this post. By centralising test code we adhere to the principles of code reuse and DRY (don't repeat yourself) that apply in most programming endeavours. I can see the efficiencies that may be achieved by doing this, however there are also inherent pitfalls that need to be considered if we're not to seriously restrict the flexibility of our automation:-

  • Limiting to the solution design - To achieve even greater efficiency, some folks recommend that the programmers writing the software also create the test fixtures and harnesses, leaving the testers to focus on test creation and definition. When a programming team designs a feature they will be working with a model of the solution in mind. This model will constrain the range of inputs, operations and outputs considered valid for the features of the designed solution. When designing fixtures to drive those features, the natural bias of the developer will be to limit the scope of the fixtures to supporting what they perceive to be the scope of that model. I believe that the point of test code should be to allow testers to drive the system under test in a variety of ways. In order to do this effectively we should be free to operate in the problem domain, explicitly aiming to discover areas where the problem and solution domains are inconsistent.
  • Limited Control - If, for example, the phrase 'when a user creates an account' triggers a test harness to execute a predictable process every time that conforms to the 'ideal' use of that feature, then the resulting tests are unlikely to provide a range of scenarios that is reflective of real use. The danger is that, by abstracting the testers from the code that interfaces with the product, through a natural language interface for example, we limit our flexibility in exercising the application in a range of ways that represents a more varied and realistic use. My preference is for the tools that I use to extend the reach of the tester to activities which would not otherwise be available to them. This will include predictable and stable workflows for regression testing but should also allow access to scale, volume, parallelisation and frequency of activity that would otherwise be unavailable without those tools.
  • Lack of Variety - With lack of flexibility there is also an implied lack of variety and randomness in the actions that the product is subjected to. Whilst known state and a measurable, checkable outcome are required for functional test automation, this needs to be balanced with the ability to add variety and randomness that increase the range of combinations tested and thereby the chances of exposing issues.

Providing Flexibility

So how to balance the need for predictable test output for checking with the need for supporting variety and scope in test harness capabilities? Here are a few approaches that I've found can be very effective:-

  • Use parameters to add flexibility - We have a very useful parallel threaded query test harness that we developed in house. In addition to the ability to select the queries to run at random, it is also possible to select one of a number of connection modes via an input parameter. These modes change the manner in which queries, statements and connections are used by the harness. This is achieved through a set of connection classes which expose a common object interface to the driving harness (a simplified sketch of this structure, together with the randomised ordering discussed below, follows this list). In this way we adhere to the DRY principle by reusing both the main harness code and the test definition files, yet still provide flexibility in the interface between harness and application. The structure is extensible too, such that when a customer was using a Tomcat connection factory to connect we were able to add this in as a connection class and recreate issues in that connection structure without altering our existing harness or having to develop new test data.
    Variation through use of interfaces in a test harness

    Parameters to change test execution don't just apply in the initial connection. Building support for test parameters which change key execution paths of the driving harness can be applied throughout the software lifecycle to great effect. For example the testers working on our query system can control the optimisations and runtime options applied to all of the queries in a run of tests by the application of the appropriate parameters. This allows the execution of a suite of tests with and without specific optimisations applied to compare the behaviour.

  • Allow for randomisation - Whilst a test in itself must have predictable output in order to apply a binary check, there is no reason why that test must be executed in a consistent pattern in relation to the other tests. Executing tests in parallel, with a level of randomisation in the execution order, provides a much broader range of execution scenarios than executing the same test in isolation each time. The regression harness that I currently develop supports the ability both to schedule multiple concurrent packs of tests and to randomise execution order and timing within those packs. This helps to increase our chances of detecting concurrency issues between processes, which can depend heavily on the timings of the parallel processes involved and are easily missed when repeating identically timed tests.
  • Have different authors for test code and feature code - As I first wrote in this post, I think that having programmers writing the test fixture code for their own features exposes the risk of inappropriate assumptions being incorporated into those fixtures. A logical approach to avoid this risk would be to share this work out to another individual. In my organisation a subset of the testers write the code, although this does not necessarily have to be the case. If it is not possible, it makes sense to have the tester working with the programmer review the fixture design and ensure that solution based assumptions aren't being built into the test interface.
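To make the first two of these points a little more concrete, here is a simplified sketch of a harness that selects a connection class by parameter and optionally randomises execution order. The class names, modes and queries are invented; the real harness drives our query system rather than print statements.

import random

class SimpleConnection:
    def run_query(self, query):
        print(f"simple connection: {query}")

class PooledConnection:
    def run_query(self, query):
        print(f"pooled (e.g. connection factory style) connection: {query}")

# Connection classes exposing a common interface, selected by a run parameter.
CONNECTION_MODES = {
    "simple": SimpleConnection,
    "pooled": PooledConnection,
}

def run_pack(queries, mode="simple", shuffle=True, seed=None):
    """Run a pack of queries through the chosen connection mode, optionally randomising
    execution order. Logging the seed lets a failing order be replayed."""
    connection = CONNECTION_MODES[mode]()
    ordering = list(queries)
    if shuffle:
        rng = random.Random(seed)
        rng.shuffle(ordering)
        print(f"execution order randomised with seed {seed}")
    for query in ordering:
        connection.run_query(query)

# The same test definitions run through two different interfaces to the system under test.
queries = ["SELECT 1", "SELECT 2", "SELECT 3"]
run_pack(queries, mode="simple", shuffle=False)
run_pack(queries, mode="pooled", shuffle=True, seed=42)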

I appreciate that in my context the application of random inputs and parameterised runs is relatively simple, however I think that the principles can apply to any automation that drives an interface. Typically the effort involved in adding further run options to a developed test interface will be much lower than the initial creation of that interface. Even if this is not the case and it takes as long to support the additional modes as the first, the range of options in a test covering a multiple step workflow will grow exponentially with each option that is added, so the benefits should multiply accordingly. I appreciate that the following diagram is highly simplistic, but it does demonstrate how, with just one or two available variations in each step of a multi-step test workflow, the number of combinations that we are able to exercise increases massively.

Test Combinations with increasing workflow steps
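As a rough illustration of that arithmetic (the step names and options below are invented), a five-step workflow with three choices at each step already yields 243 distinct end-to-end paths from only fifteen individual options:

from itertools import product

# Hypothetical variations available at each step of a five-step test workflow.
steps = {
    "connect": ["simple", "pooled", "kerberos"],
    "load":    ["small file", "large file", "corrupt file"],
    "query":   ["no optimisation", "optimisation A", "optimisation B"],
    "export":  ["csv", "json", "xml"],
    "cleanup": ["drop", "truncate", "archive"],
}

combinations = list(product(*steps.values()))
print(len(combinations))   # 3 ** 5 = 243 distinct end-to-end paths from 15 options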

Testers are all too aware of the problems posed to testing by the addition of just a small number of options, causing a multiplicative effect on the number of combinations in which software can be used. It makes sense to use that same phenomenon to our advantage where we can by increasing the range of options that we have in our tools. Even if we don't explicitly test all of these options up front, one of the areas where testers excel is in the recreation and identification of issues in the field. If we can take advantage of flexibility and extensibility in our tools to quickly recreate a problem scenario then this can lower reproduction time, and therefore turnaround time on fixes, as was the case with the Tomcat example that I mentioned above.

For me test automation is not about restricting the tester to a predefined set of steps that they can invoke through a simple language. It is about putting into the testers' hands the power to drive the tested system in the ways that they want to and to perform the checks that they deem appropriate. By ensuring that we build automation tools to support a level of variety and control beyond the immediate requirements of the workflow, we can increase our power and scope of testing significantly. We can also dramatically increase the number of test combinations that we can cover with our tools relative to the time, effort and money that we invest in creating them.

Tuesday, 24 September 2013

Blaming the Tester

 

It has been my unfortunate experience more than once to have to defend my testing approach to customers. On each occasion this has been deemed necessary in the light of an issue that the customer has encountered in the production use of a piece of software that I have been responsible for testing. I'm open to admitting when an issue that should have been detected was not. What has been particularly frustrating for me in these situations is when the presence of the issue in question, or at least the risk of it, had already been detected and raised by the testing team...

The Dreaded Document

I have a document. I'm not proud of it. It describes details of the rigour of our testing approach in terms that our customers are comfortable with. It talks about the number of tests we have in our regression packs, how many test data sets we have and how often these are run. The reason that the document exists is that, on the rare occasion that a customer encounters a significant problem with our software, a stock reaction seems to be to question our testing. My team and I created this document in discussion with the product management team as a means to explain and justify our testing approach, should this situation arise.

The really interesting element in exchanges of this nature is that no customer has ever questioned any other aspects of our development approach, irrespective of how much impact they may have on the overall software quality.

  • They do not question our coding standards
  • They do not question our requirements gathering and validation techniques
  • They do not question our levels of accepted risk in the business
  • They do not question our list of known issues or backlog items
  • They do not question the understood testing limits of the system

Instead they question the testing

This response is not always limited to external customers. In previous roles I've even had people from other internal departments questioning how bugs had made their way into the released software. I've sat in review meetings listening to how individuals '... thought we had tested this software' and how they wanted to find out how an issue 'got through testing'. Luckily in my current company this has not been the case, however I have had to face similar questions from external customers, hence the document.

A sacrificial anode

The behaviour of the business when faced with questions from customers over testing has typically been to defend the testing approach. Whilst this is reassuring and a good vote of confidence in our work, it is also very interesting. It seems that there is a preference for maintaining the focus on testing, rather than admitting that there could be other areas of the business at fault. Product owners would apparently rather admit a failure in testing than establish a root cause elsewhere. Whilst frustrating from a testing perspective, I think that on closer examination there are explainable, if not good, reasons for this reluctance.

    • The perception of testing in the industry - Whilst increasing numbers of testers are enjoying more integrated roles within development teams, the most commonly encountered perception of testing within large organisations is still of a separate testing function, which is seen as less critical to software development than product management, analysis and programming. As a consequence of this I believe that it is deemed more acceptable to admit a failure in testing than other functions of the development process which are seen as more fundamental. A reasonable conclusion then is that, if we don't want testers to receive blame for bugs in the products then we need to integrate more closely with the development process. See here for a series of posts I wrote last year on more reasons why this is a good idea.
    • Reluctance to admit own mistakes - Often the individuals asked to explain the presence of an issue that was identified by the testing were the ones responsible for the decision not to follow up on that issue. In defending their own position it is easy to use a mistake in testing as a 'sacrificial anode' to draw attention away from risk decisions that they have made. This is not a purely selfish approach. Customer perception is likely to be more heavily impacted by an exposed problem in the decision making process than by one in the testing, as a result of the phenomenon described in the previous point. It therefore makes sense to sacrifice some confidence in the testing rather than admit that a problem arose through the conscious taking of a risk.
    • "Last one to see the victim" effect - A principle in murder investigation is that the last person to see the victim alive is the likeliest culprit. The same phenomenon applies to testing. We're typically the last people to work on the software prior to release, and therefore the first function to blame when things go wrong. This is understandable and something we're probably always going to have to live with, however again the more integrated the testing and programming functions are into a unified development team, the less likely we are to see testing as the ones who shut the door on the software on its way out.

Our own worst enemy

Given the number of testers that I interact with and follow through various channels, I get a very high level of exposure to any public problems with IT systems. It seems that testers love a good bug in other people's software. What I find rather disappointing is that, when testers choose to share news of a public IT failure, they will often bemoan the lack of appropriate testing that would have found the issue. I'm sure we all fall into this mindset; I know that I do. Whenever I perceive a problem with a software system, either first hand or via news reports, I convince myself that it would never have happened in a system that I tested. This is almost certainly not the case, and demonstrates a really unhealthy attitude. By adopting this stance all we are doing is reinforcing the idea that it is the responsibility of the tester to find all of the bugs in the software. How do we know that the organisation in question hasn't employed capable testers who fully apprised the managers of the risks of such issues, and the decision was made to ship anyway? Or that the testers recommended that the area be tested and a budgetary constraint prevented them from performing that testing? Or simply that the problem in question was very hard to find and even excellent testing failed to uncover it? We are quick to contradict the managers who have unrealistic expectations of perfect software, pointing to the near-infinite combinations of inputs in even the simplest systems, yet we seem to have the lowest tolerance of failure in systems that are not our own.

Change begins at home and if we're to change the 'blame the tester' game then we need to start within our community. Next time you see news of a data loss or security breach, don't jump to blaming the thoroughness, or even absence, of testing by that organisation. Instead question the development process as a whole, including all relevant functions and decision making processes. Maybe if we start to do this then others will follow, and the first response to issues won't be to blame the tester.

 

image: http://www.flickr.com/photos/cyberslayer/2535502341

Monday, 22 July 2013

Celebrating good testing



A couple of weeks ago one of the testers in my team sent a mail around regarding a feature that he'd been looking at. The email related to the behaviour of that feature (a physical data purge for legal compliance) and the compression algorithms in the system. The content, presentation and plain existence of this mail struck me as demonstrating what software testing is, or should be, about. I thought this was worth celebrating and decided to share this great work.

I quickly realised that it was not apparent to me how to go about doing this. Unlike some other departments, I didn't have any standard channels to use to highlight and share excellent testing work. The problem got me thinking, and ultimately prompted this brief post: why do we find it so hard to celebrate good testing?

A shining example


As described in his email, the tester had examined a new feature of the software and its workings.
  • He'd established that the inconsistencies in support and state pre- and post-use of the feature posed a risk to the compression algorithms of the system - a great demonstration of insight and understanding of the system.
  • He'd then drawn up a truth table of all possible compression options, both supported and prototype, and tried each of these in turn - showing a methodical approach where it was required, and also pragmatism in accepting that 'non-standard' options for internal use exist and ensuring these were factored in.
  • He had identified not only some functional issues but also some potential knock on considerations based on an understanding of how the reporting was being utilised by the business.
  • He'd not just raised a bug but had circulated a mail summarising the issue - using appropriate communication given the potential impact of his discoveries.
All in all a demonstration of some of the qualities that I think contribute to excellent testing.

Hiding our light


When our sales team records a success, they usually mail around the news to the whole company. This gives an opportunity for everyone to share in the success of the company, but also for the hard work of the individuals involved to be recognised. Whilst I'm certainly not averse to this sharing of success, I do think some areas of organisations and departments are better at it than others.
I think that testing teams and testers are particularly poor at celebrating their successes. I'm not suggesting that test teams start high fiving and ringing a bell in the office every time they find a bug, however I do wonder whether we undersell ourselves as compared to other roles and departments.

I believe that one of the main reasons for the lack of team self promotion in testing is that our successes can be interpreted as the failure of others. Shouting about all of the bugs we've found could be seen as highlighting the failure of BAs to cover everything in the specification, for example, or of the programmers to ensure appropriate robust logic. Do we therefore feel guilty about celebrating our successes?

This makes no sense. Testing is there for a reason. With a positive team attitude there is no reason why we can't celebrate great testing as much as coding new features or even making a sale. We can appreciate great goalkeeping, such as Gordon Banks' save against Pele in 1970, without feeling the need to criticise the defending players for forcing him to make the save. Like Pele, some bugs are very hard to stop, and require great skill to tackle. This skill should be recognised.

A Sign of Appreciation


In this case I decided not to mail the whole company. Instead I emailed the SVP in charge of Research and Development and informed him of the excellent testing that had been demonstrated, and he responded in total agreement. I was really pleased when the tester in question was singled out in last week's all company meeting and received a bottle of champagne to recognise his excellent individual contribution to the success of the company.

The event got me thinking about the issue of celebrating testing in a more general sense and about ideas to address it. Whilst I don't expect to be able to initiate that level of response for every good piece of testing, there are other ways to highlight good work. Other things that I considered were:
  • On a smaller, day-to-day level, we can celebrate great testing through individual mentions in meetings such as team meetings or stand-ups
  • We can highlight excellent testing in review activities, such as sprint end retrospectives. I certainly plan to raise the case above in our end of sprint retrospective meeting as an example of great work, the like of which should be encouraged.
  • Creating a periodical newsletter of testing improvements and achievements. This would take some discipline and skill to make it interesting to the rest of the company, however I think having confidence in the testing of the system is something that would be welcome in the wider business.

Open ended


I'm not great at celebrating my own successes, and many testers I've met are similar. I find that conferences and meetups allow us to discuss our testing in a like-minded, appreciative community, which makes it easier to celebrate good testing without the inhibitions of guilt. Within our own organisations, however, my experience is that testers find it much harder to publicise our achievements, and when we do the audience is not always enthusiastic. Even an organisation with as positive an attitude to testing as mine suffers from classic misconceptions which can limit appreciation of our work (a senior exec used the term 'bottleneck' just recently in a conversation with me on testing improvements).

So this post comes to an end more as a question than anything else. How do we celebrate and publicise good testing, not just to those who want to hear, such as you reading this post, but to others with less of an interest within our own organisations? I'd love to hear your thoughts...

Tuesday, 25 June 2013

An Extreme Reaction

 

Being one of the thousands of people who use the British public transport system to get to work I like to think I have a pretty high pain threshold when it comes to problems and disruptions to my daily routine. Late trains and disruptions are a common part of my commute, however a recent event elicited an emotional response from me far greater than the situation in isolation merited. The event, and my reaction, provided me with an interesting example of the mindset of technology users in the face of enforced changes, and how our subjective perception of importance can shift dramatically given a context in which user options have been restricted. I also gained some compelling evidence of the importance of considering your environmental variables when testing.

We hate change

In November last year ticket barriers were installed at each entrance to my local station, requiring a valid train ticket to enter. Typically prior to this I had purchased my tickets from a machine in the station, however if queues were bad or machines faulty I could also go to the manned kiosk or buy a ticket on the train, risking only a mildly grumpy train manager. After the barriers were installed my only option to access the station by the street entrance that I used was a machine placed outside the automatic barriers. Going to the manned kiosk was still possible but required a long walk around the road to the main entrance on the other side of the station, and buying on the train was no longer an option.

My natural inclination was to be wary of the changes, however after a couple of weeks of painless commuting I settled into things and reluctantly accepted that the changes weren't as bad as I expected.

Environment Variables

One cold day soon after, running a little late, I was pleased to find the queue for the ticket machine empty. I selected my season ticket and tried to enter my code on the touch screen. I quickly realised that many of the characters, particularly those around the edge of the screen, were unresponsive. I was having to press really hard on the screen to get anything to register, and even then it could be the character below or above the one I wanted, requiring me to use the equally non-functional backspace and try again. Three characters in I had already taken far longer than I should have needed, and it slowly dawned on me that entering my number was taking so long that I was at serious risk of missing the train...

A little perspective

Now is probably a good time to add a little perspective. The train I catch is at 8.15am; the next one I can catch is at 8.31am. If I miss my train I'll catch the next one and arrive at work 16 minutes later, which has little impact on my day. Occasionally I am delayed at home and decide myself to take the later train to avoid rushing. Sometimes I take it to catch up with a friend who travels on the same one; the day after writing this I took the later train to wait and travel with a colleague who was running late. Taking the later train, whilst not normal, is a perfectly acceptable alternative for me.

An irrational panic

...back at the ticket machine the availability of a train 16 minutes later was the last thing on my mind. As I struggled to mash the digits of my code into the screen I started to enter a state of frustrated panic. This infernal machine was preventing me from catching my train. Prior to the barriers, machine failures occurred quite often and would have registered as nothing more than an inconvenience in my journey. The absence of any other options now meant that my only path to catching my train was this infernal machine. I started to shout at it, which I find always helps with errant technology, particularly if your main goal is accumulating 'funny looks' from passers by. With only a couple of minutes to go I was on the last character, a P, which was located right in the corner of the screen. It wouldn't work. No amount of pressing, even with the pressure of both hands, would get the letter P to register on the screen. I started to panic. I could hear the train pulling into the station. If a genie had popped up at that moment offering three wishes, my first and only one would have been for that @¥&£#% P to work so I could get my ticket. Realising that the cause of the malfunction was probably the temperature, I leaned over the screen and began to frantically breathe on it to try to warm it up (I couldn't risk using my hands for fear of accidentally firing another button). To passers by my alternating between breathing on the screen and pumping my fists on it must have looked like I was trying to bring it back to life through some kind of technological CPR. Just like in the movies, at the last moment my efforts finally paid off, the P registered and I got my ticket. Running through the barriers I shouted at the staff as I raced past, "THAT MACHINE DOESN'T WORK WHEN IT GETS COLD". Racing recklessly down the platform steps I managed to get on the train with seconds to spare and collapsed into my seat in a flurry of anger and adrenaline-fuelled elation.

Rationality in reflection

On first reflection my actions were totally irrational and quite foolish. The sensible course of action would have been to calmly take the few minutes' walk around to the manned desk, buy a ticket and take the later train. I probably lost more productive time in arriving at work flustered than I gained in catching that train. So why was I so determined to catch that particular train on this occasion? I think the reasons become more apparent if we examine the emotional context in place coming into this incident:

  • An unwanted change had been imposed
  • I'd recently had some changes imposed on me which affected an established pattern of operation. Whilst as an individual I am quite receptive to change, and am often an instigator of change in my organisation, a change that impacts an established routine is rarely welcome. I'm not alone in this; many people have a natural tendency to favour the current situation and established behaviours over change. When the change was announced I had started to mentally accumulate reasons why it would not be welcome. Even though, for example, I hadn't done it for months, I was still annoyed that the presence of the barriers would prevent me from taking my children onto the platform to see the trains. This is referred to as a "status quo bias" and is associated with a tendency to place a higher emphasis on the potential losses of a change than any corresponding potential benefits. As soon as the machine failure occurred, phrases such as 'I knew it' and 'typical' started echoing around my internal monologue. If I'm honest I was probably intentionally magnifying the severity of the problem to justify my previous objection to the situation.

  • I was not the beneficiary of the change
  • All of the benefits of the installation of the barriers were going to the train companies. These things were not there to make my life easier. My emotional position on the change was that it was going to be for the worse and there were few obvious reasons for me to change this.

  • My options had been restricted
  • Whilst I occasionally had problems with the previous system, I usually had the flexibility to work around them. As I mentioned, in addition to the machine I previously had the options of buying at the kiosk or on the train. These were no longer readily available to me, which placed a much greater emphasis on the machine working. The tendency to oppose any perceived restriction in one's personal autonomy is an extremely powerful bias known as "Reactance". Reactance applies when we feel that something is removing our personal freedom to choose.

  • The fault could have been detected
  • This was probably the final straw in my over-reaction. The fact is that this was a machine that had been designed to operate outside in Britain yet didn't work in freezing temperatures. For most people this would be annoying. For a software tester it was exasperating. My whole frustrating encounter was the result of a singular failure to consider the possible environmental variables when testing the machine.

So the event occurred in an emotional context of: a strong set of biases against the recent change; a removal of alternatives; a piece of technology which hadn't been tested for its target environment; oh, and a disgruntled software tester.  

A Troublesome Business

The situation above may seem far fetched, however this is exactly the kind of emotional context into which my software, and many other business technologies, are implemented. With regards to the product I work on, our customers are often looking to replace an established product or process. The primary value in these implementations is usually to achieve the equivalent functionality of an existing system at a much lower cost. In this context the people involved in implementing, administering and using the system are rarely the beneficiaries of the value in the change. They are also likely to have an emotional investment in the existing process given their experience with the technologies and associated tools involved.

Business software implementations are often in the context of:

  • replacing an existing process - thereby exposing the risk of status quo bias toward the existing system
  • being imposed on the users rather than being their choice - engendering reactance-based emotions against the software before it is even switched on
  • restricting the flexibility of operation - again giving rise to reactance-based frustration at not having the flexibility previously enjoyed. Users can't, for example, stick a post-it note on a computer form or write in the margin of an HTML page

The result is that many low-level stakeholders can harbour strong biases against the implementation, of exactly the nature shown by my own experience. Whilst as a software vendor this is sometimes frustrating, it is also totally understandable and something that software developers should anticipate. Business-based software testers in particular need to consider the presence of negative bias in their implementations and examine the system accordingly.

Occasionally I see comments from testers of business software almost jealously bemoaning the way that Facebook and Twitter thrive with little testing and a volume of functional flaws that would be unacceptable for business use. What we need to remember in business-targeted software is that software used in a social context is rarely imposed, and is usually adopted in the presence of a range of options (Twitter, Google+, Facebook, LinkedIn, Xing) out of which the user will have made a personal choice. This is much more likely to result in a positive emotional state when using the product, and a more forgiving approach to faults.

To counter potential negative feelings towards our business software we need to focus on making sure the product and associated services are geared towards overcoming such a position. Users experiencing feelings of resentment towards our software are unlikely to enthusiastically digest every word of our documentation, so it is important that the software's usability is considered. Is it consistent with the systems that we are replacing, or self-explanatory if not? Is it easy to identify and recover from mistakes? As well as the functionality at hand, do the testers understand the greater goals of the customers? Documentation still needs to be accessible and searchable, and written from the perspective of not only achieving key tasks but making necessary decisions, rather than as a flat reference structure. Technical support need to be helpful and willing to step users through their problems in the early stages to develop positive customer relationships and help to nurture a culture of positive and skilled use on the part of the customer.

Of all the software implementations I have experienced, the ones I'm most proud of are those where we started out meeting some resistance to the product and turned this around into a situation where those users actively recommend it to others. Situations where initial resistance is overcome through our diligent work to understand the customer's needs, test that the product allows them to be met, and support the customer in meeting them, create some of the strongest advocates of our software and company. The results are definitely worth it, as investment in startup companies can come on the back of references from existing customers, so the future of an organisation can depend heavily on the quality of product delivered in the present and the ability to generate raving fans. Fail to understand the target environment and the emotional context that you are delivering into, and you may just end up with raving mad users.

Wednesday, 12 June 2013

Testing Big Data in an Agile Environment

 

Today my Testing Planet article on Testing Big Data in an Agile environment went online on the Ministry of Testing site. This is a timely post as I am in the process of preparing for a couple of talks on the subject of testing big data in the coming months.

  • I'll be running a session at the July UKTMF Quarterly Forum on 31st July discussing the practicalities of testing big data and the challenges that testers face.
  • In October I'm also presenting at Agile Testing Days a talk entitled Big Data Small Sprint, on the challenges that we face trying to test a big data product in short agile iterations

I'll try to post some more on the practicalities of testing a big data product as it is a hot topic in software at the moment. I've previously hosted a Skype chat on the subject for testers working in big data environments to share their problems and would be happy to consider something similar again - please comment if you are interested. If you are at either of the events above please come and say hello - especially if you are facing a big data challenge yourself. For now please take a look at the article; I'd be pleased to receive any feedback that you have on it.

Wednesday, 5 June 2013

Sticking to Your Principles

Running both the testing and support operations in my organisation affords me an excellent insight into the issues that are affecting our customers and how these relate to the efforts and experiences of the testers who worked on the feature in question. I recently had reason to look back and examine the testing done on a specific feature as a result of that feature exhibiting an issue with a customer. What became apparent in looking back was that, during that feature's development, I had failed to follow two important principles that my team have previously worked to maintain in our testing operations.

A dangerous temptation

When employing specialist testers within an agile process, one of the primary challenges is to maintain the discipline of testing features during the same sprint as the coding. Over the time that I've been involved in an agile process the maintenance of this discipline has occasionally proved difficult in the face of external pressures, yet I feel it has been key to our continued successful operation as a unified team.

During the development of a tricky area last year we found that the testing of a new query performance feature was proving very time consuming. The net result of this was that we didn't have as much testing time as originally thought to devote to another related story in the sprint backlog. Typically in this situation we would aim to defer the item until a subsequent sprint or to shift roles to bring someone else in as a tester on that piece. For various reasons in this case we decided to complete the programming and then test in the following sprint.

Some reading this will be surprised that an agile team would ever consider testing and developing in separate sprints. Believe me, testing in the same sprint is not the case for all teams who describe themselves as agile. A few years ago, at the 2009 UKTMF Summit, I attended a talk by Stuart Reid, "Pragmatic Testing in Agile projects", in which Stuart suggested that testing in a sprint subsequent to coding was a common shape for agile developments. Many testers that I have spoken to in interviews reinforce this notion, claiming to have worked in agile teams with scrums that were solely for the testing process, with a single build delivered into the testing sprint. In fact one individual I interviewed from a very large company described a three-stage 'agile' process to me in which coding, test script writing and test script execution were all done in separate sprints.

The purpose of this post is not to criticise these teams, however I do personally believe that this is an approach that favours monitoring over responsibility, at the expense of the true benefits that agile can deliver. In my experience the benefit of testing in the same sprint is that we can provide fast feedback on new features and a focus on quality during development. Without this, developments can quickly exhibit the characteristic problems of testing as an isolated activity after coding. Even at the level of an individual feature, delaying feedback from testing until after the programming has 'finished' results in a significant change in the tester-to-developer dynamic. The problems reported by the tester are distracting from, rather than contributing to, the active projects for the programmer, something I explored more here. Some teams may achieve success through delayed testing in isolated sprints; for our team it marks a retrograde step from our usual standards.

Unrepresentative confidence

The second lapse in principles arose in the completion of the story.

When the testing commenced it was clear that the functionality in question was potentially impacted by pretty much every administration operation that could be performed on the stored data. The tester in question diligently worked through exploring a complicated state model and exposed a relatively high number of issues compared to our other developments. A lot of coding effort was required to address these issues, and this was done under the extra pressure of having estimated scant programming work for that story in the sprint in which it was being tested, in the belief that it was essentially complete.

As I discussed in this post, I use a confidence-based approach to reporting story completion to allow for the many variables that can affect the delivery of even the simplest features. At the end of the story in question, under my guidance, the tester reported high confidence in all of the criteria on the basis that all of the bugs they had found had been retested successfully. I did not question this at the time, however in hindsight I should have suggested a very different report given the nature and prevalence of the bugs that had been encountered. At the end of the sprint all of the bugs that had been found were fixed. Reporting high confidence on this basis belied the number of issues that had been discovered and the corresponding likelihood of there being more.

To hijack a famous testing analogy, if you are clearing a minefield and every new path exposes a mine, there is a good chance that there are still mines to be found in the paths you haven't tried yet.

This problem can arise just as easily from the arbitrary cut-off of the sprint timebox as from a finite set of prescribed test cases. If, after completing the prescribed period or test cases, there are no outstanding issues, it is hard to argue for further testing activity; however it is my firm belief that test reporting should be sufficiently rich and flexible to convey such a situation. As I'd discussed with my team when introducing the idea, the reporting of confidence is intended to prompt a decision - namely whether we want to take any action to increase our confidence in this feature. A high number of issues found during testing is sufficient to diminish confidence in a feature and merit such a decision, despite those issues being closed. In this case we should have decided to perform further exploratory testing or possibly review the design. As it was, the feature was accepted 'as was' and no further action was taken.
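To make the idea concrete, here is a minimal sketch of how such a report might prompt that decision, written in Python purely for illustration. The structure, names and threshold are my assumptions for the example, not our actual reporting tooling.

from dataclasses import dataclass

@dataclass
class StoryTestReport:
    story: str
    confidence: str      # reported confidence against the acceptance criteria, e.g. 'high'
    bugs_found: int      # total issues raised during testing of the story
    bugs_open: int       # issues still unresolved at the end of the sprint

def needs_confidence_decision(report: StoryTestReport, found_threshold: int = 5) -> bool:
    # Prompt a team decision (further exploratory testing, a design review, or
    # accept as-is) when open bugs remain OR when the sheer number of issues
    # found suggests more are likely lurking, even though all are now closed.
    return report.bugs_open > 0 or report.bugs_found >= found_threshold

report = StoryTestReport(story="Query performance feature",
                         confidence="high", bugs_found=9, bugs_open=0)
if needs_confidence_decision(report):
    print(f"{report.story}: high bug count ({report.bugs_found}) - "
          "consider more exploratory testing or a design review before accepting")

In the story above, a report along these lines would have flagged the feature for a conversation despite every bug being closed.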

Problems exposed

We recently encountered a problem with a customer attempting to use the feature in question. Whilst the impact was not severe, we did have to provide a fix to a problem which was frustratingly similar to the types of issues found during testing.

I'm aware of the dangers of confirmation bias here, and the fact that we encountered an issue does not necessarily indicate that this would have been prevented had we acted differently. We have seen other issues from features developed much more in line with our regular process, however there are some factors which make me think we would have avoided or detected this one by sticking to the principles described.

  • The issue encountered was very similar in nature to the issues found during testing; it was essentially a hybrid of previous failures, recreated by combining the recreation steps for known problems
  • The solution to the problem, after a group review with the senior developers, was to take a slightly different approach which utilised existing and commonly tested functions. This type of review and rework is just the sort of activity that we would expect to do if testing were exposing a lot of issues while the coding focus was still on that area, when rework would have been considered more readily.

Slippery Slope

While this is something of a 'warts and all' post, I think this example highlights some of the dangers of letting standards lapse even briefly. It is naive to think that mistakes won't be made, and with short development iterations there is scant time to recover when they are. For this reason I think that a key to maintaining a successful agile development process is to identify lapses and work to regain standards quickly. As the name suggests, a sprint is a fast-paced development mechanism. In the same way that agile sprints can provide fast-paced continuous improvement, degradations in the process can develop just as quickly. Any slip in standards can quickly become entrenched in a team if not addressed, and I've seen a few instances where good principles have been lost permanently through letting them slip for a couple of sprints.

Monday, 29 April 2013

The Testers, the Business and Risk Perception

 

One of the sessions I most enjoyed when I attended the TestBash on 22nd March was Tony Bruce's talk on testers and negativity. In his fantastic disarming style Tony discussed why testers are sometimes seen as negative, both by themselves and by the business, and whether that 'negativity' affects their lives outside work. Tony made a great point: a tester's role should be a positive one of providing information, so why is that seen as negative?

A comment that I made in the ensuing discussion, which I think is worth expanding on, is how important the subject of perceived risk is to this scenario. I don't see testing as a negative role. Like Tony, I see testing as an information provider, furnishing the business with the information required to make important decisions. Those decisions will inevitably involve an element of risk adoption by the business, and it is inevitable that each stakeholder in those decisions will have their own perception of the levels of risk involved. What I have seen is that situations where testers are perceived as 'negative' could be more appropriately explained as a difference between the tester's perception of the risks involved in the development and the level of risk adopted in the decisions taken by the business. If the tester disagrees with the business decision makers regarding the risks of problems such as software bugs, delays in the development process and eventually customer dissatisfaction, then this can result in negativity both from the tester and in the perception of them.

I see many reasons why testers and product owners or business decision makers may not agree on the levels of risk being taken in a product development. When testing professionals encounter a situation where there is a large disparity between our own acceptable risk level and that of the business as a whole, our position can feel, and appear, very negative. If this occurs, rather than assuming that we're the only ones with full visibility of the situation and belligerently sticking to our negative stance, I think we might want to ask ourselves some questions in an attempt to explain this disparity in perceived risk.

  • Am I overestimating the risks?
  • Is the business underestimating the risks?
  • Ultimately - can I put up with it?

Am I overestimating the risks?

In his book 'Risk, the Science and Politics of Fear', Dan Gardner gives an excellent explanation of how thousands of years of human evolution have formed our mental processes regarding risk.

'Our brains were simply not shaped by the world as we know it now, or even the agrarian one that preceded it. They are exclusively the creation of the Old Stone Age'

One consequence of this is that our brains make immediate risk assessments based on our ability to recall relevant experiences. This is a natural part of our 'System one' or 'gut' decision making process, which has evolved to make quick decisions in risky situations. This is contrasted with the processes of logical reasoning which form 'System two' or 'head' thinking, which are slower but more accurate. We are predisposed to considering a situation to be risky if we are able to recall similar situations where problems have arisen. This recollection can be through our own experiences or through 'stories' conveyed to us by others, a concept known as the Availability Heuristic. As Gardner points out, whilst this mechanism worked well for our ancestors in avoiding danger, it is flawed when it comes to modern society. The huge amounts of information 'available' to us about a situation can provide an unrealistic perception of the actual risks present. To paraphrase Gardner for this context, we are running around testing software with brains that are perfectly evolved for avoiding dangerous animals while hunting and gathering.

Testers' work revolves around finding problems with software. We find our own bugs, we read articles on testing and software problems and, when we meet other testers, we share stories of bugs and software problems. We will also typically spend more time investigating and examining problem behaviour than anyone else. An inevitable result is that the experiences driving our own availability heuristic will be inclined to over-emphasise the likelihood of issues. The examples most readily available to us when facing new situations are likely to be based around problems that we have previously seen, so we will have a natural tendency towards higher levels of perceived risk than other roles. The business, on the other hand, don't see all of the bugs. Their experiences of bugs are often masked behind figures, reports and metrics, which might convey summary information, however the personal experience of the issues encountered is absent and so, therefore, is the 'System one' perception of risk.
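As a toy illustration of this effect (my own simplification, not an example from Gardner's book), the following Python sketch shows how a 'gut' estimate built from the experiences we recall most easily can overstate the true failure rate when failures are disproportionately memorable. The rates and weights are arbitrary assumptions for the example.

import random

random.seed(1)
TRUE_FAILURE_RATE = 0.05   # assumed real chance that a release contains a serious issue

# Simulate a history of releases; True means the release had a serious issue.
releases = [random.random() < TRUE_FAILURE_RATE for _ in range(1000)]

# A tester spends far more time on the failures, so they are far more memorable.
# Weight failed releases more heavily when 'recalling' past experience.
memorability = [10 if failed else 1 for failed in releases]
recalled = random.choices(releases, weights=memorability, k=50)

print(f"Actual failure rate: {sum(releases) / len(releases):.1%}")
print(f"'Gut' estimate from recalled examples: {sum(recalled) / len(recalled):.1%}")

With these illustrative weights the recalled sample makes problems appear several times more common than they actually were, which is exactly the skew described above.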

From a tester's standpoint, a rather unpalatable conclusion that we could draw from this is that, in the situation where we have provided excellent status information to the business, they could well be better placed than us to accurately assess the risks involved in releasing the software. This is because they are more likely to be operating in a 'System two' process of logical reasoning based on the facts, whereas we will be strongly influenced by our 'System one' processes to assume problems on the basis of prior experience.

Testers' roles and experiences will also drive their assessment of risk to be heavily based around software bugs. The business decision makers, if they are doing their jobs effectively, will have visibility of other categories of risk which must be considered when making project and release decisions. Risks such as missing market opportunities, losing investor confidence and missing opportunities for useful feedback are all factors outside the scope of the quality of the code which must be considered.

The somewhat enlightening conclusion for me as a tester is this - we need to be able to let go of the worry. If you have done the best job that you can to provide the business with information, and they are making a decision based on that information, then you have done your job. Understand that your manifold experiences, both personal and second-hand, of failures in software cause your gut to see problems everywhere, and you may not be best placed to accurately assess the overall risk involved in a software release. You may reject this idea, claiming that you are aware of biases yet not susceptible to them. As Gardner points out, this is a common problem:

'Psychologists have found that people not only accept the idea that other people's thinking may be biased, they tend to overestimate the extent of that bias. But almost everyone resists the notion that their own thinking may also be biased.'

Sometimes as a tester you have to identify when you've been too close to too many problems to be thinking rationally, and work on providing the information to let others make the risk decisions. The folks making those decisions will hopefully have access to multiple information sources, in addition to the output of testing, which helps to balance biases in the decisions being made.

 

Is the business underestimating the risks?

Of course there are two sides to every disagreement. Business decisions are made on the basis of perceived risk, which is based on the information available to the decision maker. It may be that the decision maker is actually adopting a riskier approach than they think due to poor, or poorly presented, information, or their own biases. Underestimation of potential risks by the business will also result in a differential in perceived risk between business/management and testers. In their 1988 paper on underestimation bias in business decision making, "An Availability Bias in Professional Judgement", Laurette Dubé-Rioux and J. Edward Russo suggest that underestimation as a bias is again heavily influenced by availability, or the lack thereof.

"After evaluating such alternative explanations as category redefenition, we conclude that availability is a major cause, though possibly not the sole cause, of the underestimation bias"

In summary, their findings were that decision makers tended to group risks for which they had low visibility into catch-all categories and then underestimate the likelihood of anything in those categories occurring, with the lack of available examples of those risks proposed as the major cause of this underestimation.

If the perceived level of risk adopted by the business is based on low availability, and actually differs significantly from the real level, then we may be able to help by providing better information to inform risk decisions. It is therefore our responsibility to convey the relevant information as clearly as possible to allow an informed decision. As the perception of risk is heavily influenced by the availability of relevant experiences or stories, it follows that the most effective mechanism for conveying risk information would be the sharing of experience, rather than the presentation of raw figures. I've certainly encountered the situation when testing poor quality code where simply describing a few of the issues encountered can have a much greater impact on the recipient than the presentation of bug counts. Metrics and status reports can convey a certain level of information, however when backed up with examples of the nature of issues being encountered this creates a much more personal response and will have a much more significant impact on the perceived risk in the recipient. I've been in more than one situation where a "bug story" that I conveyed to a manager was subsequently repeated by them when reporting project status externally.

In short - if you want to influence perceived risk, then start telling stories.

Before doing so, however, consider whether it will benefit the business. As I've stated above, it could be that our levels of perceived risk are unrepresentatively high compared to the actual risk, in which case telling stories of every bug found may simply move the business closer to our 'System one' position and away from a more realistic assessment of the situation.

Can I put up with it?

In my list of questions at the start of the post, you may wonder why I've included 'can I put up with it?' but specifically haven't included 'how do I change things?'. The simple answer is that I believe that, whilst improving the accuracy of perceived risk may be possible, changing the level of risk adopted by the business is unlikely to be something that a tester can achieve.

In a fascinating experiment, researchers into risk behaviour placed cameras at an open level crossing and recorded the speeds of cars travelling through and the correlated risk of accident. The researchers then cut back the trees around the crossing to improve visibility and repeated the experiment. The results revealed a huge amount about human risk adoption: due to the reduced perceived risk, drivers increased their speed on average such that the same proportion of vehicles were at risk of an accident as before, with no net safety benefit from improving visibility.

Behaviour was compared before and after sightline enhancement achieved by the removal of quadrant obstructions. At the passive crossing, sightline enhancement resulted in the earlier preview of approach quadrants. The perceived risk of approach to this crossing appeared to be reduced, resulting in consistently higher approach speeds after sightline enhancement. This performance benefit in response to the intrinsic effect upon safety realised by sightline enhancement yielded no net safety benefit.

The implication of such results is that people adopt a predefined level of risk in a given situation and adjust their behaviour to maintain it as new information arrives, a phenomenon known as Risk Compensation. This has interesting consequences for car manufacture, for example, where safety features don't result in improved safety, but instead result in increased net driving speeds and riskier driving behaviour.

This also has pretty fundamental implications for software projects. Essentially, if this phenomenon applies in business, then every action taken by a testing team to improve testing and the confidence in a product will result in a change in behaviour by the business to operate faster or implement more features at the same risk level, rather than using the improvement to reduce risk. For example, in the case of introducing test automation, if automated tests are seen as providing an equivalent level of confidence to humans executing the same 'cases', then the response by the business is likely to be to drive for faster development and lower levels of manual testing on the back of the perceived confidence boost. If the improvements by the tester were driven by a disparity between their own acceptable risk level and that of the business, the outcome is likely to prove very frustrating for them.

Hence the question 'can I put up with it?' - if you as a tester are at odds with your company over what constitutes acceptable risk, get used to it; it is unlikely to change, and any improvements that you make to try to address it could actually make things worse.

Knowing oneself

The understanding of risk and personal bias is a complex subject. In the testing world we need to try to ensure that our risk perception is based as much as possible on 'System two' thinking and not 'System one' feelings driven by the availability heuristic. Avoiding such biases is hard, however in any situation we should be asking ourselves whether our position is based on evidence arising from the testing performed, or whether we simply have a 'gut' feeling that there will be problems. In considering risks, are we calling to mind particularly memorable problems from other projects that could be affecting a realistic assessment?

This problem is not limited to software testing; a 2005 paper discussing the problem of decision making in the medical profession provided these key guidelines for avoiding the availability heuristic:

  • Be aware of base rates (more appropriate for medical diagnosis, however the prevalence of issues arising from live use is an important yardstick)
  • Consider whether data are truly relevant rather than just salient 
  • Seek reasons why your decisions may be wrong and entertain alternative hypotheses
  • Ask questions that would disprove, rather than confirm, your current hypothesis
  • Remember you are wrong more often than you think

I think that these provide sound general advice. A feeling of constant negativity is not a healthy or sustainable situation for any role. By being aware of your biases and using these guidelines you may find that your negative position eases somewhat in the light of evidence. If you are doing your job and the business are happy, then maybe you are overemphasising the risks, and you need to lighten up and lose some of that negativity.

If, on the other hand, you are confident in your assessments, you are providing excellent information into your company's risk decisions, and you are still finding yourself in a very 'negative' position relative to the rest of the business, I suggest you think long and hard about whether you are in the right place. As we've seen, your business is unlikely to change.

References

Risk, the Science and Politics of Fear - Dan Gardner
Risk Perception - Lennart Sjöberg
Driver Response to Improved Lateral Visibility: Evidence of Usability of "Risk Homeostasis Theory" - Nicholas J. Ward, HUSAT Research Institute, Loughborough University of Technology, Leicestershire, United Kingdom
Study: Airbags, antilock brakes not likely to reduce accidents, injuries - Emil Venere, Purdue University News
Five Pitfalls in Decisions about Diagnosis and Prescribing - Jill G. Klein
Wikipedia - the Availability Heuristic
An Availability Bias in Professional Judgement - Laurette Dubé-Rioux and J. Edward Russo, Cornell University, 1988

Photo: http://www.hotelsanmarcofiuggi.it/en/free_climbing.php
