Wednesday, 12 June 2013

Testing Big Data in an Agile Environment

 

Today my Testing Planet article on Testing Big Data in an Agile environment went online on the Ministry of Testing site. This is a timely post as I am in the process of preparing for a couple of talks on the subject of testing big data in the coming months.

  • I'll be running a session at the July UKTMF Quarterly Forum on 31st July discussing the practicalities of testing big data and the challenges that testers face.
  • In October I'm also presenting a talk at Agile Testing Days entitled Big Data Small Sprint, on the challenges that we face trying to test a Big Data product in short agile iterations.

I'll try to post some more on the practicalities of testing a big data product as it is a hot topic in software at the moment. I've previously hosted a Skype chat on the subject for testers working in big data environments to share their problems and would be happy to consider something similar again - please comment if you are interested. If you are along at either of the events above please come and say hello - especially if you are facing a big data challenge yourself. For now please take a look at the article; I'd be pleased to receive any feedback that you have on it.

Wednesday, 5 June 2013

Sticking to Your Principles

Running both the testing and support operations in my organisation affords me an excellent insight into the issues that are affecting our customers and how these relate to the efforts and experiences of the testers who worked on the feature in question. I recently had reason to look back and examine the testing done on a specific feature after it exhibited an issue with a customer. What became apparent to me in looking back was that, during that feature development, I had failed to follow two important principles that my team have previously worked to maintain in our testing operations.

A dangerous temptation

When employing specialist testers within an agile process one of the primary challenges is to maintain discipline in testing features during the same sprint as the coding. Over the time that I've been involved in an agile process the maintenance of this discipline has occasionally proved difficult in the face of external pressures. I feel it has been key to our continued successful operation as a unified team.

During the development of a tricky area last year we found that the testing of a new query performance feature was proving very time consuming. The net result of this was that we didn't have as much testing time as originally thought to devote to another related story in the sprint backlog. Typically in this situation we would aim to defer the item until a subsequent sprint or to shift roles to bring someone else in as a tester on that piece. For various reasons in this case we decided to complete the programming and then test in the following sprint.

Some reading this will be surprised that an agile team would ever consider testing and developing in separate sprints. Believe me, testing in the same sprint as the coding is not the case for all teams who describe themselves as agile. A few years ago at the 2009 UKTMF Summit I attended a talk by Stuart Reid, "Pragmatic Testing in Agile projects", in which Stuart suggested that testing in a subsequent sprint to coding was a common shape for agile developments. Many testers that I have spoken to in interview reinforce this notion, claiming to have worked in agile teams with scrums that were solely for the testing process, with a single build delivery into the testing sprint. In fact one individual I have interviewed from a very large company described a three-stage 'agile' process to me where coding, test script writing and test script execution were all done in separate sprints.

The purpose of this post is not to criticise these teams. However, I do personally believe that this is an approach that favours monitoring over responsibility at the expense of the true benefits that agile can deliver. In my experience the benefit of testing in the same sprint is that we can provide fast feedback on new features and provide a focus on quality during development. Without this benefit, developments can quickly exhibit the characteristic problems of testing as an isolated activity after coding. Even on an individual feature level, delaying feedback from testing until after the programming has 'finished' results in a significant change in the tester to developer dynamic. The problems reported by the tester are distracting from, rather than contributing to, the active projects for the programmer, something I explored more here. Some teams may achieve success through delayed testing in isolated sprints; for our team it marks a retrograde step from our usual standards.

Unrepresentative confidence

The second lapse in principles arose in the completion of the story.

When the testing commenced it was clear that the functionality in question was potentially impacted by pretty much every administration operation that could be performed on the data that was stored. The tester in question worked diligently, exploring a complicated state model, and exposed a relatively high number of issues compared to our other developments. A lot of coding effort was required in order to address these issues. However, this was done under the extra pressure of having estimated scant programming work for that story in the sprint in which it was being tested, in the belief that it was essentially complete.

As I discussed in this post, I use a confidence-based approach to reporting story completion to allow for the many variables that can affect the delivery of even the simplest features. At the end of the story in question, under my guidance, the tester reported high confidence in all of the criteria on the basis that all of the bugs that they had found had been retested successfully. I did not question this at the time. In hindsight, however, I should have suggested a very different report on the basis of the nature and prevalence of the bugs that had been encountered. At the end of the sprint all of the bugs that had been found were fixed. Reporting high confidence on this basis belied the number of issues that had been discovered and the corresponding likelihood of there being more issues.

To hijack a famous testing analogy, if you are clearing a minefield and every new path exposes a mine, there is a good chance that there are still mines to be found in the paths you haven't tried yet.

This problem can arise equally through the arbitrary cut-off of the sprint timebox as through a finite set of prescribed test cases. If after completing the prescribed period/test cases there are no outstanding issues it is hard to argue for further testing activity. However, it is my firm belief that testing reporting should be sufficiently rich and flexible to convey such a situation. As I'd discussed with my team when introducing the idea, the reporting of confidence is intended to prompt a decision - namely whether we want to take any action to increase our confidence in this feature. The existence of a high number of issues found during testing is sufficient to diminish confidence in a feature and merit such a decision, despite those issues being closed. In this case we should have decided to perform further exploratory testing or possibly review the design. As it was, the feature was accepted 'as was' and no further action was taken.

Problems exposed

We recently encountered a problem with a customer attempting to use the feature in question. Whilst the impact was not severe, we did have to provide a fix to a problem which was frustratingly similar to the types of issues found during testing.

I'm aware of the dangers of confirmation bias here, and the fact that we encountered an issue does not necessarily indicate that this would have been prevented had we acted differently. We have seen other issues from features developed much more in line with our regular process. However, there are some factors which make me think we would have avoided or detected this one by sticking to the principles described.

  • The issue encountered was very similar in nature to the issues found during testing. It was essentially a hybrid of previous failures, recreated by combining the recreation steps for known problems.
  • The solution to the problem, after a group review with the senior developers, was to take a slightly different approach which utilised existing and commonly tested functions. This type of review and rework is just the sort of activity that we would expect to do if testing was exposing a lot of issues while the coding focus was on that area, when rework would have been considered more readily.

Slippery Slope

While being something of a 'warts and all' post, I think this example highlights some of the dangers of letting standards lapse even briefly. It is naive to think that mistakes won't be made. With short development iterations there is scant time to resolve them when this happens. For this reason I think that a key to maintaining a successful agile development process is to identify lapses and to increase effort to regain standards quickly. As the name suggests, a sprint is a fast paced development mechanism. In the same way that agile sprints can provide fast paced continuous improvement, degradations in the process can develop just as quickly. Any slip in standards can quickly become entrenched in a team if not addressed, and I've seen a few instances where good principles have been lost permanently through letting them slip for a couple of sprints.

Monday, 29 April 2013

The Testers, the Business and Risk Perception

 

One of the sessions I most enjoyed when I attended the TestBash on 22nd March was Tony Bruce's talk on testers and negativity. In his fantastic disarming style Tony discussed why testers are sometimes seen as negative by both themselves and the business, and whether that 'negativity' affects their lives outside work. Tony made a great point: a tester's role should be a positive one, providing information, so why is that seen as negative?

A comment that I made in the ensuing discussion, that I think is worth expanding on, is how important the subject of perceived risk is to this scenario. I don't see testing as a negative role. Like Tony I see testing as an information provider to furnish the business with the information required to make important decisions. Those decisions will inevitably involve an element of risk adoption by the business and it is inevitable that each stakeholder in those decisions will have their own perception of the levels of risk involved. What I have seen is that situations where testers are perceived as 'negative' could be more appropriately explained as a difference between the tester's perception of the risks involved in the development and the level of risk adopted in the decisions taken by the business. If the tester disagrees with the business decision makers regarding the risks of problems such as software bugs, delays in the development process and eventually customer dissatisfaction, then this can result in negativity both from the tester and in the perception of them.

I see many reasons why testers and product owners or business decision makers may not agree on the levels of risk being taken in a product development. When we as testing professionals encounter a situation where there is a large disparity between our own acceptable risk level and that of the business as a whole, our position can feel and appear to be very negative. If this occurs, rather than assuming that we're the only ones with full visibility of the situation and belligerently sticking to our negative stance, I think that we might instead want to ask ourselves some questions in an attempt to explain this disparity in perceived risk.

  • Am I overestimating the risks?
  • Is the business underestimating the risks?
  • Ultimately - can I put up with it?

Am I overestimating the risks?

In his book 'Risk, the Science and Politics of Fear', Dan Gardner gives an excellent explanation of how thousands of years of human evolution have formed our mental processes regarding risk.

'Our brains were simply not shaped by the world as we know it now, or even the agrarian one that preceded it. They are exclusively the creation of the Old Stone Age'

One consequence of this is how our brains make immediate risk assessments based on our ability to recall relevant experiences. This is a natural part of our 'System one' or 'gut' decision making process that has evolved to make quick decisions in risky situations. This is contrasted with the processes of logical reasoning which form 'System two' or 'head' thinking, which are slower but more accurate. We are predisposed to considering a situation to be risky if we are able to recall similar situations where problems have arisen. This recollection can be through our own experiences or through 'stories' conveyed to us by others, a concept known as the Availability Heuristic. As Gardner points out, whilst this mechanism worked well for our ancestors in avoiding danger, it is flawed when it comes to modern society. The huge amounts of information 'available' to us relative to a situation can provide an unrealistic perception of the actual risks present. To paraphrase Gardner for this context, we are running around testing software with brains that are perfectly evolved for avoiding dangerous animals while hunting and gathering.

Testers' work revolves around finding problems with software. We find our own bugs, we read articles on testing and software problems and, when we meet other testers, we share stories of bugs and software problems. We will also typically spend more time investigating and examining problem behaviour than anyone else. An inevitable result is that the experiences driving our own availability heuristics will be inclined to over-emphasise the likelihood of issues. The examples that are most readily available to us when facing new situations are likely to be based around problems that we have previously seen. We will have a natural tendency to possess higher levels of perceived risk than other roles. The business, on the other hand, don't see all of the bugs. Their experiences of bugs are often masked behind figures, reports and metrics. These might convey summary information, but the personal experience of issues encountered is absent and so, therefore, is the 'System one' perception of risk.

From a tester standpoint, a rather unpalatable conclusion that we could draw from this is that, in the situation where we have provided excellent status information to the business, they could well be better placed than us to accurately assess the risks involved in releasing the software. This is because they are more likely to be operating in a 'System two' process of logical reasoning based on the facts, whereas we will be strongly influenced by our 'System one' processes to assume problems on the basis of prior experience.

Testers' roles and experiences will also drive their assessment of risk to be heavily based around software bugs. The business decision makers, if they are doing their jobs effectively, will have visibility of other categories of risk which must be considered when making project and release decisions. Risks such as missing market opportunities, losing investor confidence and missing opportunities for useful feedback are all factors outside the scope of the quality of the code which must be considered.

The somewhat enlightening conclusion for testers, for me, is this - we need to be able to let go of the worry. If you have done the best job that you can to provide the business with information, and they are making a decision based on that information, then you have done your job. Understand that your manifold experiences, both personal and second-hand, of failures in software cause your gut to see problems everywhere, and you may not be best placed to accurately assess the overall risk involved in a software release. You may reject this idea, claiming that you are aware of biases yet not susceptible. As Gardner points out, this is a common problem:

'Psychologists have found that people not only accept the idea that other people's thinking may be biased, they tend to overestimate the extent of that bias. But almost everyone resists the notion that their own thinking may also be biased.'

Sometimes as a tester you have to identify when you've been too close to too many problems to be thinking rationally, and work on providing the information to let others make the risk decisions. The folks making those decisions will hopefully have access to multiple information sources, in addition to the output of testing, which helps to balance biases in the decisions being made.

 

Is the business underestimating the risks?

Of course there are two sides to every disagreement. Business decisions are made on the basis of perceived risk, which is based on the information available to the decision maker. It may be that the decision maker is actually adopting a riskier approach than they think due to poor or poorly presented information, or their own biases. Underestimation of potential risks by the business will also result in a differential in perceived risk between business/management and testers. In their 1988 paper on underestimation bias in business decision making, "An Availability Bias in Professional Judgement", Laurette Dubé-Rioux and J. Edward Russo suggest that underestimation as a bias is again heavily influenced by availability, or the lack thereof.

"After evaluating such alternative explanations as category redefinition, we conclude that availability is a major cause, though possibly not the sole cause, of the underestimation bias."

In summary, their findings were that decision makers tended to group risks for which they had low visibility into catch-all categories and then underestimate the likelihood of anything in those categories occurring, with the lack of available examples of those risks proposed as the major cause of this underestimation.

If the perceived level of risk adopted by the business is based on low availability, and actually differs significantly from the real levels, then we may be able to help through providing better information to inform risk decisions. It is therefore our responsibility to convey as clearly as possible the relevant information to allow an informed decision. As the perception of risk is heavily influenced by our availability of relevant experiences or stories, it follows that the most effective mechanism to convey risk information would be through the sharing of experience, rather than the presentation of raw figures. I've certainly encountered the situation when testing poor quality code where simply describing a few of the issues encountered can have a much greater impact on the recipient than the presentation of bug counts. Metrics and status reports can convey a certain level of information. When they are backed up with examples of the nature of issues being encountered, however, this creates a much more personal response and will have a much more significant impact on the perceived risk in the recipient. I've been in more than one situation where a "bug story" that I have conveyed to a manager has been subsequently repeated by them when reporting project status externally.

In short - if you want to influence perceived risk, then start telling stories.

Before doing so, however, consider whether it will benefit the business. As I've stated above, it could be that our levels of perceived risk are unrepresentatively high compared to the actual risk, in which case telling stories of every bug found may simply result in the business moving closer to our 'System one' position and away from a more realistic assessment of the situation.

Can I put up with it?

In my list of questions at the start of the post, you may wonder why I've included 'can I put up with it?' but specifically haven't included 'how do I change things?'. The simple answer is that I believe that, whilst improving the accuracy of perceived risk may be possible, changing the level of risk adopted by the business is unlikely to be something that a tester can achieve.

In a fascinating experiment, researchers into risk behaviour placed cameras at an open level crossing and then recorded the speeds of cars travelling through and the correlated risk of accident. The researchers then cut back the trees around the crossing to improve visibility and repeated the experiment. The results revealed a huge amount about human risk adoption in that, due to the reduced perceived risk, drivers increased their speed on average such that the same proportion of vehicles were at risk of an accident as before with no net safety benefit from improving visibility.

Behaviour was compared before and after sightline enhancement achieved by the removal of quadrant obstructions. At the passive crossing, sightline enhancement resulted in the earlier preview of approach quadrants. The perceived risk of approach to this crossing appeared to be reduced, resulting in consistently higher approach speeds after sightline enhancement. This performance benefit in response to the intrinsic effect upon safety realised by sightline enhancement yielded no net safety benefit.

The implication of such results is that people will adopt a predefined level of risk to situations and will adjust their behaviour to reflect this based on new information, a phenomenon known as Risk Compensation. This led to interesting consequences for car manufacture, for example, where safety features don't result in improved safety, but instead result in increased net driving speeds and riskier driving behaviour.

This also has pretty fundamental implications for software projects. Essentially, if this phenomenon applies in business, then every action taken by a testing team to improve testing and the confidence in a product will result in a change in behaviour by the business to operate faster or implement more features at the same risk level, rather than using the improvement to reduce risk. For example, in the case of introducing test automation, if automated tests are seen as providing an equivalent level of confidence to humans executing the same 'cases' then the response by the business is likely to be to drive for faster development and lower levels of manual testing on the back of the perceived confidence boost. If the improvements by the tester were driven by a disparity between their own acceptable risk levels and those of the business, the outcome is likely to prove very frustrating for them.

Hence the question 'can I put up with it?' - if you as a tester are at odds with your company over what constitutes acceptable risk, get used to it. It is unlikely to change, and any improvements that you make to try to address it could actually make things worse.

Knowing oneself

The understanding of risk and personal bias is a complex subject. In the testing world we need to try to ensure that our risk perception is based as much as possible on 'System two' thinking and not 'System one' feelings driven by the availability heuristic. Avoiding such biases is hard. However, in any situation we should be asking ourselves if our position is based on evidence arising from testing performed, or whether we simply have a 'gut' feeling that there will be problems. In considering risks, are we calling to mind particularly memorable problems from other projects that could be affecting a realistic assessment?

This problem is not limited to software testing. A 2005 paper discussing the problem of decision making in the medical profession provided these key guidelines for avoiding the availability heuristic:

  • Be aware of base rates (more appropriate for medical diagnosis, however the prevalence of issues arising from live use is an important yardstick)
  • Consider whether data are truly relevant rather than just salient 
  • Seek reasons why your decisions may be wrong and entertain alternative hypotheses
  • Ask questions that would disprove, rather than confirm, your current hypothesis
  • Remember you are wrong more often than you think

I think that these provide sound general advice. A feeling of constant negativity is not a healthy or sustainable situation for any role. Being aware of your biases and using these guidelines you may find that your negative position eases somewhat in the light of evidence. If you are doing your job and the business are happy then maybe you are overemphasising risks, and you need to lighten up and lose some of that negativity.

If, on the other hand, you are confident in your assessments, you are providing excellent information into your company's risk decisions, and you are still finding yourself in a very 'negative' position relative to the rest of the business, I suggest you think long and hard about whether you are in the right place. As we've seen, your business is unlikely to change.

References

Risk, the Science and Politics of Fear: Dan Gardner
Risk Perception: Lennart Sjöberg
Driver Response to Improved Lateral Visibility: Evidence of Usability of "Risk Homeostasis Theory": Nicholas J. Ward, HUSAT Research Institute, Loughborough Univ. of Tech., Leicestershire, United Kingdom
Study: Airbags, antilock brakes not likely to reduce accidents, injuries: Emil Venere, Purdue University News
Five pitfalls in decisions about diagnosis and prescribing: Jill G Klein
The Availability Heuristic: Wikipedia
An Availability Bias in Professional Judgement: Laurette Dubé-Rioux and J. Edward Russo, Cornell University, 1988

Photo: http://www.hotelsanmarcofiuggi.it/en/free_climbing.php
