Monday 21 October 2013

Variety is the Spice of Test Automation

When I shared our 'Set of Principles for Automated Testing' a few years ago, one of the key principles included was the separation of tests from test code. This principle is used widely in test tooling, and there are a number of test automation/living documentation approaches for which this notion of separation is an integral part of their operation:

  • Keyword driven testing structures drive a test harness through parameter files which contain a bespoke set of commands to apply certain actions to the system under test.
  • FitNesse drives automation 'fixtures' through a wiki documentation structure, often with parameter/result combinations written in tabular form containing definitions for individual checks.
  • ATDD tools such as Cucumber take this a step further and interpret natural-language commands into executable steps, again through test fixtures and centralised code.
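The keyword-driven idea in the first bullet can be sketched in a few lines. This is a minimal, hypothetical illustration (the class and keyword names are mine, not taken from any particular tool): the test definition is plain data, and only the harness and the stand-in system under test are 'test code'.

```python
# Minimal sketch of a keyword-driven harness: the test definition is a
# list of keyword lines (data), dispatched to actions on a stubbed
# system under test. All names here are illustrative.

class SystemUnderTest:
    """Stand-in for the real application being driven."""
    def __init__(self):
        self.accounts = {}

    def create_account(self, name):
        self.accounts[name] = {"balance": 0}

    def deposit(self, name, amount):
        self.accounts[name]["balance"] += int(amount)

class KeywordHarness:
    """Maps keywords from a parameter file to actions on the system."""
    def __init__(self, sut):
        self.keywords = {
            "create_account": sut.create_account,
            "deposit": sut.deposit,
        }

    def run(self, lines):
        for line in lines:
            keyword, *args = line.split()
            self.keywords[keyword](*args)  # test definition, not test code

sut = SystemUnderTest()
KeywordHarness(sut).run([
    "create_account alice",
    "deposit alice 50",
])
print(sut.accounts["alice"]["balance"])  # -> 50
```

The point of the separation is visible here: new tests are new lines of data, and only a new kind of action requires new code in the harness.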

These approaches are designed, amongst other things, to simplify test creation and maintenance through avoiding the writing of new test code for each test. One could argue that any syntaxes that we use to define the tests, whether keywords or natural language, are a form of interpreted code. I'm inclined to agree, but for the purposes of clarity I'll only refer to the language that is used to write the test harness or fixtures interfacing with the system under test as 'test code' in this post. By centralising test code we adhere to the principles of code reuse and DRY (don't repeat yourself) that apply in most programming endeavours. I can see the efficiencies that may be achieved by doing this, however I think that there are also inherent pitfalls that need to be considered if we're not to seriously restrict the flexibility in our automation.

  • Limiting to the solution design - To achieve even greater efficiency, some folks recommend that the programmers writing the software also create the test fixtures and harnesses, leaving the testers to focus on test creation and definition. When a programming team designs a feature they will be working with a model of the solution in mind. This model will constrain the range of inputs, operations and outputs considered valid for the features of the designed solution. When designing fixtures to drive those features, the natural bias of the developer will be to limit the scope of the fixtures to supporting what they perceive to be the scope of that model. I believe that the point of test code should be to allow testers to drive the system under test in a variety of ways. In order to do this effectively we should be free to operate in the problem domain, explicitly aiming to discover areas where the problem and solution domains are inconsistent.
  • Limited Control - If, for example, the phrase 'when a user creates an account' triggers a test harness to execute a predictable process every time that conforms to the 'ideal' use of that feature, then the resulting tests are unlikely to provide a range of scenarios that is reflective of real use. The danger is that, by abstracting the testers from the code that interfaces with the product, through a natural language interface for example, we limit our flexibility in exercising the application in a range of ways that represents a more varied and realistic use. My preference is for the tools that I use to extend the reach of the tester to activities which would not otherwise be available to them. This will include predictable and stable workflows for regression testing but should also allow access to scale, volume, parallelisation and frequency of activity that would otherwise be unavailable without those tools.
  • Lack of Variety - With lack of flexibility there is also an implied lack of variety and randomness in the actions that the product is subjected to. Whilst known state and a measurable, checkable outcome are required for functional test automation, this needs to be balanced with the ability to add variety and randomness that increase the range of combinations tested and thereby the chances of exposing issues.

Providing Flexibility

So how to balance the need for predictable test output for checking with the need for supporting variety and scope in test harness capabilities? Here are a few approaches that I've found can be very effective:-

  • Use parameters to add flexibility - We have a very useful parallel threaded query test harness that we developed in house. In addition to the ability to select the queries to run at random, it is also possible to select one of a number of connection modes via an input parameter. These modes change the manner by which queries, statements and connections are used by the harness. This is achieved through use of a set of connection classes which expose a common object interface to the driving harness. In this way we can adhere to the DRY principles of reusing both the main harness code, and the test definition files, yet still provide flexibility in the interface between harness and application. The structure is extensible too, such that when a customer was using a Tomcat connection factory to connect we were able to add this in as a connection class and recreate issues in that connection structure without altering our existing harness or having to develop new test data.
    Variation through use of interfaces in a test harness

    Parameters to change test execution don't just apply in the initial connection. Building support for test parameters which change key execution paths of the driving harness can be applied throughout the software lifecycle to great effect. For example the testers working on our query system can control the optimisations and runtime options applied to all of the queries in a run of tests by the application of the appropriate parameters. This allows the execution of a suite of tests with and without specific optimisations applied to compare the behaviour.
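The connection-mode idea described above can be sketched as follows. This is a hypothetical Python illustration, not our actual harness (the class names, modes and query strings are invented): the harness code and test definitions stay unchanged, while an input parameter selects which connection class sits behind a common interface.

```python
# Sketch of parameterised connection modes behind a common interface.
# All class, mode and query names are illustrative.

from abc import ABC, abstractmethod

class Connection(ABC):
    """Common object interface that the driving harness depends on."""
    @abstractmethod
    def execute(self, query): ...

class DirectConnection(Connection):
    def execute(self, query):
        return f"direct:{query}"

class PooledConnection(Connection):
    def execute(self, query):
        return f"pooled:{query}"

class TomcatFactoryConnection(Connection):
    # A mode like this could be added later to recreate a customer's
    # connection-factory setup without touching harness or test data.
    def execute(self, query):
        return f"tomcat:{query}"

CONNECTION_MODES = {
    "direct": DirectConnection,
    "pooled": PooledConnection,
    "tomcat": TomcatFactoryConnection,
}

def run_queries(queries, mode="direct"):
    """The harness sees only the Connection interface; the mode
    parameter chooses the implementation."""
    conn = CONNECTION_MODES[mode]()
    return [conn.execute(q) for q in queries]

print(run_queries(["SELECT 1"], mode="tomcat"))  # -> ['tomcat:SELECT 1']
```

Extending the harness with a new mode is then a matter of registering one new class, which is why the structure stays DRY while the interface to the application remains flexible.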

  • Allow for randomisation - Whilst a test in itself must have predictable output in order to apply a binary check, there is no reason why that test must be executed in a consistent pattern in relation to the other tests. Executing tests in parallel, with a level of randomisation in the execution order, provides a much broader range of execution scenarios than the execution of the same test in isolation each time. The regression harness that I currently develop supports the ability both to schedule multiple concurrent packs of tests and also to randomise execution order and timing within those packs. This helps to increase our chances of detecting concurrency issues between processes which can heavily depend on timings of the parallel processes involved and are easily missed if repeating identical timed tests.
  • Have different authors for test code and feature code - As I wrote at the start of this post, I think that having programmers write the test fixture code for their own features exposes the risk of inappropriate assumptions being incorporated into those fixtures. A logical approach to avoiding this risk is to share the work out to another individual. In my organisation a subset of the testers write the code, but this does not necessarily have to be the case. If it is not possible, it makes sense to have the tester working with the programmer review the fixture design and ensure that solution-based assumptions aren't being built into the test interface.
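The randomisation point above can also be sketched in miniature. This is an illustrative, hypothetical example (pack names and test names are invented): several packs of tests run concurrently, with execution order shuffled and timing jittered within each pack, so that repeated runs exercise different interleavings while every test still runs exactly once.

```python
# Sketch: run packs of independent tests concurrently, shuffling order
# and jittering timing within each pack so repeated runs produce
# different interleavings. Pack contents are illustrative.

import random
import threading
import time

results = []
lock = threading.Lock()

def run_pack(pack_name, tests):
    tests = tests[:]
    random.shuffle(tests)                    # randomise execution order
    for test in tests:
        time.sleep(random.uniform(0, 0.01))  # jitter timing between tests
        with lock:
            results.append((pack_name, test))

packs = {
    "pack_a": ["t1", "t2", "t3"],
    "pack_b": ["t4", "t5", "t6"],
}
threads = [threading.Thread(target=run_pack, args=(name, tests))
           for name, tests in packs.items()]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results))  # every test still ran exactly once -> 6
```

Each individual test remains a deterministic check; only the scheduling around it varies, which is what gives the concurrency coverage without sacrificing checkability.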

I appreciate that in my context the application of random inputs and parameterised runs is relatively simple, however I think that the principles can apply with any automation that drives an interface. Typically the effort involved in the addition of further run options to a developed test interface will be much lower than the initial creation of that interface. Even if this is not the case and it takes as long to support the additional modes as the first, the range of options in a test covering a multiple step workflow will grow exponentially with each option that is added, so the benefits should multiply accordingly. I appreciate the following diagram is highly simplistic but it does demonstrate how, with just one or two available variations in each step of a multi-step test workflow, the number of combinations that we are able to exercise increases massively.

Test Combinations with increasing workflow steps

Testers are all too aware of the problems posed to testing by the addition of just a small number of options causing a multiplicative effect on the number of combinations in which software can be used. It makes sense to use that same phenomenon to our advantage where we can in increasing the range of options that we have in our tools. Even if we don't explicitly test all of these options up front, one of the areas where testers excel is in the recreation and identification of issues in the field. If we can take advantage of flexibility and extensibility in our tools to quickly recreate a problem scenario then this can lower reproduction and therefore turnaround time on fixes, as was the case with the Tomcat example that I mentioned above.
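The multiplicative effect can be made concrete with a few lines of Python. The step and option names below are invented for illustration; the arithmetic is the point:

```python
# Illustration of the multiplicative effect: a handful of options per
# workflow step multiplies into many distinct paths through a test.
from itertools import product

workflow = [
    ["direct", "pooled", "tomcat"],  # connection mode (3 options)
    ["optimised", "unoptimised"],    # query optimisation (2 options)
    ["serial", "parallel"],          # execution mode (2 options)
]
paths = list(product(*workflow))
print(len(paths))  # 3 * 2 * 2 -> 12 distinct workflow combinations
```

Adding a single two-way option to any step doubles the count again, which is exactly the growth that makes exhaustive manual coverage impractical and parameterised tooling valuable.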

For me test automation is not about restricting the tester to a predefined set of steps that they can invoke through a simple language. It is about putting into the testers' hands the power to drive the tested system in the ways that they want and to perform checks that they deem appropriate. By ensuring that we build automation tools to support a level of variety and control beyond the immediate requirements to achieve the workflow, we can increase our power and scope of testing significantly. We can also dramatically increase the number of test combinations that we can cover with our tools relative to the time, effort and money that we invest in creating them.

James Thomas said...

Great post Adam.

Another dimension here can be general vs specific tests - the former being much more straightforward to run (for example having no data dependencies) and so easier and more flexible to apply to arbitrary installations.

I wrote about a suite we've developed for testing an API which breaks tests down this way, parameterises (as you say) and partitions them so that a single suite can be run in a variety of ways in a variety of environments without change. It randomly walks through whatever data happens to be on the installation under test, and we find a lot of issues this way, due to bits of data created by other tests, developers and so on.

When I wrote this suite, I also collaborated with the developer to generate test data (request-response streams) that we share across unit tests and my higher-level tests. We check them out from the same location and share the maintenance of them.

Adam Knight said...

James, That is a nice post and it is great to have a real world example from a web domain of exactly the kind of parameterisation that I am talking about - I love the way you consider and parameterise the way that the inputs can be generated "fast" vs "slow", "keyboard" vs "mouse".

With regard to the 'general' testing I do a similar thing with relative results, whereby the output of a previous test creates the results against which a subsequent test can be checked. We have to be careful here to ensure the same error can't affect both results, but it is a powerful technique. I find it particularly useful in our Big Data context for scaling up tests through iteration - we can iterate the same test pack multiple times - the records in the archive being tested will change with each run but the relative tests allow us to perform consistent checking whilst gathering valuable timing and resource usage information as we scale the size of the archives.
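[Adam's relative-results idea above can be sketched roughly as follows - a hypothetical illustration, with invented names, not the actual pack: each iteration checks the archive against the result established by the previous iteration, then produces the expectation for the next.]

```python
# Sketch of 'relative' checking: the checked output of one run becomes
# the expected result for the next, so the same pack can iterate as the
# archive grows. All names are illustrative.

def archive_count(archive):
    return len(archive)

def run_pack(archive, new_records, expected_before):
    # Check against the result established by the previous iteration...
    assert archive_count(archive) == expected_before
    archive.extend(new_records)
    # ...and return the value the next iteration will check against.
    return archive_count(archive)

archive = []
expected = 0
for iteration in range(3):
    expected = run_pack(archive, [f"rec{iteration}"], expected)

print(expected)  # -> 3
```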

Thanks for reading and commenting


