Tuesday, 14 December 2010

Rolling with the Punches

On working with your software, not against it.

In recent weeks the focus of much of my testing work has been the scalability of our system to larger and larger data archives. This is one of the greatest challenges that we face at RainStor. Given that our customers may be importing many billions of records every day and storing these for months or years, we have neither the resources nor the time to perform a real-time test of this kind of scenario in an iterative test environment.

We already have a number of techniques for scaling up test archives; in the past, however, all of these have still required the 'building' of data partitions, a process which involves importing, validating, structuring and compressing the data. We've made this process pretty quick, but it still takes time.

At the start of the latest iteration I discussed the issue with our developers. I'm very lucky in that the developers in my organisation hold testing in very high regard and, for the most part, understand the types of problems that we face and help whenever they can. When I discussed the issue with them they identified a number of 'shortcuts' that they could implement in the import process to help bulk up data in the archive in a test capacity. These are showing some great benefits, but I still felt that we were going toe to toe with an issue that was bigger than we were: the system simply couldn't build data as quickly as I wanted it to.

Redefining the problem

In reality, I didn't need to build the data, I just wanted to add many data partitions to the archive. Simply copying them in was an option, however this would not give a realistic scale-up of the process of committing the data, including the logs and auditing that I wanted to be present. On examining the other interfaces through which data could be validly imported into the system, I realised that we could potentially utilise an existing function for export/import of a data partition from one archive to another. The reason we'd not used this before was that a partition could only be copied into an archive once. This was a limitation that could be worked around by creating a test harness process to 'rename' a replication file and resubmit it. In this way I've been able to create a bulk import process that uses a valid system API to scale up data partitions in a fraction of the time taken by a standard import. This weekend I scaled up 3 months' worth of data imports into an archive in 2 days.
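The rename-and-resubmit harness can be sketched roughly as follows. This is a minimal illustration, not RainStor's actual implementation: the file layout, the 'PARTITION-NNNN' identifier and the `submit` hook are all assumptions standing in for the real export/import API.

```python
from pathlib import Path

def clone_partition(replica: Path, index: int) -> Path:
    """Copy an exported partition file under a new name and rewrite its
    embedded identifier so the archive will accept it as a fresh
    partition. The 'PARTITION-NNNN' marker is a made-up stand-in for
    whatever identifier the real replication format carries."""
    clone = replica.with_name(f"{replica.stem}_{index:04d}{replica.suffix}")
    data = replica.read_bytes().replace(
        b"PARTITION-0000", f"PARTITION-{index:04d}".encode())
    clone.write_bytes(data)
    return clone

def bulk_scale_up(replica: Path, copies: int, submit) -> None:
    """Resubmit one exported partition `copies` times through the
    archive's own import API (`submit` is the hook into that API)."""
    for i in range(1, copies + 1):
        submit(clone_partition(replica, i))
```

Because each resubmission goes through the genuine import API, the logs and audit trail are populated just as a real commit would populate them.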

What is the point

So what am I trying to say? Simply this: sometimes going toe-to-toe with a big limitation on testability in your system will cost you dear and get you nowhere. Trying to populate large amounts of data through your standard inputs and the associated business rules can be like banging your head against a wall. Rather than looking at what the system cannot do, look at what it can do and work with that. If the standard data input processes are proving to be limiting factors on your setup or scalability, look at any backup/restore, replication, export and import functions that are supported and see if these can be harnessed to meet your needs. These functions are based around system-generated inputs and often bypass business rules and validation, making them a much more efficient mechanism for populating test data than working through the standard input mechanisms, but without the risk of writing the data directly into the system/database. If you are worried about whether these interfaces will provide you with a realistically populated database/system, then maybe it is time to test them a bit more before the customers need to use them.

Copyright (c) Adam Knight 2009-2010

Friday, 19 November 2010

Don't call me Technical

I spent a very pleasant day at the Agile Testing and BDD Exchange at SkillsMatter in London today. In general the day was a good one, with slightly more focus on tools this year; I have to admit to preferring the more structure- and approach-based talks of last year.

One of the subjects that caused some twitter discussion after the event came about through one of the presenters questioning whether testers were 'technical' enough to be comfortable with working with the programming syntaxes being presented.

To highlight the issue I'll rewind to an earlier talk in the day, when Dave Evans of SQS and Erik Stenman presented an experience report on agile testing at Klarna. Erik discussed the fact that Klarna's online retail transaction processing software was written in Erlang, and asked the audience how many were programmers, and how many of those were familiar with Erlang. There was no sense of condescension; it was simply a show of hands of those familiar with that language.

Compare this to the later talk in which a similar question was asked of testers, yet it was framed not in the context of familiarity with the programming language in question, but in more general terms of how 'technical' the testing contingent were (I'm not sure if this was the exact term used, but it was the implication, and was the term carried into the subsequent twitter discussions).

As Mike Scott of SQS put it:-

Why do people assume testers are not technical. Lets stop this now. Please don't patronise us.

Mike makes a valid point, but still (probably for brevity in a tweet) uses the 'technical' categorisation. Lanette Creamer provided an excellent response:-

I agree. Also, what is "technical"? It means different things to different people.

This couldn't be more true. The chap sat next to me was Jim, a tester from my team at RainStor. He did not put his hand up. Now I've seen this guy read and understand SQL queries longer than some novels and find faults with them through visual static analysis. Of course he is a technical tester. In fact the 'developers' in our team, all competent C/C++ programmers, treat Jim's SQL knowledge with something approaching reverence. He is an invaluable member of our team as his "technical" database skills are fundamental to the database domain in which we operate. His lack of familiarity with object oriented programming language syntaxes, however, was sufficient for him to not show his hand to be counted as one of the 'technical' testers in the room.

Given the accepted wisdom of having a multi-skilled team, isn't it about time we also accepted the value of multi-skilled testers, and that 'technical' is a categorisation that falls significantly short in that context? When discussing the skills of developers we do not try to impose such broad labels; we talk in a positive sense about the specific skills that individual developers possess. When discussing the various programming, scripting, analysis, database, operating system and other skills that testers may possess, it would be nice if the same courtesy were extended.


Tuesday, 9 November 2010

A confession - on assumptions

Hi everyone, my name is Adam, I am a software tester and I make assumptions.

Not much of a confession, I admit, however assumptions are something of a dirty word in software testing. If not addressed face on they can become hidden problems, rocks just under the surface waiting to hole your boat when the tides change.

As a tester I am constantly making assumptions. This is an unfortunate but necessary part of my work. Where possible I try to avoid assumptions and drive to obtain specific parameters when testing. Sometimes, particularly early on in a piece of development, it is not possible to explicitly scope every aspect of the project. In order to avoid "scope paralysis" and put some boundaries in place so that testing can progress, it is sometimes necessary to make assumptions about the required functionality and the environment into which it will be implemented and used.

These assumptions could relate to the users, the implementation environment, application performance or the nature of the functionality. e.g.:-
  • It is assumed that all servers in a customer's cluster will be running the same operating system

  • It is assumed that the user will be familiar with database applications and related terminology

  • It is assumed that the customers will have sufficient knowledge to set up a clustered file system so our installation process can be documented from that point onward

  • Given a lack of explicit performance criteria it is assumed that performance equivalent to similar functionality will be acceptable

  • It is assumed that the function will behave consistently with other functions in this area in terms of validation and error reporting

I don't see anything wrong in making assumptions, as long as it is identified that this is what we are doing. As part of our testing process I encourage testers in my organisation to identify where they are making assumptions and to highlight these to the other stakeholders when publishing the agreed acceptance criteria for each story. In this way we identify where assumptions have had to be made and allow these to be reviewed and the safety and the risks involved in making those assumptions to be assessed. We identify implicit assumptions and expose them as explicit constraints, gaining confirmation from the product owner and/or customer to provide ourselves with confidence that the assumptions are safe.

Despite this process of identification and review, I recently encountered an issue with a previously made assumption. This highlighted the fact that simply identifying and reviewing assumptions during the development of a piece of functionality is not sufficient. Once you have made an assumption during the development of a function, in essence you remake that assumption every time you release that same functionality in the future until such time as:-

  • You cease to support the functionality/product.

    No more function, no more assumption - job done.

  • You change the functionality and review the assumptions at that point.

    At this point I encourage my team to re-state any assumptions made about the existing functionality for re-examination. A recent example involved our import functionality. As part of an amendment to that functionality the tester stated the assumption that an existing constraint on the import data format would apply when using the amended software. We questioned this and, after conferring with the customer, established that it was no longer a safe assumption given the way that they wanted to implement the new feature. In this way the explicit publishing and examination of a long-held constraint helped to avoid a potential issue that would have affected the end customer.

  • You get bitten because the assumption stops holding true.

    This last alternative happened to me recently. As part of a functional development a couple of years ago, some assumptions were explicitly stated in the requirement regarding the nature of the data used in that function. Over the course of the next two years the customer base was extended and the range of data sources for the functionality grew. As no extensions to the functionality appeared necessary to support the new use cases, no further developments were done and the assumptions were not revisited. The environment in which the product was being used changed, rendering the assumption invalid and resulting in an issue with a specific set of source data. The problem that manifested itself was very minor, actually resulting from a problem in the application that the data was sourced from, but it did highlight the dangers involved in making assumptions and not reviewing them. I've since altered the way in which assumptions are documented during our development process to allow for easier identification and review in future.

Assumptions are easy to make. They are even easier to remake, every time the feature in question is re-released. Identifying and confirming the assumptions at the point of making them is a good step, but it is still a risky approach. Assumptions are static in nature and easy to forget. Customer environments, implementation models and usage patterns change much more quickly, and forgotten assumptions can become dangerously redundant if not constantly reviewed. I'll be improving my process of assumption documentation, examination and re-examination in coming weeks. Is this a good time to review what assumptions you've made in the past that are still being made? It may do you some good to stand up and confess.


Monday, 11 October 2010

Be careful what you wish for - on certification and recruitment

This post is a summary of some responses I made on a Software Testing Club discussion on certification. Specifically my responses focused on the pursuit of certification in order to gain access to job opportunities that list that certificate on the pre-requisites.

I've personally never felt the need to get certified. The only time that this has ever had an impact on my career in testing was when, as part of an ISO 9001 audit, the auditor asked if all of the team I was working in were "professional testers", meaning certified. My response was suitably glib - "no, I'm just a paid amateur". I went on to take over the running of that team and successfully led the testing strategy in that organisation for two years.

Certification on a large scale must by its very nature impose a series of lists and limitations in order to provide some consistent measurement against which the certification can be granted. Testing, on the other hand, is a challenging and varied career in which every role has unique demands. Even apparently similar applications can require very different testing approaches depending on where the stakeholders' priorities lie. The ability to understand those priorities, assess risks and test in a focused yet creative and open-minded manner is what I believe distinguishes the great testers from the good.

Sadly, a big factor in whether and how to get certified will be whether you want to work for the sort of organisations that value these certifications. Testing jobs are often advertised which specify certifications (usually ISEB/ISTQB) as requirements for the role, so it makes sense to obtain the certification to open up these opportunities to you, right?

Not necessarily. I would begin by asking yourself what kind of tester you want to be. An organisation that demands certification in its testers may have a very well defined and structured approach, but is it likely to be the kind of environment that understands and places a high value on free thinking, creativity and craftsmanship as qualities in its testers? If those are the type of qualities that you possess and want to develop, then ask yourself whether certification is likely to help you to develop in a way that you find rewarding.

I certainly wouldn't suggest discarding a job on the basis of it asking for certification in the job spec; some great opportunities could be missed that way. I would, however, make some efforts to understand the reasoning behind the request.

As Michel and Rob Lambert pointed out in the Software Testing Club discussion, it may be that the request arises from certification providing an apparent safety net for a recruiter with little other knowledge of the subject of testing. I suggested that, in this case, maybe a paragraph in your CV or covering letter along the following lines might help:-

I am a passionate and committed software tester. I am a member of the Software Testing Club and regularly read articles and books on the subject. In particular I like the books and blogs of the following test authors (insert names here). While I do not have certification from any one specific organisation or certification body, I am familiar with the concepts covered by the following courses/syllabuses (insert names here).

I would, however, be suspicious of working for an organisation who were unable to see around a lack of certification in an otherwise strong applicant.

  • If the role is a junior one, why are they not prepared to spend the money certifying enthusiastic applicants themselves (I believe foundation level ISTQB certification can be achieved in a 3-day course)?

  • If it is a more senior role, do they value your skills and experience? What does certification tell them that a track record of successful projects completed to the satisfaction of the stakeholders does not?

  • Is the candidate going to enjoy the autonomy to adopt their own approach, methods and tools in the role that they deem most appropriate to the context of the project?

There can be few things more frustrating than working in an organisation that does not recognise the complexity and value of the role that you are doing. In my experience software testing is particularly prone to this problem. If the recruiter is working on the basis that having passed e.g. a 3-day ISTQB course is sufficient evidence that someone is capable of fulfilling the demands of the role, I personally would take the time to look behind the request to establish what value they actually place on the role of software testing in their organisation.


Friday, 17 September 2010

A Set of Principles for Automated Testing

Introducing new members to the team can act as a focus for helping the existing members to clarify their approach to the job. One of the things that I developed to work through with new members is a presentation on the automation process at RainStor, and the principles behind our approach. This post explores these principles in detail and the reasoning behind them. Although these have grown very specifically to our context (we write our own test harnesses), I think that there are generally applicable elements here that merit sharing.

Separate test harness from test data

The software that drives the tests, and the data and metadata that define the tests themselves, are separate entities and should be maintained separately. In this way the harness can be maintained centrally, maintained to reflect changes in the system under test, and even re-written, without having to sacrifice or risk the tests themselves.

Users should not need coding knowledge to add tests

Maintenance of test data/metadata should be achievable by testers with knowledge of the system under test, not necessarily knowledge of the harness technology.
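This separation of data-driven tests from harness code can be sketched along the following lines. The CSV column names and the `execute` hook are invented for this illustration; our real metadata format is richer, but the principle is the same: a tester edits the definitions, never the code.

```python
import csv
import io

def load_tests(metadata: str) -> list:
    """Parse tester-maintained test definitions. The CSV columns
    (name, query, expected) are illustrative only."""
    return list(csv.DictReader(io.StringIO(metadata)))

def run_tests(tests, execute) -> dict:
    """Drive each definition through `execute` (the harness's hook into
    the system under test) and map each test name to pass/fail."""
    return {t["name"]: execute(t["query"]) == t["expected"] for t in tests}
```

Adding a test becomes a one-line edit to the metadata file, requiring knowledge of the system under test (here, SQL) rather than of the harness language.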

Tests and harnesses should be portable across platforms

Being able to use the same test packs to execute across all of our supported platforms gives us an instant automated acceptance suite to help drive platform ports, and these packs then continue to provide an excellent confidence regression set for all supported platforms.

Tests are self documenting

Attempting to maintain two distinct data sources in conjunction with each other is inherently difficult. Automated tests should not need to be supported by any documentation other than the metadata for the tests themselves, and should act as executable specifications describing the behaviour of the system. Test metadata should be sufficient to explain the purpose and intention of the test, such that this purpose can be preserved should maintenance be required on that test.

Test harnesses are developed as software

The tests themselves are a software product that serves the team, and developments should be tested and implemented as such.

Tests should be maintainable

Test harnesses should be designed to be easily extensible and maintainable. At RainStor, harnesses consist of a few central driving script/code modules and then individual modules for specific test types. We can add new test types to the system by dropping script modules with simple common inputs/outputs into the harness structure.

Tests should be resilient to changes in product functionality

We can update the central harness in response to changes in product interfaces without the need to amend the data content of thousands of individual tests.

Tests allow for expected failures with bug numbers

This can be seen as a slightly contentious approach, and is not without risk, however I believe that it is sound. I view automated tests as indicators of change in the application. Their purpose is to indicate that a change has occurred in an area of functionality since the last time that that function underwent rigorous assessment. Rather than having a binary PASS/FAIL status, we support the option of a result which may not be what we want but is what we expect, flagged with the related bug number. This means we can still detect potentially more serious changes to that functionality, maintaining the test's purpose as a change indicator without having to re-investigate every time the test runs, or turning the test off as a failing test.
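The three-way verdict can be sketched like this. The result strings and the shape of the known-failures map are illustrative, not our harness's actual format:

```python
def assess(name: str, actual: str, expected: str, known_failures: dict) -> str:
    """Return PASS, KNOWN-FAIL (tagged with its bug number), or FAIL.
    `known_failures` maps test name -> (expected bad result, bug id)."""
    if actual == expected:
        return "PASS"
    if name in known_failures:
        bad_result, bug_id = known_failures[name]
        if actual == bad_result:
            # The result we expect while the bug is open: no re-investigation.
            return f"KNOWN-FAIL ({bug_id})"
    # Neither the right answer nor the known-bad one: a new change to examine.
    return "FAIL"
```

The risk mentioned above lives in the middle branch: a known failure only stays safe so long as the recorded bad result is kept up to date with the open bug.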

Tests may be timed or have max memory limits applied

As well as data results, the harnesses support recording and testing against limits of time and system memory that will be used in running a test. This helps in driving performance requirements and identifying changes in memory usage over time.
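A portable sketch of applying such limits using Python's standard library is shown below. Our harnesses record real system memory; `tracemalloc` here is a simplified stand-in that only traces Python-level allocations.

```python
import time
import tracemalloc

def run_with_limits(fn, max_seconds=None, max_bytes=None):
    """Run one test body and compare elapsed time and peak traced
    memory against recorded limits; return (result, failures)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    failures = []
    if max_seconds is not None and elapsed > max_seconds:
        failures.append(f"time {elapsed:.2f}s exceeded {max_seconds}s")
    if max_bytes is not None and peak > max_bytes:
        failures.append(f"memory {peak}B exceeded {max_bytes}B")
    return result, failures
```

Recording the measured figures alongside the pass/fail verdict is what makes the trend analysis described later possible.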

Tests and results stored in source control

The tests are an executable specification for the product. The specification changes with versions of the product, so the tests should be versioned and branched along with the code base. This allows tests to be designed for new functionality and performance expectations updated whilst maintaining branches of the tests relevant to existing release versions of the product.

Test results stored in RainStor

Storing the results of automated test runs is a great idea. Automated tests can and should be used to gather far more information than simple pass/fail counts (see my further explanation on this here). Storing test results, timings and performance details in a database provides an excellent source of information for:-
* Reporting performance improvements/degradations
* Identifying patterns/changes in behaviour
* Identifying volatile tests

As we create a data archiving product, storing the results in it and using it for analysis provides the added benefit of "eating our own dog food". In my team we have the longest running implementation of our software anywhere.
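For instance, with results in a database, volatile tests fall out of a single query. This sketch uses an in-memory SQLite table with a minimal made-up schema rather than our actual results archive:

```python
import sqlite3

def volatile_tests(runs) -> list:
    """Load (test, run_id, passed, seconds) rows and return the tests
    that both passed and failed across runs - the volatile ones."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE results (test TEXT, run_id INT, passed INT, seconds REAL)")
    db.executemany("INSERT INTO results VALUES (?,?,?,?)", runs)
    rows = db.execute(
        "SELECT test FROM results GROUP BY test "
        "HAVING MIN(passed) <> MAX(passed) ORDER BY test")
    return [r[0] for r in rows]
```

Similar GROUP BY queries over the recorded timings can surface performance degradations and behavioural drift over time.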

These principles have evolved over time, and will continue to do so as we review and improve. In their current form they've been helping to drive our successful automation implementation for the last three years.


Monday, 13 September 2010

Professional Washer-Upper

A recent discussion on the Yahoo Agile Testing discussion group covered the subject of whether a separate testing role was still required in development teams practising TDD/BDD. Here I aim to use an example from a very different field to examine the benefits of both generalising specialists and having individuals devoted to even basic roles.

Professional Washer-upper

When I was at high school I had a weekend job washing dishes at a busy local restaurant. The job involved a number of responsibilities:

  • Operating the dishwashers for the crockery

  • Keeping the kitchen stocked with crockery to ensure orders could go out

  • Manually supporting the chefs in washing pans and cookware to meet demand

  • Operating the glass washer and keeping the bar stocked with glasses to meet demand

I also, when required, could step in for others to help with their tasks:
  • serve behind the bar (barman)

  • serve food (server)

  • clear tables (server)

  • make desserts (server)

  • cook starters (sous chef)

Similarly, other members could step in and cover each other's jobs when required, e.g. servers worked the bar early in the evening in the rush before most people were seated. On midweek nights the restaurant was quieter and my tasks were shared among the servers and chefs. On very busy nights (e.g. New Year) we drafted people in to help with my tasks so that I could take on some of the workloads of others. I had a number of relevant skills and could operate in a number of roles, yet if you asked anyone in the restaurant (including me), my job was Washer-upper, as this was my primary role and provided sufficient work to merit a devoted individual.

The restaurant could have adopted the approach of not having a washer-upper. The work would still have needed doing, but could have been fulfilled by other members of the team (e.g. every server washing all trays he/she cleared, all chefs washing their own pans). I was, however, very good at washing up. I knew what needed to be done to meet the needs of the rest of the team and how often. I knew the environment and had optimised my approach within it to the extent that it took 3 servers to cover when I was called off to other jobs. Given that someone was constantly required to be washing up, it made sense to have an individual devoted to that job who was better at it than the other team members.

The multi-skilled team

I think this example is a great case of a multi-skilled team of what Scott W Ambler calls Generalising Specialists, or what Jurgen Appelo calls T-shaped people. For low-workload situations the number of individuals is reduced and the coverage of roles distributed across them. For more intensive workloads the benefits of having generalising specialists become apparent: each individual has a key area of responsibility, yet has the knowledge to step in and cover other roles as the pressures and bottlenecks ebb and flow through the course of an iteration (sitting).

The benefits of devoted attention

Much like the many aspects of the washer-upper's position, the banner of Software Tester, for the purposes of discussions such as the recent one on the Yahoo Agile Testing group, can be viewed as a matrix of roles and responsibilities (which I feel is growing, not shrinking, but that's another topic). Some teams will operate by sharing these roles and responsibilities across the team without individuals assigned to the testing position, and will be successful. The testing roles, however, will still be present and need to be filled.

The question posted recently was whether TDD or ATDD/BDD will render the traditional testing role redundant. I don't think so. If a job as simple as the washer-upper's can demonstrate the benefits of having skilled individuals concentrating on maximising effectiveness in an area of responsibility, then this benefit is only going to be amplified as the difficulty and complexity of the role increases. Having individuals with specific testing expertise whose primary concern is this subject area has certainly paid dividends in my organisation, where the effectiveness and scale of testing performed (and consequently the knowledge and confidence gained) are far greater now than when reliance was placed far more on developer-led testing.

As to whether it is sufficient for a tester to have only testing skills and responsibilities, that is another question for another post.


Friday, 30 July 2010

The most important bug

At the recent UKTMF quarterly, one of the session leaders posed the question "What is the most important piece of information that you want to see in a testing project?".
Number of bugs was a statistic suggested by a few, with more than one respondent in the room suggesting bug cut-offs such as:-

When we hit fewer than 20 P1 bugs

When there are no P1 bugs and less than 20 P2s

It appears that the approach taken in the organisations that these people worked in was to assess the release fitness of the product based on the number of known bugs of a certain priority in the release software, and consider it to be of acceptable quality once this threshold had been achieved.

This is not an approach that I choose or recommend, for a number of reasons that I thought I'd outline here:-

1. The cutoff point is arbitrary

I would be pretty confident in asserting that the individual who decided that fewer than 20 P1 bugs was acceptable release quality had no empirical evidence to back up the claim that this figure provides significantly greater customer quality than 15, or 21, or 25. Despite this, a huge amount of emphasis is placed on that boundary bug that needs to be fixed in order to achieve the release target. This may result in a disproportionate amount of effort going into resolving one or two issues to hit a target on one hand; on the flipside, there may be a tendency to ease up on the effort to fix the remaining issues for a release once the magic number has been achieved.

2. Assumption that bugs of the same priority are equal

Most folks have 4 or 5 bug priorities in their tracking systems; the more advanced may separate priority from severity. In either case we are grouping a wide variety of issues under very broad priority categories. Can we honestly say that all P2s are equal? What happens if we are in the last stages of a release project and we have 21 P1 bugs outstanding? If it is possible to achieve release quality by resolving just one of these issues, then our decision process over which one to resolve is simple: we target the simplest to fix, with the least retesting required and the lowest risk of regression issues. From a customer quality perspective, however, what we should be doing is concentrating on resolving those issues that lie in the core functionality on the critical paths of our application, as it is these issues that pose the greater risk to our customers.

3. Removes responsibility for quality decisions

Imposing a measure of acceptable quality relieves the decision makers of the responsibility of actually looking at the issues in the system and using their judgement to assess whether or not to release the software. In reality quality is a very difficult thing to define, let alone measure, and the tester's role should be one of information provider, feeding as much information into the decision-making process as possible. By reducing this information to a set of bug priorities, you are essentially placing the decision on release quality in the hands of the tester who assigns the priorities. The decision about whether software is fit for release merits more management involvement than reviewing a four-column bar chart.

4. Bug severities are subjective and movable

This is a double-edged sword, as it does allow some human judgement to bring flexibility into a very black-and-white process. On the downside, however, it does introduce the temptation of re-prioritising issues in order to bring the product in under the quality radar. When we consider bug summary statistics rather than the bugs themselves for our quality measure, we introduce the possibility of hiding issues through re-prioritisation.


Bug priorities are there as a simple guideline, not an absolute measure. We should treat each issue on its own merits rather than masking details behind statistics. A review of individual bugs gives a far greater understanding of the current status of the system than a summary of bug statistics ever could. This will lead to a far more informed decision-making process at release time than when this information is abstracted behind a set of bug statistics, particularly if the individuals in the process have a good understanding of their customers and the qualities that matter to them.


Friday, 21 May 2010

The Kitchen Sink - why not all tests need automating

I've recently been working on testing a Windows platform version of our server system. A major part of that work was to port the test harnesses to work in a Windows environment. I'd completed the majority of this work, and with most of the tests running, checked and passing, I began to tackle the few remaining tests that were not simple to resolve. After the best part of a day struggling to get very few tests working, I decided to take a step back and review what the remaining tests were actually trying to do. I very quickly decided that the best approach was not to get the tests working after all, but rather to remove them from the test suite.

For example, the first one that I looked at tested an error scenario in which a system function was called after permission on its input file had been revoked. Although a valid test, the likelihood of this issue occurring in a live environment was slim and the potential risk to the system low. The test itself, on the other hand, relied on bespoke scripts with a high maintenance requirement when porting and a high risk of failure.

I contacted the tester who created the test and put it to him that this type of test was possibly more suited to initial exploratory assessment of the functionality involved rather than full automation and repeated execution. He accepted this and we agreed to remove the test.

I took this as an opportunity to review with the team what tests needed adding to the regression packs and when. Some of the key points that should be considered:-

  • Once a test has passed, what is the risk of a regression occurring in that area?

  • How much time/effort is involved in developing the test in the first place compared to the benefit of having it repeatable?

  • Is the likelihood of the test itself erroring higher than the chance of it picking up a regression?

  • Will the test prove difficult to port/maintain across all of your test environments?

Just because we can automate a test doesn't mean that we always should. Aim to perform a cost/benefit analysis of having the test in your automation arsenal versus the cost of running and maintaining it. It may become apparent that the value of the test is less than the effort it takes to develop, execute and maintain. In that situation the best course of action may be to execute it manually as an exploratory test in the initial assessment phase, and focus our automation efforts on those tests that give us a bit more bang for our buck.


Thursday, 13 May 2010

Why automated testing is more than just checking

For my first post in a while (after a period of intensive DIY - house is looking great, blog is a bit dusty!) I thought I'd expand on a point I have made a few times recently in various forums. In discussion further to my post Testing the Patient I commented that I would not start referring to automated tests as checks, because my tests gather more information than is required to execute the checks. Here I expand on this to provide some examples of information that can be usefully gathered by automated tests over and above the information needed to feed checking mechanisms.

Test results

This sounds a bit obvious, but if a check is being performed against a file or set of information then it makes diagnosing issues much easier if the actual data being checked is available after the fact, and not just the result of the check. Knowing that the result of a test was 999 provides far more information than knowing that it failed a check that the result should not exceed 64.

Also, for file/data driven tests such as many of those used at RainStor, we apply conversions to remove inconsistent/random information from test results prior to comparison against expected results. Storing the original result as well as the version modified for comparison allows for valuable de-risking investigation, ensuring that you are not masking issues through the process of modifying results.
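
As a rough illustration, a harness might strip volatile values like this, keeping the raw output alongside the normalised form. The patterns and names here are my own assumptions for the sketch, not RainStor's actual conversion rules:

```python
import re

# Illustrative normalisation rules: replace values that legitimately vary
# between runs (timestamps, session ids) with stable placeholders.
NORMALISATIONS = [
    (re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"), "<TIMESTAMP>"),
    (re.compile(r"session id \w+"), "session id <SESSION>"),
]

def normalise(text):
    for pattern, replacement in NORMALISATIONS:
        text = pattern.sub(replacement, text)
    return text

def run_check(raw_result, expected, store):
    """Record both the raw and normalised output, then compare the latter."""
    store["raw"] = raw_result             # original kept for de-risking review
    store["normalised"] = normalise(raw_result)
    return store["normalised"] == expected

store = {}
passed = run_check("2010-05-13 09:15:01 loaded 64 rows",
                   "<TIMESTAMP> loaded 64 rows", store)
```

Because the raw result survives in `store["raw"]`, a reviewer can later confirm that the normalisation rules themselves are not hiding genuine differences.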

Application Logs

If the application has a logging mechanism it is useful to copy the logs associated with a test run and store with the results of the run. These can be an excellent diagnostic tool to assist the tester in investigating issues that may have arisen in the tests, even after the test environment may have been torn down.

System logs

Tracking and storing operating system logs and diagnostics such as /var/log/messages or the Windows event log is again a great aid towards examining application behaviour and investigating potential issues after the tests have completed. This information is not only useful for debugging but also to check for potential issues that may not have been picked up by the specific check criteria in the tests themselves.

Test timings

Many of my tests relate to specific performance targets and so have associated timings that form part of their success criteria. Even for those that don't, storing the time taken to run each test can provide some very useful information. Some of the automated tests that I run do not perform any checks at all, but simply gather timings which I can use to model application performance through the course of the test and identify unhealthy patterns or likely points of failure.

Also if, as I do, you store a history of test results in a database then this allows you to graph the changing behaviour of the test over time through multiple iterations of the software and identify patterns or key dates when behaviour changed.
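
A minimal sketch of such a timing store, using an in-memory SQLite database for the example; the schema and names are assumptions of mine rather than our actual harness:

```python
import sqlite3
import time

# In a real harness this would be a persistent file or database server so
# that timings accumulate across builds and can be graphed over time.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS test_timings "
    "(test_name TEXT, build TEXT, run_at REAL, duration_secs REAL)"
)

def timed_run(test_name, build, test_fn):
    """Run a test function and record how long it took."""
    start = time.monotonic()
    test_fn()
    duration = time.monotonic() - start
    conn.execute("INSERT INTO test_timings VALUES (?, ?, ?, ?)",
                 (test_name, build, time.time(), duration))
    conn.commit()
    return duration

duration = timed_run("bulk_import_smoke", "build-42",
                     lambda: time.sleep(0.01))
```

Queries against the accumulated table then give the per-test history needed to spot the dates on which behaviour changed.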

Application Memory Usage

As with test timings, some of my tests have a specific memory criterion which is checked; for the majority of tests, however, I log the system memory usage of the application through the test. Again, by storing this information in a database we can track the memory behaviour of the system over time and identify key dates when memory issues may have been introduced. Knowing the memory usage of the application can also be a valuable piece of information when debugging intermittent faults under high load.
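
On Linux, one way to sample an application's resident memory is to read /proc; this is a simplified sketch rather than our actual monitoring code, and a portable harness might use a library such as psutil instead:

```python
import os

def rss_kb(pid):
    """Return the VmRSS value (in kB) for a process, or None if unavailable.

    Reads /proc/<pid>/status, which on Linux includes a line of the form
    "VmRSS:     1234 kB" giving the resident set size.
    """
    try:
        with open("/proc/%d/status" % pid) as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except OSError:
        return None  # not Linux, or the process has gone away
    return None

# Sample our own process as a demonstration; a harness would poll the
# application under test at intervals and record each sample.
sample = rss_kb(os.getpid())
```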

Test Purpose

Not strictly something that is gathered by the test, but storing the test purpose in the test results makes it much easier for anyone investigating the cause of test failures, especially if this is not the person who wrote the tests.

Test Application Logs

The most successful test automation implementations are the ones that view their test harnesses as products in their own right, and in that regard these should generate their own application logs, which can also be stored with the results of a test run. Should unexpected results arise from the tests, these logs provide a wealth of information on the activities carried out by the test application/harness and any issues encountered which may have been the cause of the exception.
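
A minimal sketch of such harness logging using Python's standard logging module; the directory layout here is an assumption for illustration:

```python
import logging
import os
import tempfile

def harness_logger(results_dir):
    """Give the harness its own log file, stored with the run's results."""
    logger = logging.getLogger("harness")
    logger.setLevel(logging.DEBUG)
    handler = logging.FileHandler(os.path.join(results_dir, "harness.log"))
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger

results_dir = tempfile.mkdtemp()   # stands in for a per-run results directory
log = harness_logger(results_dir)
log.info("starting test run")      # harness activity recorded alongside results
```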

If you have gone to the effort of implementing a suite of automated tests then a lot of the hard work is done. Take the time to consider how you can use the power of these tests to do more than just checking and instead reap a wealth of useful information that can be used to assist in and improve your testing process.


Thursday, 11 March 2010

The power of the tester in requirement meetings

It is my firm belief that Testers should get involved in the development of requirements at the earliest opportunity. In a recent discussion with Gojko Adzic he pointed out the similarities between the process we use at RainStor and Chris Matts' model of Feature Injection. I have had some occasions recently which provide excellent examples of the presence of a Tester making an invaluable contribution to requirements discussions for the purposes of Breaking the Model.

Scenario 1 - Tester, Developer and Product Manager Discuss a deletion requirement

Product Manager (pm): The requirement is to support the physical removal of items from the system

dev: Great, we've got an existing model for a logical delete using a system query to identify the items to delete. We can use the same model to identify items to physically delete.

tester: What happens in the situation where you've already logically deleted an item? It would not then be visible to the user to query so they wouldn't know that they needed to physically delete that item so the model won't work.

dev: Oh yeah.

The interesting point here was that the developer was tempted to fit the new functionality into an existing model based on current functionality. By breaking the model the tester highlighted the need for a new model, that was not based on existing functionality, and further discussion with the customer.

Scenario 2 - Product Manager, Developer and Tester Discuss a legal hold requirement

pm: The customer wants to be able to flag an object as on hold or not. This is to prevent it from being deleted if a person or company is involved in a legal investigation.

dev: This just requires a simple boolean flag to identify the items that are held. The easiest way to tackle this is if the item is held then we'll prevent the entire container for that item from being expired or logically deleted.

tester: Didn't we have a requirement to physically delete stuff for legal reasons? What happens if the same container contains one item that needs deleting and one that is held?

dev: Oh yeah. Or, for that matter, what happens if the same item is involved in two investigations? Once the first is closed the boolean hold flag could be switched off, even though there is still an active investigation. We need a reference count instead.

pm: That sounds better

tester: A reference count would mean that unsetting the flag for a set of items could not be re-applied in the case of failure or cancellation, as you could accidentally dereference the same item twice.

dev: Oh yeah - we need some kind of case ID to apply the hold under, then we can apply and remove based on the case reference, then this would never impact any other cases that have that same item on hold.

pm: Good thinking - I'll run this by the customer.

What I found particularly interesting in this case was that the original discussion was based around implementing what the customer had asked for. Once the tester had started to break this model, the developer went on to identify further examples that highlighted flaws in the original request. The presence of a tester raising issues with the proposal shifted the focus of the session: instead of discussing how to deliver the customer requirement, the focus moved to finding a model that delivered the business value they needed. This was a good example of challenging requirements, a subject which Gojko presented on at the Agile Testing and BDD Exchange last year. The result was a far more robust model for implementation, and one which addressed a series of issues which the customer had not considered themselves.

In both examples the value of having the Tester present in the early discussions around the requirement was clear. The product manager and developer gained the benefit of the tester's expertise in critiquing proposed models before committing to a solution. The tester gained an insight into the customer requirement that was not biased by the developers having already started work on a specific solution.


Friday, 5 March 2010

Finding the right balance

What should one look for when recruiting for a test team?
Do you take the company profile of "Software Tester", which probably reads something like:-

  • Bachelors degree in computer science or a scientific/mathematical subject
  • 2 or more years experience of software testing
  • Experience of automated testing tools
  • Experience of your documentation tool of choice
  • ISEB foundation certificate

and hand this to the recruitment agents?

Personally I try to avoid this approach. I see little benefit in populating a team with multiple versions of the same skill set. Instead I prefer to take the opportunity to put together a team who together have a range of skills, the members of which can all make valuable and different contributions to testing the product.

In order to do this, the first question that must be tackled is:-

What qualifies someone as a software tester?

My answer would be:-

Anyone who can make an educated assessment of the value of the system under test to a stakeholder of that system, or anyone who can increase the ability of the team as a whole to achieve that assessment.

The first category could cover a number of different types of experience, such as:-

1. Experience in software testing, particularly working with the technologies in question
2. Experience of working in the industry or business area at which the product is targeted
3. Experience of using the systems and tools (rival or complementary) with which the targeted users of the system will be familiar
4. Experience of administering the environments, operating systems or types of network within which the system will operate
5. Experience of working in the legal or legislative environment in which the system will operate

The second may contain abilities such as:-

1. The ability to create and maintain automated test harnesses/suites
2. The ability to configure test environments

Clearly, if we are putting a team together to critically assess a system and automate its ongoing checking, then a team of testers with specific domain knowledge but no ability to automate or maintain test infrastructure is not going to achieve the team's goals. What we require is a team whose combined skills will achieve the goals of the team as a whole.

Creating a skills matrix to identify needs

My approach to identifying the skills required to build such a team revolves around the creation of a skills matrix:-

First I create a list of skills which I feel are relevant to the team and the work that we have to do. I generally group these under headings such as "Testing Skills", "Domain Knowledge", "Test Automation" or similar. I will obtain the input of the current team members on this as well. The result will be a list of skills, e.g.:

  • Test Scripting
  • Team Admin
  • Test Automation
  • Performance Testing
  • Multi-User/Load testing
  • Web Testing
  • Test Technical
  • Unix Shell scripting
  • Windows Shell scripting
  • Dot Net
  • Database Related
  • SQL Server

Then, I rate these skills in terms of importance to the team. I use 2 ratings, both on a scale of 1-3:-

- Relevance - How relevant this skill is to the team
- Weight - how frequently the skill is required. This essentially translates to the number of team members that are required to possess the skill

Then I rate each member of the team according to their ability. I do this in discussion with each individual, according to the following ratings:-

0 - I would not know where to start doing/using this
1 - I would need help if asked to do/use this
2 - I would be able to do/use this on my own but am no expert
3 - I would be able to do/use this without any help and know what I was doing

By putting these ratings into a spreadsheet we can derive two scores for each skill using simple formulae:

Total = SUM(team ratings) - (Relevance x Weight)
Max   = MAX(team ratings) - Relevance

The Total column gives us a score relating to the coverage of that skill in our team relative to the weight. The Max column is simply the difference between the relevance of the skill and the ability of the highest scoring team member, identifying cases where there is a general ability in the team but we may be lacking a specialist.

A score of zero and above indicates adequate coverage; below zero indicates a need in the team. I appreciate that the scoring system is arbitrary, but it is a useful indicator of need. If it is not indicating useful information then I suggest adjusting the weighting/relevance ratings. The result of this analysis will be a matrix similar to:-
Area              Skill                    Relevance  Weight  Alan  Bob  Carl  Total  Max
                  Test Scripting               3        2      3     3    3      3     0
                  Estimation                   3        2      3     3    2      2     0
                  Test Automation              3        2      3     2    2      1     0
                  Performance Testing          3        1      2     2    2      3    -1
                  Multi-User/Load testing      2        1      1     1    2      2     0
                  Web Testing                  1        2      2     1    1      2     1
Domain Knowledge  SQL92                        3        2      1     2    2     -1    -1
                  ODBC Connectivity            2        2      3     1    3      3     1
                  Business Intelligence        3        2      1     2    0     -3    -1
This provides a simple yet highly visible representation of the current position of the team and where the shortfalls are relative to where you want the team to be.
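
The Total and Max calculations can be sketched in a few lines of Python, using rows taken from the example matrix above (in practice a spreadsheet does this job perfectly well):

```python
def skill_scores(relevance, weight, ratings):
    """Return the (Total, Max) scores for one skill row."""
    total = sum(ratings) - relevance * weight
    peak = max(ratings) - relevance  # the "Max" column
    return total, peak

matrix = {
    # skill: (relevance, weight, [Alan, Bob, Carl])
    "Test Scripting":        (3, 2, [3, 3, 3]),
    "Performance Testing":   (3, 1, [2, 2, 2]),
    "SQL92":                 (3, 2, [1, 2, 2]),
    "Business Intelligence": (3, 2, [1, 2, 0]),
}

# Any skill with a negative Total or Max flags a gap in the team.
gaps = {skill: skill_scores(rel, weight, ratings)
        for skill, (rel, weight, ratings) in matrix.items()
        if min(skill_scores(rel, weight, ratings)) < 0}
```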

This approach to identifying the skill needs of the team is useful immediately for personal development plans, to see if the existing team members have an interest in learning or developing on one of the new skills.

The approach is especially useful at the point of recruiting new members to the team. Whereas the temptation is often to recruit based on a role "template" as described at the start of this post, an exercise such as this can sometimes reveal unexpected results in that the needs of the team lie in complementary areas to the traditional testing skills. Recently I recruited an excellent addition to my team who had very little testing experience at all. He did, however, possess system administration skills in the operating environments that our software is installed into, which gives him an excellent insight into the desirable characteristics of the system from the perspective of a valuable stakeholder, the system admin. Similarly another of the best testers in my team has a high level of specific domain knowledge in the area of databases, which is a desirable skill in critically assessing the behaviour of the product from the perspective of a DBA.

I would add the caveat that I have only used this on small teams so am not sure how well it scales, although on larger projects it could be interesting to apply the same approach recursively at the individual and team levels to identify team skills and needs as part of a larger group.

This approach takes some courage, as it requires looking at the combined skills of a team to meet the challenges faced rather than the skills of individual members. There may also be some push back from management, who may prefer to see things simply by fitting individuals into role-shaped compartments. However the benefits to the company that can be brought by putting together a multi-skilled team to address a multi-faceted task will pay dividends in the long term.


Friday, 19 February 2010

Dealing with difficult stories Part 3 - Epics

Having read a number of books and articles on Agile software development, I find that most of the documented examples of user stories deal with the implementation of a new piece of functionality, often a pretty well encapsulated one at that, which can be delivered in a single iteration. At RainStor I am working with an emerging product. Often our requirements do fit this form pretty well, however we also encounter many occasions when the work to be undertaken does not marry easily with this model of a user story.

In the third of my posts tackling this subject I address epic requirements. These are requirements where the minimum marketable feature is too large to complete during a single iteration. Here I present some of the approaches that can be taken in such situations, with the respective pros and cons that I can see or have experienced with these.

Tackle as a single requirement which persists from one iteration to the next

In my early days at RainStor this is the approach that tended to be taken.


+ Provides continuity through the development process


- It was difficult to track what had been delivered in each iteration
- The approach doesn't encourage early exposure to testing, whole iterations could be spent on development without any exposure to test or any elaboration or review.

From a test perspective this approach was not working and so, with some consultancy help from Dave Evans at SQS, I worked to replace the requirement based approach being used with a different approach:-

Breaking down large requirements into user stories that deliver customer value

This was not an easy transition. The previously used requirements generally had the appearance of being better defined (although the result usually deviated significantly from that definition). Working with much lighter user stories removed the false confidence that those requirements specs provided, and some are still getting used to the lack of up-front detail and the need for elaboration discussions to provide specification on the stories. Nevertheless a number of significant benefits were shown.


+ Smaller chunks of value can be delivered and tested within the iteration
+ Elaboration discussions held at each stage providing greater opportunity to review and change direction
+ It becomes easier to measure progress through stories completed
+ Work not done is highlighted quickly and can be addressed through the creation of further stories

There were still some negatives though

- Sometimes it was very difficult to create anything that could be exposed as valuable customer functionality within an iteration, particularly if one of the legacy components within the application needed changes to the interfaces.

We have had to amend our approach slightly to deal with these issues.

Breaking down large requirements into user stories that deliver stakeholder value

In most cases the stakeholder can be the customer; however, in some situations we consider internal stakeholders to be the ones deriving the value from the story. For example, where changes need to be made to the legacy components in order to add new product features, we take the approach that the developer using that interface is the stakeholder, much as an external developer who uses our API would be.

+ Allows internal stakeholder value to be delivered and tested without forcing functionality to be exposed to the customer where it is not feasible
+ Maintains the approach of breaking down into deliverable chunks within the iteration
+ Value can be tested using tests at the appropriate level e.g. unit testing of the delivered components
+ We use the approach to cover initial prototyping, where the value is delivered in a set of acceptance criteria and a workflow that has been discussed, demonstrated and agreed with the customer prior to developing functionality
+ The approach can be extended to deliver other internal value as research, infrastructure work, code maintainability, test hardware provision and other items that relate to internal stakeholders

- Can be seen as a cop-out on delivering customer value

We are constantly refining the approach, but I am pretty happy with the attitude that is being shown with respect to these large requirements now by the team in general. Greater breakdown and consideration of multiple stakeholders, internal and external, is helping to ensure that we deliver the right value at the right time during large scale 'epic' developments.

Wednesday, 20 January 2010

Difficult to fit stories - part 2 : Platform Ports

Having read a number of books and articles on Agile software development, I find that most of the documented examples of user stories deal with the implementation of a new piece of functionality, often a pretty well encapsulated one at that. At RainStor I am working with an emerging product. Often our requirements do fit this form pretty well, however we also encounter many occasions when the work to be undertaken does not marry easily with this model of a user story.

In the second of my posts tackling this subject I address platform ports. These are another type of requirement which I have encountered that can be difficult to break down into the accepted format of a user story.

Essentially a platform port can be viewed in two ways. Either:-

Re-applying all of the previous stories to a new operating system


+ Each story relates to valuable user functionality
+ Allows clear picture of how the port is progressing based on what stories have been delivered


- Porting rarely delivers functions one at a time; it is more often a case of addressing compilation issues and then delivering large areas of functionality (i.e. tens or hundreds of previously implemented stories) at a time, leading to a logistical nightmare in managing which ones have been delivered.
- Failures can often be technology rather than function specific and so it can be hard to marry the bugs/blocks to the stories

Having one user story which equates to the user wanting to run the entire system on a new OS.


+ Provides one clear story which delivers value to the customer (as a X I want the system to work on platform Y so that I can Z)


- Ports are usually too large to fit into an iteration
- Little breakdown of the work involved which affords less opportunity for tracking of progress based on completed stories

Neither of these approaches worked in practice when tackling this type of requirement. Over time we have evolved a third way of addressing this type of requirement.

Defining stages which deliver value to stakeholders

The approach that we have settled on is a hybrid approach, breaking down the work into stages and grouping the corresponding deliverable value to stakeholders at each stage and the associated tests together. e.g.

  • The software will compile and build running unit tests on OS XX
    The value here being making the build available to the testers and automated test harnesses for further testing

  • Functional and SOAK testing will be executed on OS XX
    The value here being the confidence and product knowledge reported to the product management team by the tester/team. Not all tests need to pass first time, but the software needs to be in a state sufficient to allow the full suite of tests to be executed

  • The software will be packaged for release and will pass clean machine installation and configuration tests on OS XX
    The value here being the deliverable software to the customer, or the 'brochure summary' requirement


+ Allows for stories which are testable and deliver value to stakeholder, albeit not always the customer
+ Blocks and issues are relevant to the story in question
+ The stories are manageable in both size and quantity and the appropriate testing clearly definable.
+ Performance stories can be defined separately depending if there are any set performance criteria for the OS (see previous post)

I still have some problems with this approach. It does feel a little more task based than value based, with a very strong dependency between the stories. It does, however, allow for the breakdown of a lengthy requirement over a series of iterations with valuable and testable deliverables from each story and a sense of progress across iterations. In the absence of a better solution, this approach is "good enough" for me.


Friday, 15 January 2010

Dealing with Difficult to fit stories - Part 1: Performance

Having read a number of books and articles on Agile software development, I find that most of the documented examples of user stories deal with the implementation of a new piece of functionality, often a pretty well encapsulated one at that. At RainStor I am working with an emerging product. Often our requirements do fit this form pretty well, however we also encounter many occasions when the work to be undertaken does not marry easily with this model of a user story.

In my next few blog posts I will outline some examples of such requirements and how we have attempted to deal with such scenarios. In this post I discuss performance:-

1. Performance

As RainStor provides a data storage and analysis engine, our requirements regarding performance of e.g. data load and querying are well defined. For new administration functions, however, the customer rarely has specific performance requirements, but we know from experience that they will not accept performance if they deem it to be unsatisfactory.

Some approaches that can be adopted to address this:-

  1. Define acceptable performance as part of each story

     Pros

     + Focuses attention on optimising performance during the initial design
     + Improves delivery speed if acceptable performance can be achieved in the iteration where functionality is implemented

     Cons

     - Measuring performance often requires lengthy setup of tests/data which can take focus away from functional exploration and result in lower functional quality

  2. Have separate performance stories

     Pros

     + Allows you to deliver valuable functionality quickly and then focus on performance, "get it in then get it fast"
     + Having a specific story will focus time and attention on performance and help to optimise it
     + Separating performance out helps provide focus on designing and executing the right tests at the right time

     Cons

     - Separates performance from functional implementation and can 'stretch' the delivery of functionality over multiple iterations
     - Performance stories are likely to be prioritised lower than other new functionality, so we could end up never optimising and having a system that grinds to a halt
     - We have a delayed assessment of whether performance is very poor or will scale badly on larger installations

  3. Have a set of implicit stories or, as Mike Cohn calls them, 'constraints', that apply system wide and are tested against every new development

     Pros

     + Having these documented provides the tester with a measurable benchmark against which they can specify acceptance criteria

     Cons

     - Constraints may be too generic and hard to apply to new functionality, and lead to a tendency to always perform at the worst of our constraint limits
     - Alternatively we may end up specifying constraints for every requirement and end up in the same situation as option 1
     - A new piece of functionality will usually take precedence over a broken constraint, so you then need to prioritise the work to bring the performance back up to within the constraint limit

  4. Define acceptance criteria for each story based on previously delivered functionality

     Pros

     + Uses our own expertise and experience on what is achievable
     + Also taps into our knowledge of what the customer will find acceptable based on previous experience

     Cons

     - May be difficult to find suitable functionality to compare against
     - We may accept poor performance if the function used to compare against is more resource intensive, or itself has not been optimised

In practice the option chosen in my organisation varies depending on the situation. If the customer has no performance requirement, or lacks the in-depth knowledge to assess performance in advance, then we tend to use our own knowledge of customer implementations to decide on performance criteria. This requires an excellent knowledge of the customers' implementations and expectations in order to make decisions on what is acceptable on their behalf.

In terms of when to test performance, where there is not an explicit performance requirement involved, we tend to obtain information on performance during the initial functional implementation. We will then discuss with Product Managers/developers and possibly customers whether this is acceptable and whether further improvements could be made, or whether further assessment is required. We will then prioritise further targeted performance work at that stage. This works well for us as it maintains a focus on performance without hindering the development effort, and allows this focus to be escalated in later iterations if we deem it to be a priority.