In recent weeks much of my testing work has focused on the scalability of our system to larger and larger data archives. This is one of the greatest challenges that we face at RainStor. Given that our customers may be importing many billions of records every day and storing these for months or years, we have neither the resources nor the time to perform a real-time test of this kind of scenario in an iterative test environment.
We already have a number of techniques for scaling up test archives. However, in the past all of these have still required the 'building' of data partitions, a process which requires importing, validating, structuring and compressing the data. We've made this process pretty quick, but it still takes time.
At the start of the latest iteration I discussed the issue with our developers. I'm very lucky in that the developers in my organisation hold testing in very high regard and, for the most part, understand the types of problems that we face and help whenever they can. They identified a number of 'shortcuts' that they could implement in the import process which could help, in a test capacity, to bulk up data in the archive. These are showing some great benefits, but I still felt that we were going toe to toe with an issue which was bigger than we were: the system simply couldn't build data as quickly as I wanted it to.
Redefining the problem
In reality, I didn't need to build the data; I just wanted to add many data partitions to the archive. Simply copying them in was an option, but this would not give a realistic scale-up of the process of committing the data, including the logs and auditing that I wanted to be present. On examining the other interfaces through which data could be validly imported into the system, I realised that we could potentially utilise an existing function for export/import of a data partition from one archive to another. The reason we'd not used this before was that a partition could only be copied into an archive once. This limitation could be worked around by creating a test harness process to 'rename' a replication file and resubmit it. In this way I've been able to create a bulk import process that uses a valid system API to scale up data partitions in a fraction of the time taken to import. This weekend I scaled up 3 months' worth of data imports into an archive in 2 days.
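To make the idea concrete, here is a minimal sketch of what such a harness loop might look like. It is not the harness I actually used: the function name, the file naming scheme and the `submit` callback are all hypothetical, and `submit` stands in for whatever import call your own system provides. It also assumes that a new file name is enough for the system to treat each copy as a new partition, which may not hold for yours.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, List


def resubmit_partition(
    export_file: Path,
    staging_dir: Path,
    copies: int,
    submit: Callable[[Path], None],
) -> List[Path]:
    """Copy one exported partition file under `copies` unique names
    and hand each copy to `submit`.

    `submit` is a hypothetical placeholder for the system's real
    import call; it is not part of any actual product API.
    """
    staging_dir.mkdir(parents=True, exist_ok=True)
    submitted: List[Path] = []
    for i in range(copies):
        # Give every copy a new name so that (by assumption) the archive
        # does not recognise it as a partition it has already imported.
        renamed = staging_dir / f"{export_file.stem}_{i:06d}{export_file.suffix}"
        shutil.copyfile(export_file, renamed)
        submit(renamed)
        submitted.append(renamed)
    return submitted


if __name__ == "__main__":
    # Demonstration with a dummy file and a callback that only records paths.
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "partition.rep"
        source.write_bytes(b"example partition payload")
        seen: List[Path] = []
        resubmit_partition(source, Path(tmp) / "staging", 3, seen.append)
        print([p.name for p in seen])
```

Passing the import call in as a callback keeps the loop independent of any particular system, and means the harness itself can be checked with a dummy callback before it is pointed at a real archive.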
What is the point?
So what am I trying to say? Simply this: sometimes going toe to toe with a big limitation on testability in your system will cost you dear and get you nowhere. Trying to populate large amounts of data through your standard inputs and the associated business rules can be like banging your head against a wall. Rather than looking at what the system cannot do, look at what it can do and work with that. If the standard data input processes are proving to be limiting factors on your setup or scalability, look at any backup/restore, replication, export and import functions that are supported and see if these can be harnessed to meet your needs. These functions are based around system-generated inputs and often bypass business rules and validation, which makes them a much more efficient mechanism for populating test data than the standard input mechanisms, without the risk of writing the data directly into the system/database. If you are worried about whether these interfaces will provide you with a realistically populated database/system, then maybe it is time to test them a bit more before the customers need to use them.
Copyright (c) Adam Knight 2009-2010