Why should we use dummy data instead of live data?

Dummy data is benign information that does not contain any useful data, but serves to reserve space where real data is nominally present. It can be used as a placeholder for both testing and operational purposes.

Dummy data must be rigorously evaluated and documented to ensure that it does not cause unintended effects.

There are obvious pros and cons when using dummy data for testing.

Pros:

  • Easy to create a dummy set of data for testing as and when needed.
  • There is no need to obfuscate live data.
  • Testers can create the data they need without depending upon other teams.
  • A smaller data set can be created to test against where the testers know exactly what data exists (controlled sample).
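As a rough illustration of the points above, here is a minimal Python sketch of a small, controlled dummy data set. The function name `make_dummy_customers`, the field names, and the fixed seed are assumptions made for this example, not part of any particular tool. The seed makes the data repeatable, so testers know exactly which records exist.

```python
import random
import string


def make_dummy_customers(count, seed=42):
    """Return `count` synthetic customer records (hypothetical schema).

    The same seed always yields the same records, which gives testers
    a controlled sample they can assert against.
    """
    rng = random.Random(seed)  # local generator, independent of global state
    customers = []
    for customer_id in range(1, count + 1):
        name = "".join(rng.choices(string.ascii_lowercase, k=8)).title()
        customers.append({
            "id": customer_id,
            "name": name,
            # example.com is reserved for documentation, so no real mailbox exists
            "email": f"{name.lower()}@example.com",
            "balance": round(rng.uniform(0, 10_000), 2),
        })
    return customers


customers = make_dummy_customers(5)
```

Random strings like these will not cover every shape of production data, which is exactly the limitation listed under the cons below.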

Cons:

  • Dummy data cannot fully replicate every single type of data that exists in production, thus defects could be missed.
  • Using a smaller data set means that load test results (e.g. web page or web service response times) may not reflect behavior at production data volumes.
  • Processing times on a smaller dataset will not accurately reflect what will happen in production (e.g. on an Oracle Financials database).

Most organizations still use live data in test and development environments because of a lack of awareness around data security. They do not realize that sensitive data can be masked or de-identified with off-the-shelf technologies, without changing applications or testing processes.
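To make the idea of masking concrete, here is a minimal hand-rolled Python sketch. It is not one of the off-the-shelf products mentioned above. The record fields, the `SECRET_KEY` value, and the function names are hypothetical. It replaces names and email addresses with keyed-hash pseudonyms and hides all but the last four digits of a card number.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would be kept outside the test environment.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymize(value, key=SECRET_KEY):
    """Map a value to a stable 16-character token using HMAC-SHA256."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def mask_record(record):
    """Return a copy of `record` with the sensitive fields obfuscated."""
    masked = dict(record)  # leave the original record untouched
    masked["name"] = "user_" + pseudonymize(record["name"])
    masked["email"] = pseudonymize(record["email"]) + "@example.com"
    card = record["card_number"]
    # Keep the length and the last four digits so formats still validate
    masked["card_number"] = "*" * (len(card) - 4) + card[-4:]
    return masked


live = {"name": "Jane Doe", "email": "jane@corp.test",
        "card_number": "4111111111111111"}
masked = mask_record(live)
```

Because the same input always maps to the same token, relationships between records survive masking. Whether this level of obfuscation is sufficient depends on the data and on the legislation that applies.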

Even when the awareness is there, organizations still tend to rely on real data for its speed and ease of use.

Using live, cloned data is generally regarded as a shortcut when there isn’t enough time or resources to create test data, or a secure test data strategy isn’t in place.

But these are not excuses for a practice that can put customer data in serious jeopardy. It is true that these test systems are generally not Internet-accessible. But even absolute trust in all your employees (never a good starting point) doesn’t remove the risk, because many organizations outsource parts of development and hire contractors, consultants, and the like. And if the media has taught us anything over the last decade about carelessness, it’s that people often store this type of data on laptops and removable media devices, and those assets can get lost or stolen.

Beyond the insider threat, there’s also the very real possibility that malicious external hackers can eventually work their way deep enough into the network after a blended attack and get their hands on test applications and live data.

The biggest change in recent years is the legislation requiring live data to be obfuscated on pre-live environments. The challenge is to replicate live issues on non-live environments, and to test on live-like data before releasing code to production. Failing to do so can lead to defects surfacing only in production, simply because the test environment’s data was deficient in content or volume.

It’s a challenge, but one that cannot be ignored. Whether you use hand-crafted dummy data or obfuscated live data, you cannot simply take live data and test against it unchanged!

 
