Preparing for the EU GDPR (Part 2)

In this post I’m following up on the previous post regarding the General Data Protection Regulation (GDPR). Here I will focus on the data protection strategies that the BizDataX solution supports to help organizations comply with the GDPR.
When we discuss data protection strategies with our clients, they often ask whether to use data anonymization or data pseudonymization to protect data in non-production environments, e.g. testing environments. The next question that usually comes up is: “Does BizDataX support both sensitive data protection concepts?” The short answer to the former question is “It depends on the scenario”, whereas the short answer to the latter is clearly “Yes, it does”. Let me elaborate on both.

Data anonymization process vs Data pseudonymization process

The GDPR introduces pseudonymization as a technique for protecting personal data. It does not, however, mention data anonymization as a technique at all (although Recital 26 does refer to ‘anonymous information’). After reading the GDPR for the first time, one might think that data anonymization is not an appropriate way to protect data, but quite the opposite is true. Data anonymization is, along with the generation of synthetic data, the most effective method of data protection, because properly anonymized data no longer relates to an identifiable natural person. By anonymizing data for certain environments and processes, organizations narrow the reach of the GDPR, lowering the related data protection costs and risks. If you are able to anonymize your personal data properly, the GDPR will not apply to the processed data. There are, of course, real-world scenarios where data anonymization is not an option, and in those cases data pseudonymization is another way to protect the data.
Data pseudonymization replaces data that could lead to the direct identification of a natural person with pseudonyms (aliases), but preserves the link between the pseudonyms and the identifiers (the original values belonging to the natural person, such as name, family name, email, credit card info etc.) in a separate data store. Pseudonymizing data therefore gives you two data stores: one holding the pseudonyms together with the other, non-sensitive data, and one holding the link between the pseudonyms and the identifiers. In contrast to anonymized data stores, pseudonymized data stores still contain real data, so there is no loss of data quality. The GDPR encourages organizations to use data pseudonymization because it provides additional protection for personal data, but at the same time it states that pseudonymization is “not intended to preclude any other measures of data protection”. Organizations that use data pseudonymization to process production data for non-production purposes are NOT exempt from the GDPR.

Let’s now focus on testing environments. The testing process introduces a number of environments during application development: development, integration, system, performance, UAT etc. Testers and test automation tools require relevant test data, so the real question is whether each of these environments needs real production data or near-real (anonymized) data. In my opinion and experience, data anonymization (combined with synthetic data generation) should be preferred over data pseudonymization in the vast majority of testing scenarios. Clients sometimes say they need to double-check test outcomes against real data, because they are used to testing against production databases and want to be sure that the test results are correct. That is a valid argument for some scenarios, but data anonymization should always be considered first because of the benefits it provides in terms of compliance and cost effectiveness.
Whichever technique clients select, the BizDataX solution can support them in their choice.

In the case of data anonymization, the BizDataX user applies, using the familiar workflow-based concept, a number of data anonymization activities to the sensitive data, ensuring that re-identification of a natural person is impossible.
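To make the idea concrete, here is a minimal sketch in plain Python of what an anonymization step does in principle. It is not BizDataX code: the field names, the sample value lists and the anonymize function are all assumptions made for illustration. The key property is that identifiers are overwritten with synthetic values and no link to the originals is kept. A real anonymization would also have to deal with indirect identifiers (dates of birth, addresses and similar), which this sketch leaves untouched.

```python
import random

# Hypothetical pools of synthetic values (illustrative only).
FIRST_NAMES = ["Ana", "Ivan", "Marko", "Petra"]
LAST_NAMES = ["Horvat", "Novak", "Kovac", "Babic"]


def anonymize(records, rng=None):
    """Return copies of the records with identifier fields replaced
    by synthetic values. No mapping to the original values is stored,
    so the replacement cannot be reversed from the output alone."""
    rng = rng or random.Random()
    anonymized = []
    for i, rec in enumerate(records):
        new_rec = dict(rec)  # leave the input record untouched
        new_rec["first_name"] = rng.choice(FIRST_NAMES)
        new_rec["last_name"] = rng.choice(LAST_NAMES)
        new_rec["email"] = f"user{i}@example.com"
        anonymized.append(new_rec)
    return anonymized


production = [
    {"first_name": "Jane", "last_name": "Doe",
     "email": "jane.doe@example.org", "balance": 100},
]
test_data = anonymize(production)
```

The non-sensitive field (balance) keeps its value, so the test data stays realistic, while nothing in the output refers back to the original person.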
When data pseudonymization is the preferred option, the BizDataX user creates an additional data store that holds the links between the pseudonyms and the identifiers. Within the workflow, the content of each sensitive field (identifier) is replaced by a pseudonym. Each pseudonym-identifier pair is stored in this separate data store, which has to be safeguarded in the same way as other production databases.
Once the data pseudonymization process is complete, standard users see only pseudonyms instead of the real data. Authorized users are able to re-identify natural persons by joining the pseudonyms with the identifiers from the protected data store.
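The pseudonymization flow can be sketched in the same way. Again, this is plain Python written for illustration and not the BizDataX implementation: the function names pseudonymize and reidentify, the field list and the in-memory dictionary standing in for the protected data store are all assumptions.

```python
import secrets

# Assumed list of sensitive fields (identifiers) for this example.
SENSITIVE_FIELDS = ("name", "email")


def pseudonymize(records, sensitive_fields=SENSITIVE_FIELDS):
    """Return (pseudonymized_records, link_store).

    link_store maps pseudonym -> identifier and stands in for the
    separate, protected data store."""
    link_store = {}
    assigned = {}  # (field, identifier) -> pseudonym, keeps repeats consistent
    result = []
    for rec in records:
        new_rec = dict(rec)
        for field in sensitive_fields:
            if field not in rec:
                continue
            key = (field, rec[field])
            if key not in assigned:
                pseudonym = f"{field}_{secrets.token_hex(8)}"
                assigned[key] = pseudonym
                link_store[pseudonym] = rec[field]
            new_rec[field] = assigned[key]
        result.append(new_rec)
    return result, link_store


def reidentify(record, link_store):
    """Authorized use only: join pseudonyms back to the identifiers."""
    return {
        field: link_store.get(value, value) if isinstance(value, str) else value
        for field, value in record.items()
    }


production = [
    {"name": "Jane Doe", "email": "jane@example.org", "balance": 100},
    {"name": "John Doe", "email": "jane@example.org", "balance": 250},
]
pseudonymized, links = pseudonymize(production)
original = reidentify(pseudonymized[0], links)
```

Standard users would work only with the pseudonymized records. Anyone holding the link store can reverse the process, which is why that store needs the same protection as production data and why the pseudonymized data remains within the scope of the GDPR.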

BizDataX supports both the data pseudonymization and the data anonymization techniques when provisioning data for all kinds of processes including testing, knowledge management, BI, marketing, sales etc. If you want to use production data for non-production purposes and stay GDPR compliant at the same time, BizDataX has all you need to achieve your goals.

Vedran Brničević
Co-Owner, Member of the Board