Addressing Data Anonymization Challenges

Table of Contents
Introduction
Data Anonymization Challenges
Removing explicit entities
Data sampling
Anonymization levels
Semantic relationship
Data Anonymization Requirements
SOA Architecture Model
Semantic dependencies analysis
Data Anonymization Solutions
CloverETL Infrastructure
Banking System Anonymization
Anonymization ETL Process
Conclusion
Acronyms and terms

cloveretl.com

Introduction

Production data captures the ideal set of use-case scenarios for complex, heterogeneous systems deployed in production environments over a given period of time. In today’s enterprise applications, the use cases inherently stored in these systems are usually very complex.

Complex Systems – Intricate use-cases

Complexity introduces a general test data issue: how to get test data for new releases and updates. Unlike in new system development, a release or update must pass many tests: functional tests for new and changed functionality, regression tests for existing functionality, and especially load and performance tests that ensure a satisfying customer experience. Thus, finding enough reliable, high-quality data is often a nightmare for most enterprise systems test managers.

Synthetic Data – Not a Solution

The obvious approach of generating synthetic data often does not satisfy the stringent criteria enterprise systems must meet, especially for regression and load-and-performance testing. As previously mentioned, real production use cases are complex and heterogeneous; they usually go far beyond the imagination of even the best senior business analyst, and they’re a common source of production issues related to change management. Synthetic test data can only satisfy small, isolated changes where regression and load-and-performance testing are not required.

Production Data for Testing?

To most project managers, using production data for such complex testing seems like the natural answer.
However, a number of problems immediately arise with such an approach. The most critical are privacy and data security. Client and business process data are among a corporation’s most valuable assets. Extending access to such data to the testing team therefore greatly increases overall security risk: it exposes sensitive client information to unauthorized eyes and drives up related security costs and procedures. In some businesses, there’s an additional impact on internal policy too. In banking environments, for example, many employees choose to hold premium internal banking accounts, which often offer benefits such as special interest rates. Now suppose that such production data were available to the project team. Test analysts would be able to peek at sensitive information about colleagues’ wages, transaction history, and more. If such information were revealed among bank employees, it’d seriously threaten the overall corporate HR policy.