A Simple Method for Limiting Disclosure in Continuous Microdata Based on Principal Component Analysis

Aida Calviño

doi:10.1515/jos-2017-0002

Abstract

In this article we propose a simple and versatile method for limiting disclosure in continuous microdata based on Principal Component Analysis (PCA). Instead of perturbing the original variables, we propose to alter the principal components: they contain the same information but are uncorrelated, so each component can be processed separately, which reduces processing time. The number and weight of the perturbed components determine the level of protection and the distortion of the masked data. The method preserves the mean vector and the variance-covariance matrix. Furthermore, depending on the technique chosen to perturb the principal components, the proposed method can produce masked, hybrid or fully synthetic data sets. Examples of application and comparisons with methods previously proposed in the literature, in terms of disclosure risk and data utility, are also included.
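The general idea can be illustrated with a minimal Python sketch. It is not the article's algorithm: the abstract does not say which perturbation is applied or how the moments are restored, so both choices below are assumptions made for illustration. Here the selected components are perturbed by randomly permuting their scores, and the mean vector and variance-covariance matrix are then restored by re-orthogonalizing the scores with a QR decomposition and rescaling them to the original eigenvalues. The function name `pca_mask` and the parameter `perturb_idx` are hypothetical, and the sketch assumes more records than variables and a full-rank covariance matrix.

```python
import numpy as np

def pca_mask(X, perturb_idx, seed=0):
    """Illustrative PCA-based masking (assumed scheme, not the article's)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    mean = X.mean(axis=0)
    Xc = X - mean

    # Principal components: eigenvectors of the sample covariance matrix,
    # sorted by decreasing eigenvalue (variance explained).
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    Z = Xc @ eigvecs  # uncorrelated scores with variances eigvals

    # Assumed perturbation: shuffle the scores of each selected component.
    perturb_idx = list(perturb_idx)
    keep_idx = [j for j in range(p) if j not in perturb_idx]
    Zp = Z.copy()
    for j in perturb_idx:
        Zp[:, j] = rng.permutation(Z[:, j])

    # Shuffling keeps each column's mean and variance but introduces small
    # sample correlations. QR with the untouched columns first makes all
    # columns exactly orthogonal again while leaving the untouched ones as
    # they were (up to rounding).
    cols = keep_idx + perturb_idx
    Q, R = np.linalg.qr(Zp[:, cols])
    Q *= np.sign(np.diag(R))  # undo arbitrary sign flips from QR
    Z_new = np.empty_like(Z)
    Z_new[:, cols] = Q * np.sqrt((n - 1) * eigvals[cols])

    # Map the perturbed scores back to the original variable space.
    return Z_new @ eigvecs.T + mean

# Toy data: 200 records, 4 correlated continuous variables.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))
masked = pca_mask(X, perturb_idx=[2, 3])  # perturb the two weakest components
```

Perturbing more components, or components with larger eigenvalues, moves the masked records further from the originals, which mirrors the trade-off between protection and distortion described in the abstract. Replacing the shuffled scores with freshly simulated ones for every component would correspond to a fully synthetic variant, again under the assumptions of this sketch.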

