A Hybrid Technique for the Multiple Imputation of Survey Data

Humera Razzak; Christian Heumann

doi:10.1515/jos-2021-0022

References

Arnold, B.C., and S.J. Press. 1989. “Compatible Conditional Distributions”. Journal of the American Statistical Association 84:152–156. DOI: https://doi.org/10.2307/2289858.10.2307/2289858
Search in Google Scholar Back to article
Allison P.D. 2002. Missing Data. Thousand Oaks. CA: Sage Publications. DOI: https://dx.doi.org/10.4135/9781412985079.10.4135/9781412985079
Search in Google Scholar Back to article
Abdella, M., and T. Marwala, 2005. “The use of genetic algorithms and neural networks to approximate missing data in database”. In Proceedings of the IEEE 3rd International Conference on Computational Cybernetics, 2005. 24: 207–212. DOI: DOI: https://doi.org/10.1109/ICCCYB.2005.1511574.10.1109/ICCCYB.2005.1511574
Search in Google Scholar Back to article
Ankaiah, N., and V.Ravi. 2011. “A novel soft computing hybrid for data imputation”. In Proceedings of the 7th International Conference on Data Mining (DMIN). Las Vegas. USA. Available at: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.217.7984&rep=rep1&type=pdf.
Search in Google Scholar Back to article
Akande, O., F. Li, and J. Reiter. 2017. “An empirical comparison of multiple imputation methods for categorical data”. The American Statistician 71: 162–170. DOI: https://doi.org/10.1080/00031305.2016.1277158.10.1080/00031305.2016.1277158
Search in Google Scholar Back to article
Andridge, R.R., and R.J.A. Little. 2017. “A Review of Hot Deck Imputation for Survey Non-response”. International statistical review 78(1): 40–64. DOI: https://doi.org/10.1111/j.1751-5823.2010.00103.x.10.1111/j.1751-5823.2010.00103.x313033821743766
Search in Google Scholar Back to article
Armina, R., A.M. Zain, N.A. Ali, and R. Sallehuddin, 2017. “A review on missing value estimation using imputation algorithm”. Journal of Physics: Conference Series 892(1). DOI: https://doi.org/10.1088/1742-6596/892/1/012004.10.1088/1742-6596/892/1/012004
Search in Google Scholar Back to article
Bengio, Y., and F. Gingras. 1995. “Recurrent neural networks for missing or asynchronous data. In Touretzky, D.S., Mozer, M.C. and Hasselmo, M.E. editors”. Advances in Neural Information Processing Systems 8: 95–401. MIT Press, Cambridge, MA. Available at: https://proceedings.neurips.cc/paper/1995/file/ffeed84c7cb1ae7bf4ec4bd78275bb98-Paper.pdf.
Search in Google Scholar Back to article
Barnard, J., and X. Meng. 1999. “Applications of multiple imputation in medical studies: From AIDS to NHANES”. Statistical Methods in Medical Research 8:17–36. DOI: https://doi.org/10.1177/096228029900800103.10.1177/09622802990080010310347858
Search in Google Scholar Back to article
Breiman, L. 2001. “Random Forests”. Machine Learning 45(1): 5–32. DOI: https://doi.org/10.1023/A:1010933404324.10.1023/A:1010933404324
Search in Google Scholar Back to article
Batista, G., and M.C. Monard. 2003. Experimental comparison of K-nearest neighbour and mean or mode imputation methods with the internal strategies used by C4.5 and CN2 to treat missing data. University of Sao Paulo. Available at: https://www.semanticscholar.org/paper/Experimental-comparison-pf-K-NEAREST-NEIGHBOUR-and-BatistaMonard/35346d559d1bcfdf27acff66267e8f1d67190f23.
Search in Google Scholar Back to article
Burton, A., D. G. Altman, P. Royston, and R.L. Holder. 2006. “The design of simulation studies in medical statistics”. Statistics in Medicine 25: 4279–4292. DOI: https://doi.org/10.1002/sim.2673.10.1002/sim.267316947139
Search in Google Scholar Back to article
Chung, D., and F.L. Merat. 1996. Neural network based sensor array signal processing. In: Proc Int Conf Multisens Fusion Integr Intell Syst. Washington. USA: 757–764. DOI: https://doi.org/10.1109/MFI.1996.572313.10.1109/MFI.1996.572313
Search in Google Scholar Back to article
Chandra, A., G.M. Martinez, W.D. Mosher, J.C. Abma, and J. Jones. 2005. “Fertility, family planning, and reproductive health of U.S. women: data from the 2002 National Survey of Family Growth”. Vital Health Stat 23: 1–160. Available at: https://pubmed.ncbi.nlm.nih.gov/16532609/10.1037/e414702008-001
Search in Google Scholar Back to article
Corsi, D.J., J.M. Perkins, and S.V. Subramanian. 2017. “Child anthropometry data quality from Demographic and Health Surveys, Multiple Indicator Cluster Surveys, and National Nutrition Surveys in the West Central Africa region: are we comparing apples and oranges?”. Global Health Action. DOI: https://doi.org/10.1080/16549716.2017.1328185.10.1080/16549716.2017.1328185549606328641057
Search in Google Scholar Back to article
Dunson, D.B., and C. Xing. 2009. “Nonparametric Bayes modeling of multivariate categorical data”. Journal of the American Statistical Association 104: 1042–1051. DOI: https://doi.org/10.1198/jasa.2009.tm08439.10.1198/jasa.2009.tm08439363037823606777
Search in Google Scholar Back to article
Gelman, A., and T.P. Speed. 1993. “Characterizing a joint probability distribution by conditionals”. Journal of the Royal Statistical Society Series B: Statistical Methodology 55: 85–188. DOI: https://doi.org/10.1111/j.2517-6161.1993.tb01477.x.10.1111/j.2517-6161.1993.tb01477.x
Search in Google Scholar Back to article
Graham, J.W., and J.L. Schafer. 1999. “On the performance of multiple imputation for multivariate data with small sample size. In R. Hoyle (Ed.)”. Statistical strategies for small sample research: 1–29.
Search in Google Scholar Back to article
Gulliford, M.C., O.C. Ukoumunne, and, S. Chinn. 1999. “Components of Variance and Intra class Correlations for the Design of Community-based Surveys and Intervention Studies: Data from the Health Survey for England”. American Journal of Epidemiology 149(9): 876–883. DOI: https://doi.org/10.1.1.565.7897.10.1093/oxfordjournals.aje.a00990410221325
Search in Google Scholar Back to article
Harel, O., and X.H. Zhou. 2007. “Multiple imputation: Review of theory, implementation and Software”. Statistics in Medicine 26: 3057–3077. DOI: https://doi.org/10.1002/-sim.2787.
Search in Google Scholar Back to article
Horton, N.J., and K.P. Kleinman. 2007. “Much ado about nothing: a comparison of missing data methods and software to fit incomplete regression models”. The American Statistician 61: 79–90. DOI: https://doi.org/10.1198/000313007X172556.10.1198/000313007X172556183999317401454
Search in Google Scholar Back to article
Honaker, J., G. King, and M. Blackwell. 2011. “Amelia II: A program for missing data”. Journal of Statistical Software 45(7): 1–47. DOI: https://doi.org/10.18637/jss.v045.i07.10.18637/jss.v045.i07
Search in Google Scholar Back to article
Hardt, J., M. Herke, and R. Leonhart. 2012. “Auxiliary variables in multiple imputation in regression with missing X: a warning against including too many in small sample research”. BMC Medical Research Methodology 12(1). DOI: https://doi.org/10.1186/1471-2288-12-184.10.1186/1471-2288-12-184353866623216665
Search in Google Scholar Back to article
Kohonen, T. 1995. Self-Organizing Maps. Springer. Heidelberg. Available at: https://www.springer.com/gp/book/9783642976100.10.1007/978-3-642-97610-0
Search in Google Scholar Back to article
Lazarsfeld, P.F. 1950. The logical and mathematical foundation of latent structure analysis. In S. A. Stouffer, L. Guttman, E. A. Suchman, P. F. Lazarsfeld, S. A. Star, & J. A. Clausen, Studies in social psychology in World War II: Vol. 4. Measurement and prediction.Chap. 10: 362–412. Princeton, NJ: Princeton University Press. Available at: https://psycnet.apa.org/record/1951-03037-000.
Search in Google Scholar Back to article
Li, F., Y. Yu, and D.B. Rubin. 2012. Imputing missing data by fully conditional models: some cautionary examples and guidelines. Duke University Department of Statistical Science Discussion Paper: 11–24. Available at: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.228.7010.
Search in Google Scholar Back to article
Little, R.J.A. 1988. “A Test of Missing Completely at Random for Multivariate Data with Missing Values”. Journal of the American Statistical Association 83(404): 1198–1202. DOI: https://doi.org/10.1080/01621459.1988.10478722.10.1080/01621459.1988.10478722
Search in Google Scholar Back to article
Little, R.J. 2018. “On Algorithmic and Modeling Approaches to Imputation in Large Data Sets”. Statistica Sinica. http://www3.stat.sinica.edu.tw/statistica/J30N4/J30N401/J30N401.html
Search in Google Scholar Back to article
Little, R.J.A., and D.B. Rubin. 2002. Statistical analysis with missing data (2nd edition.). New York: Wiley. Available at: https://www.wiley.com/en-us/Statistical+Analysis+with+Missing+Data%2C+2nd+Edition-p-9781119013563.10.1002/9781119013563
Search in Google Scholar Back to article
McLachlan, G.J., and D. Peel. 2000. Finite mixture models. New York: Wiley. DOI: http://dx.doi.org/10.1002/0471721182.10.1002/0471721182
Search in Google Scholar Back to article
Marseguerra, M., and A. Zoia. 2005. “The autoassociative neural network in signal analysis. II. Application to on-line monitoring of a simulated BWR component”. Annals of Nuclear Energy 32(11): 1207–1223. DOI: https://doi.org/10.1016/j.anucene.2005.03.005.10.1016/j.anucene.2005.03.005
Search in Google Scholar Back to article
Marwala, T., and S. Chakraverty. 2006. “Fault classification in structures with incomplete measured data using auto associative neural networks and genetic algorithm”. Current Science India 90(4): 542-548. JSTOR. Available at: www.jstor.org/stable/24088946.
Search in Google Scholar Back to article
Morris, T.P., R.W. Ian, and R. Patrick. 2014. “Tuning Multiple Imputation by Predictive Mean Matching and Local Residual Draws. BMC Medical Research Methodology 14 (1): 75. DOI: https://doi.org/10.1186/1471-2288-14-75.10.1186/1471-2288-14-75405196424903709
Search in Google Scholar Back to article
Murray, J.S., and J.P. Reiter. 2016. “Multiple imputation of missing categorical and continuous values via Bayesian mixture models with local dependence”. Journal of the American Statistical Association 111: 1466–1479. DOI: https://doi.org/10.1080/01621459.2016.1174132.10.1080/01621459.2016.1174132
Search in Google Scholar Back to article
Narayanan, S., J.L.Vian, J. Choi, M. El-Sharkawi, and B.B.Thompson. 2002. Set constraint discovery: missing sensor data restoration using auto-associative regression machines. In Proceedings of the international Joint Conference on Neural Networks (IJCNN): 2872–2877. DOI: https://doi.org/10.1109/IJCNN.2002.1007604.10.1109/IJCNN.2002.1007604
Search in Google Scholar Back to article
Oja, E., and S. Kaski. 1999. Kohonen Maps. Elsevier. Amsterdam. Available at: https://www.elsevier.com/books/kohonen-maps/oja/978-0-444-50270-4.
Search in Google Scholar Back to article
Oba, S., M. Sato, I. Takemasa, M. Monden, K. Matsubara, and S. Ishii. 2003. “A Bayesian missing value estimation method for gene expression profile data”. Bioinformatics 19: 2088–2096. DOI: https://doi.org/10.1093/bioinformatics/btg287.10.1093/bioinformatics/btg28714594714
Search in Google Scholar Back to article
Pyle, D. 1999. Data preparation for data mining. Morgan Kaufmann Publishers Inc. San Francisco. Available at: https://dl.acm.org/doi/book/10.5555/299577.
Search in Google Scholar Back to article
Pérez, A., R.J. Dennis, J.F. Gil, M.A. Rondón, and A. López. 2002. “Use of the mean, hot deck and multiple imputation techniques to predict outcome in intensive care unit patients in Colombia”. Statistics in Medicine 21: 3885–3896. DOI: https://doi.org/10.1002/sim.1391.10.1002/sim.139112483773
Search in Google Scholar Back to article
Quanli, W., M.V. Danial, J.P. Reiter, and H. Jigchen. 2018. NPBayesImputeCat: Non-Parametric Bayesian Multiple Imputation for Categorical Data. R package version 0.1, Available at: https://CRAN.R-project.org/package=NPBayesImputeCat.
Search in Google Scholar Back to article
Rubin, D.B. 1976. “Inference and Missing Data”. Biometrika 63: 581–590. DOI: https://doi.org/10.2307/2335739.10.2307/2335739
Search in Google Scholar Back to article
Rubin, D.B. 1987. Multiple Imputation for Nonresponse in Surveys. Wiley, New York. Available at: https://www.wiley.com/en-us/Multiple+Imputation+for+Nonresponse+in+Surveys-p-9780471655749.
Search in Google Scholar Back to article
Roth, P.L. 1994. “Missing data: A conceptual review for applied psychologysts”. Personnel Psychology 47: 537–560. DOI: https://doi.org/10.1111/j.1744-6570.1994.tb01736.x.10.1111/j.1744-6570.1994.tb01736.x
Search in Google Scholar Back to article
Rubin, D.B. 1996. “Multiple imputation after 18 + years”. Journal of the American Statistical Association 91: 473–489. DOI: https://doi.org/10.1080/01621459.1996.10476908.10.1080/01621459.1996.10476908
Search in Google Scholar Back to article
Raghunathan, T.W., J.M. Lepkowksi, J. van Hoewyk, and P.A. Solenbeger. 2001. “Multivariate technique for multiply imputing missing values using a sequence of regression models”. Survey Methodology 27: 85–95. Available at: https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.405.4540.
Search in Google Scholar Back to article
Reiter, J.P., T.E. Raghunathan, and S. Kinney. 2006. “The importance of modeling the survey design in multiple imputation for missing data”. Survey Methodology 32: 143–149. Available at: http://www2.stat.duke.edu/~jerry/Papers/SM06.pdf.
Search in Google Scholar Back to article
Royston, P., and I.R. White. 2011. “Multiple imputation by chained equations (mice): Implementation in Stata”. Journal of Statistical Software 45(4): 1–20. DOI: https://doi.org/10.18637/jss.v045.i04.10.18637/jss.v045.i04
Search in Google Scholar Back to article
R Core Team. 2018. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. Available at: https://www.R-project.org.
Search in Google Scholar Back to article
Sharpe, P.K., and R.J. Solly. 1995. “Dealing with missing values in neural network-based diagnostic systems”. Neural Computing and Applications 3(2): 73–77. DOI: https://doi.org/10.1007/BF01421959.10.1007/BF01421959
Search in Google Scholar Back to article
Schafer, J.L. 1997. Analysis of incomplete multivariate data. London: Chapman and Hall. DOI: https://doi.org/10.1201/9780367803025
Search in Google Scholar Back to article
Schafer, J.L. and J.W. Graham. 2002. “Missing data: Our view of the state of the art”. Psychological methods 7: 147–177. DOI: https://doi.org/10.1037/1082-989X.7.2.147.10.1037/1082-989X.7.2.147
Search in Google Scholar Back to article
Schlomer, G.L., S. Bauman, and N.A. Card. 2010. “Best Practices for Missing Data Management in Counseling Psychology”. Journal of Counseling Psychology 57(1): 1–10. DOI: https://doi.org/10.1037/a0018082.10.1037/a001808221133556
Search in Google Scholar Back to article
Si, Y., and J.P. Reiter. 2013. “Nonparametric Bayesian multiple imputation for incomplete categorical variables in large-scale assessment surveys”. Journal of Educational and Behavioral Statistics 38: 499–521. DOI: https://doi.org/10.3102/1076998613480394.10.3102/1076998613480394
Search in Google Scholar Back to article
Templ, M., A. Andreas, K. Alexander, and P. Bernd. 2012. VIM: Visualization and Imputation of Missing Values. Available at: http://cran.r-project.org/web/packages/VIM/VIM.pdf.
Search in Google Scholar Back to article
Van Buuren, S. 2007. “Multiple imputation of discrete and continuous data by fully conditional specification”. Statistical Methods in Medical Research 16: 219–242. DOI: https://doi.org/10.1177/0962280206074463.10.1177/096228020607446317621469
Search in Google Scholar Back to article
Van Buuren, S. 2012. Flexible Imputation of Missing Data, London: Chapman and Hall/CRC. DOI: https://doi.org/10.1201/b11826.10.1201/b11826
Search in Google Scholar Back to article
Van Buuren, S., and K. Groothuis-Oudshoorn. 1999. Flexible multivariate imputation by MICE. TNO Prevention and Health. Leiden. Available at: https://stefvanbuuren.name/publications/Flexible%20multivariate%20-%20TNO99054%201999.pdf.
Search in Google Scholar Back to article
Van Buuren, S., and K. Groothuis-Oudshoorn. 2011. “mice: Multivariate imputation by chained equations”. R. Journal of Statistical Software 45(3): 1–67. DOI: https://doi.org/10.18637/jss.v045.i03.10.18637/jss.v045.i03
Search in Google Scholar Back to article
Van Ginkel, J.R. 2007. Multiple imputation for incomplete test, questionnaire and survey data. Ph.D. dissertation. Tilburg University. Department of Methodology and Statistics. Available at: https://pure.uvt.nl/ws/portalfiles/portal/839209/224433.pdf.
Search in Google Scholar Back to article
Vermunt, J.K., J.R. van Ginkel, L.A. van der Ark, and K. Sijtsma. 2008. “Multiple imputation of incomplete categorical data using latent class analysis”. Sociological Methodology 38: 369–397. DOI: https://doi.org/10.1111/j.1467-9531.2008.00202.x.10.1111/j.1467-9531.2008.00202.x
Search in Google Scholar Back to article
WHO (World Health Organization). 2003. Community-based Strategies for Breastfeeding Promotion and Support in Developing Countries, 2003. Dept. of child and adolescent health and development. Geneva. Available at: https://www.who.int/maternal_child_adolescent/documents/9241591218/en/.
Search in Google Scholar Back to article
Wilkinson, L., and Task Force on Statistical Inference. 1999. “Statistical methods in psychology journals: Guidelines and explanations”. American Psychologist 54: 594–604. DOI: https://doi.org/10.1037/0003-066X.54.8.594.10.1037/0003-066X.54.8.594
Search in Google Scholar Back to article
Zhu, J., and T.E. Raghunathan. 2016. “Convergence Properties of a Sequential Regression Multiple Imputation Algorithm”. Journal of the American Statistical Association 110(511): 1112–1124. DOI: https://doi.org/10.1080/01621459.2014.948117.10.1080/01621459.2014.948117
Search in Google Scholar Back to article

A Hybrid Technique for the Multiple Imputation of Survey Data

References

Paradigm

My account