A Hybrid Technique for the Multiple Imputation of Survey Data

Open Access
Jun 2021

References

  1. Arnold, B.C., and S.J. Press. 1989. “Compatible Conditional Distributions”. Journal of the American Statistical Association 84: 152–156. DOI: https://doi.org/10.2307/2289858.
  2. Allison, P.D. 2002. Missing Data. Thousand Oaks, CA: Sage Publications. DOI: https://dx.doi.org/10.4135/9781412985079.
  3. Abdella, M., and T. Marwala. 2005. “The use of genetic algorithms and neural networks to approximate missing data in database”. In Proceedings of the IEEE 3rd International Conference on Computational Cybernetics, 2005. 24: 207–212. DOI: https://doi.org/10.1109/ICCCYB.2005.1511574.
  4. Ankaiah, N., and V. Ravi. 2011. “A novel soft computing hybrid for data imputation”. In Proceedings of the 7th International Conference on Data Mining (DMIN). Las Vegas. USA. Available at: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.217.7984&rep=rep1&type=pdf.
  5. Akande, O., F. Li, and J. Reiter. 2017. “An empirical comparison of multiple imputation methods for categorical data”. The American Statistician 71: 162–170. DOI: https://doi.org/10.1080/00031305.2016.1277158.
  6. Andridge, R.R., and R.J.A. Little. 2017. “A Review of Hot Deck Imputation for Survey Non-response”. International Statistical Review 78(1): 40–64. DOI: https://doi.org/10.1111/j.1751-5823.2010.00103.x.
  7. Armina, R., A.M. Zain, N.A. Ali, and R. Sallehuddin. 2017. “A review on missing value estimation using imputation algorithm”. Journal of Physics: Conference Series 892(1). DOI: https://doi.org/10.1088/1742-6596/892/1/012004.
  8. Bengio, Y., and F. Gingras. 1995. “Recurrent neural networks for missing or asynchronous data”. In Advances in Neural Information Processing Systems 8, edited by D.S. Touretzky, M.C. Mozer, and M.E. Hasselmo: 395–401. MIT Press, Cambridge, MA. Available at: https://proceedings.neurips.cc/paper/1995/file/ffeed84c7cb1ae7bf4ec4bd78275bb98-Paper.pdf.
  9. Barnard, J., and X. Meng. 1999. “Applications of multiple imputation in medical studies: From AIDS to NHANES”. Statistical Methods in Medical Research 8: 17–36. DOI: https://doi.org/10.1177/096228029900800103.
  10. Breiman, L. 2001. “Random Forests”. Machine Learning 45(1): 5–32. DOI: https://doi.org/10.1023/A:1010933404324.
  11. Batista, G., and M.C. Monard. 2003. Experimental comparison of K-nearest neighbour and mean or mode imputation methods with the internal strategies used by C4.5 and CN2 to treat missing data. University of Sao Paulo. Available at: https://www.semanticscholar.org/paper/Experimental-comparison-pf-K-NEAREST-NEIGHBOUR-and-BatistaMonard/35346d559d1bcfdf27acff66267e8f1d67190f23.
  12. Burton, A., D.G. Altman, P. Royston, and R.L. Holder. 2006. “The design of simulation studies in medical statistics”. Statistics in Medicine 25: 4279–4292. DOI: https://doi.org/10.1002/sim.2673.
  13. Chung, D., and F.L. Merat. 1996. Neural network based sensor array signal processing. In: Proc Int Conf Multisens Fusion Integr Intell Syst. Washington. USA: 757–764. DOI: https://doi.org/10.1109/MFI.1996.572313.
  14. Chandra, A., G.M. Martinez, W.D. Mosher, J.C. Abma, and J. Jones. 2005. “Fertility, family planning, and reproductive health of U.S. women: data from the 2002 National Survey of Family Growth”. Vital Health Stat 23: 1–160. Available at: https://pubmed.ncbi.nlm.nih.gov/16532609/.
  15. Corsi, D.J., J.M. Perkins, and S.V. Subramanian. 2017. “Child anthropometry data quality from Demographic and Health Surveys, Multiple Indicator Cluster Surveys, and National Nutrition Surveys in the West Central Africa region: are we comparing apples and oranges?”. Global Health Action. DOI: https://doi.org/10.1080/16549716.2017.1328185.
  16. Dunson, D.B., and C. Xing. 2009. “Nonparametric Bayes modeling of multivariate categorical data”. Journal of the American Statistical Association 104: 1042–1051. DOI: https://doi.org/10.1198/jasa.2009.tm08439.
  17. Gelman, A., and T.P. Speed. 1993. “Characterizing a joint probability distribution by conditionals”. Journal of the Royal Statistical Society Series B: Statistical Methodology 55: 185–188. DOI: https://doi.org/10.1111/j.2517-6161.1993.tb01477.x.
  18. Graham, J.W., and J.L. Schafer. 1999. “On the performance of multiple imputation for multivariate data with small sample size”. In Statistical Strategies for Small Sample Research, edited by R. Hoyle: 1–29.
  19. Gulliford, M.C., O.C. Ukoumunne, and S. Chinn. 1999. “Components of Variance and Intraclass Correlations for the Design of Community-based Surveys and Intervention Studies: Data from the Health Survey for England”. American Journal of Epidemiology 149(9): 876–883. DOI: https://doi.org/10.1093/oxfordjournals.aje.a009904.
  20. Harel, O., and X.H. Zhou. 2007. “Multiple imputation: Review of theory, implementation and software”. Statistics in Medicine 26: 3057–3077. DOI: https://doi.org/10.1002/sim.2787.
  21. Horton, N.J., and K.P. Kleinman. 2007. “Much ado about nothing: a comparison of missing data methods and software to fit incomplete regression models”. The American Statistician 61: 79–90. DOI: https://doi.org/10.1198/000313007X172556.
  22. Honaker, J., G. King, and M. Blackwell. 2011. “Amelia II: A program for missing data”. Journal of Statistical Software 45(7): 1–47. DOI: https://doi.org/10.18637/jss.v045.i07.
  23. Hardt, J., M. Herke, and R. Leonhart. 2012. “Auxiliary variables in multiple imputation in regression with missing X: a warning against including too many in small sample research”. BMC Medical Research Methodology 12(1). DOI: https://doi.org/10.1186/1471-2288-12-184.
  24. Kohonen, T. 1995. Self-Organizing Maps. Springer. Heidelberg. Available at: https://www.springer.com/gp/book/9783642976100.
  25. Lazarsfeld, P.F. 1950. The logical and mathematical foundation of latent structure analysis. In S.A. Stouffer, L. Guttman, E.A. Suchman, P.F. Lazarsfeld, S.A. Star, and J.A. Clausen, Studies in social psychology in World War II: Vol. 4. Measurement and prediction. Chap. 10: 362–412. Princeton, NJ: Princeton University Press. Available at: https://psycnet.apa.org/record/1951-03037-000.
  26. Li, F., Y. Yu, and D.B. Rubin. 2012. Imputing missing data by fully conditional models: some cautionary examples and guidelines. Duke University Department of Statistical Science Discussion Paper: 11–24. Available at: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.228.7010.
  27. Little, R.J.A. 1988. “A Test of Missing Completely at Random for Multivariate Data with Missing Values”. Journal of the American Statistical Association 83(404): 1198–1202. DOI: https://doi.org/10.1080/01621459.1988.10478722.
  28. Little, R.J. 2018. “On Algorithmic and Modeling Approaches to Imputation in Large Data Sets”. Statistica Sinica. Available at: http://www3.stat.sinica.edu.tw/statistica/J30N4/J30N401/J30N401.html
  29. Little, R.J.A., and D.B. Rubin. 2002. Statistical analysis with missing data (2nd edition). New York: Wiley. Available at: https://www.wiley.com/en-us/Statistical+Analysis+with+Missing+Data%2C+2nd+Edition-p-9781119013563.
  30. McLachlan, G.J., and D. Peel. 2000. Finite mixture models. New York: Wiley. DOI: http://dx.doi.org/10.1002/0471721182.
  31. Marseguerra, M., and A. Zoia. 2005. “The autoassociative neural network in signal analysis. II. Application to on-line monitoring of a simulated BWR component”. Annals of Nuclear Energy 32(11): 1207–1223. DOI: https://doi.org/10.1016/j.anucene.2005.03.005.
  32. Marwala, T., and S. Chakraverty. 2006. “Fault classification in structures with incomplete measured data using auto associative neural networks and genetic algorithm”. Current Science India 90(4): 542–548. JSTOR. Available at: www.jstor.org/stable/24088946.
  33. Morris, T.P., I.R. White, and P. Royston. 2014. “Tuning Multiple Imputation by Predictive Mean Matching and Local Residual Draws”. BMC Medical Research Methodology 14(1): 75. DOI: https://doi.org/10.1186/1471-2288-14-75.
  34. Murray, J.S., and J.P. Reiter. 2016. “Multiple imputation of missing categorical and continuous values via Bayesian mixture models with local dependence”. Journal of the American Statistical Association 111: 1466–1479. DOI: https://doi.org/10.1080/01621459.2016.1174132.
  35. Narayanan, S., J.L. Vian, J. Choi, M. El-Sharkawi, and B.B. Thompson. 2002. Set constraint discovery: missing sensor data restoration using auto-associative regression machines. In Proceedings of the International Joint Conference on Neural Networks (IJCNN): 2872–2877. DOI: https://doi.org/10.1109/IJCNN.2002.1007604.
  36. Oja, E., and S. Kaski. 1999. Kohonen Maps. Elsevier. Amsterdam. Available at: https://www.elsevier.com/books/kohonen-maps/oja/978-0-444-50270-4.
  37. Oba, S., M. Sato, I. Takemasa, M. Monden, K. Matsubara, and S. Ishii. 2003. “A Bayesian missing value estimation method for gene expression profile data”. Bioinformatics 19: 2088–2096. DOI: https://doi.org/10.1093/bioinformatics/btg287.
  38. Pyle, D. 1999. Data preparation for data mining. Morgan Kaufmann Publishers Inc. San Francisco. Available at: https://dl.acm.org/doi/book/10.5555/299577.
  39. Pérez, A., R.J. Dennis, J.F. Gil, M.A. Rondón, and A. López. 2002. “Use of the mean, hot deck and multiple imputation techniques to predict outcome in intensive care unit patients in Colombia”. Statistics in Medicine 21: 3885–3896. DOI: https://doi.org/10.1002/sim.1391.
  40. Quanli, W., M.V. Danial, J.P. Reiter, and H. Jigchen. 2018. NPBayesImputeCat: Non-Parametric Bayesian Multiple Imputation for Categorical Data. R package version 0.1. Available at: https://CRAN.R-project.org/package=NPBayesImputeCat.
  41. Rubin, D.B. 1976. “Inference and Missing Data”. Biometrika 63: 581–590. DOI: https://doi.org/10.2307/2335739.
  42. Rubin, D.B. 1987. Multiple Imputation for Nonresponse in Surveys. Wiley, New York. Available at: https://www.wiley.com/en-us/Multiple+Imputation+for+Nonresponse+in+Surveys-p-9780471655749.
  43. Roth, P.L. 1994. “Missing data: A conceptual review for applied psychologists”. Personnel Psychology 47: 537–560. DOI: https://doi.org/10.1111/j.1744-6570.1994.tb01736.x.
  44. Rubin, D.B. 1996. “Multiple imputation after 18+ years”. Journal of the American Statistical Association 91: 473–489. DOI: https://doi.org/10.1080/01621459.1996.10476908.
  45. Raghunathan, T.E., J.M. Lepkowski, J. van Hoewyk, and P.A. Solenberger. 2001. “Multivariate technique for multiply imputing missing values using a sequence of regression models”. Survey Methodology 27: 85–95. Available at: https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.405.4540.
  46. Reiter, J.P., T.E. Raghunathan, and S. Kinney. 2006. “The importance of modeling the survey design in multiple imputation for missing data”. Survey Methodology 32: 143–149. Available at: http://www2.stat.duke.edu/~jerry/Papers/SM06.pdf.
  47. Royston, P., and I.R. White. 2011. “Multiple imputation by chained equations (mice): Implementation in Stata”. Journal of Statistical Software 45(4): 1–20. DOI: https://doi.org/10.18637/jss.v045.i04.
  48. R Core Team. 2018. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. Available at: https://www.R-project.org.
  49. Sharpe, P.K., and R.J. Solly. 1995. “Dealing with missing values in neural network-based diagnostic systems”. Neural Computing and Applications 3(2): 73–77. DOI: https://doi.org/10.1007/BF01421959.
  50. Schafer, J.L. 1997. Analysis of incomplete multivariate data. London: Chapman and Hall. DOI: https://doi.org/10.1201/9780367803025.
  51. Schafer, J.L., and J.W. Graham. 2002. “Missing data: Our view of the state of the art”. Psychological Methods 7: 147–177. DOI: https://doi.org/10.1037/1082-989X.7.2.147.
  52. Schlomer, G.L., S. Bauman, and N.A. Card. 2010. “Best Practices for Missing Data Management in Counseling Psychology”. Journal of Counseling Psychology 57(1): 1–10. DOI: https://doi.org/10.1037/a0018082.
  53. Si, Y., and J.P. Reiter. 2013. “Nonparametric Bayesian multiple imputation for incomplete categorical variables in large-scale assessment surveys”. Journal of Educational and Behavioral Statistics 38: 499–521. DOI: https://doi.org/10.3102/1076998613480394.
  54. Templ, M., A. Alfons, A. Kowarik, and B. Prantner. 2012. VIM: Visualization and Imputation of Missing Values. Available at: http://cran.r-project.org/web/packages/VIM/VIM.pdf.
  55. Van Buuren, S. 2007. “Multiple imputation of discrete and continuous data by fully conditional specification”. Statistical Methods in Medical Research 16: 219–242. DOI: https://doi.org/10.1177/0962280206074463.
  56. Van Buuren, S. 2012. Flexible Imputation of Missing Data. London: Chapman and Hall/CRC. DOI: https://doi.org/10.1201/b11826.
  57. Van Buuren, S., and K. Groothuis-Oudshoorn. 1999. Flexible multivariate imputation by MICE. TNO Prevention and Health. Leiden. Available at: https://stefvanbuuren.name/publications/Flexible%20multivariate%20-%20TNO99054%201999.pdf.
  58. Van Buuren, S., and K. Groothuis-Oudshoorn. 2011. “mice: Multivariate imputation by chained equations in R”. Journal of Statistical Software 45(3): 1–67. DOI: https://doi.org/10.18637/jss.v045.i03.
  59. Van Ginkel, J.R. 2007. Multiple imputation for incomplete test, questionnaire and survey data. Ph.D. dissertation. Tilburg University. Department of Methodology and Statistics. Available at: https://pure.uvt.nl/ws/portalfiles/portal/839209/224433.pdf.
  60. Vermunt, J.K., J.R. van Ginkel, L.A. van der Ark, and K. Sijtsma. 2008. “Multiple imputation of incomplete categorical data using latent class analysis”. Sociological Methodology 38: 369–397. DOI: https://doi.org/10.1111/j.1467-9531.2008.00202.x.
  61. WHO (World Health Organization). 2003. Community-based Strategies for Breastfeeding Promotion and Support in Developing Countries, 2003. Dept. of child and adolescent health and development. Geneva. Available at: https://www.who.int/maternal_child_adolescent/documents/9241591218/en/.
  62. Wilkinson, L., and Task Force on Statistical Inference. 1999. “Statistical methods in psychology journals: Guidelines and explanations”. American Psychologist 54: 594–604. DOI: https://doi.org/10.1037/0003-066X.54.8.594.
  63. Zhu, J., and T.E. Raghunathan. 2016. “Convergence Properties of a Sequential Regression Multiple Imputation Algorithm”. Journal of the American Statistical Association 110(511): 1112–1124. DOI: https://doi.org/10.1080/01621459.2014.948117.
Language: English
Page range: 505 - 531
Submitted on: Mar 1, 2019
Accepted on: Dec 1, 2020
Published on: Jun 22, 2021
Published by: Sciendo
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2021 Humera Razzak, Christian Heumann, published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.