Improving the Output Quality of Official Statistics Based on Machine Learning Algorithms

Open Access | Jun 2022

References

  1. Beck, M., F. Dumpert, and J. Feuerhake. 2018. Machine learning in official statistics. arXiv:1812.10422. DOI: https://doi.org/10.48550/arXiv.1812.10422.
  2. Braaksma, B., and C. Zeelenberg. 2015. “Re-make/Re-model: Should big data change the modelling paradigm in official statistics?” Statistical Journal of the IAOS 31(2): 193–202. DOI: https://doi.org/10.3233/sji-150892.
  3. Breiman, L. 2001. “Statistical modeling: The two cultures.” Statistical Science 16(3): 199–231. DOI: https://doi.org/10.1214/ss/1009213726.
  4. Bross, I.D.J. 1954. “Misclassification in 2 × 2 tables.” Biometrics 10(4): 478–486. DOI: https://doi.org/10.2307/3001619.
  5. Buelens, B., P.-P. de Wolf, and C. Zeelenberg. 2016. “Model based estimation at Statistics Netherlands.” In European Conference on Quality in Official Statistics, 31 May – 3 June, Madrid, Spain. Available at: https://www.ine.es/q2016/docs/q2016Final00196.pdf.
  6. Buonaccorsi, J.P. 2010. Measurement Error: Models, Methods, and Applications. Chapman & Hall/CRC, Boca Raton, Florida. DOI: https://doi.org/10.1201/9781420066586.
  7. Buskirk, T.D., and S. Kolenikov. 2015. Finding respondents in the forest: A comparison of logistic regression and random forest models for response propensity weighting and stratification. Available at: https://surveyinsights.org/?p=5108 (accessed April 2020).
  8. Costa, H., D. Almeida, F. Vala, F. Marcelino, and M. Caetano. 2018. “Land cover mapping from remotely sensed and auxiliary data for harmonized official statistics.” ISPRS International Journal of Geo-Information 7(4): 157. DOI: https://doi.org/10.3390/ijgi7040157.
  9. Curier, R.L., T.J.A. de Jong, K. Strauch, K. Cramer, N. Rosenski, C. Schartner, M. Debusschere, H. Ziemons, D. Iren, and S. Bromuri. 2018. Monitoring spatial sustainable development: Semi-automated analysis of satellite and aerial images for energy transition and sustainability indicators. arXiv:1810.04881. DOI: https://doi.org/10.48550/arXiv.1810.04881.
  10. Daas, P.J.H., and S. van der Doef. 2020. “Detecting innovative companies via their website.” Statistical Journal of the IAOS 36(4): 1239–1251. DOI: https://doi.org/10.3233/SJI-200627.
  11. De Broe, S.M.M.G., P. Struijs, P.J.H. Daas, A. van Delden, J. Burger, J.A. van den Brakel, K.O. ten Bosch, C. Zeelenberg, and W.F.H. Ypma. 2020. Updating the paradigm of official statistics. CBDS Working Paper 02-20, Statistics Netherlands, The Hague/Heerlen.
  12. European Commission. 2009. Regulation of European Statistics. Available at: https://eur-lex.europa.eu/legal-content/EN/ALL/?uri=CELEX%3A32009R0223 (accessed April 2020).
  13. Eurostat. 2017. European Statistics Code of Practice. Available at: https://ec.europa.eu/eurostat/web/ (accessed April 2020).
  14. Forman, G. 2015. “Counting positives accurately despite inaccurate classification.” In Machine Learning: ECML 2005, Lecture Notes in Computer Science, edited by J. Gama, R. Camacho, P.B. Brazdil, A.M. Jorge, and L. Torgo: 564–575, Berlin, Heidelberg, Springer. DOI: https://doi.org/10.1007/11564096_55.
  15. Gama, J., I. Žliobaitė, A. Bifet, M. Pechenizkiy, and A. Bouchachia. 2014. “A survey on concept drift adaptation.” ACM Computing Surveys 46(4): 1–37. DOI: https://doi.org/10.1145/2523813.
  16. Goldenberg, I., and G.I. Webb. 2019. “Survey of distance measures for quantifying concept drift and shift in numeric data.” Knowledge and Information Systems 60(2): 591–615. DOI: https://doi.org/10.1007/s10115-018-1257-z.
  17. González, P., A. Castaño, N.V. Chawla, and J.J. Del Coz. 2017. “A review on quantification learning.” ACM Computing Surveys 50(5): 74:1–74:40. DOI: https://doi.org/10.1145/3117807.
  18. Helmbold, D.P., and P.M. Long. 1994. “Tracking drifting concepts by minimizing disagreements.” Machine Learning 14(1): 27–45. DOI: https://doi.org/10.1007/BF00993161.
  19. Kenett, R.S., and G. Shmueli. 2016. “From quality to information quality in official statistics.” Journal of Official Statistics 32(4): 867–885. DOI: https://doi.org/10.1515/jos-2016-0045.
  20. Kloos, K., Q.A. Meertens, S. Scholtus, and J.D. Karch. 2020. “Comparing correction methods to reduce misclassification bias.” In BNAIC/BENELEARN 2020, edited by L. Cao, W.A. Kosters, and J. Lijffijt: 103–129, Leiden. DOI: https://doi.org/10.1007/978-3-030-76640-5_5.
  21. Kuha, J., and C.J. Skinner. 1997. “Categorical data analysis and misclassification.” In Survey Measurement and Process Quality, edited by L.E. Lyberg, P.P. Biemer, M. Collins, E.D. de Leeuw, C. Dippo, N. Schwarz, and D. Trewin: 633–670. Wiley, New York. DOI: https://doi.org/10.1002/9781118490013.
  22. Liu, M. 2020. “Using machine learning models to predict attrition in a survey panel.” In Big Data Meets Survey Science, edited by C.A. Hill, P.P. Biemer, T.D. Buskirk, L. Japec, A. Kirchner, S. Kolenikov, and L.E. Lyberg: 415–433. John Wiley & Sons. DOI: https://doi.org/10.1002/9781118976357.ch14.
  23. Lu, J., A. Liu, F. Dong, F. Gu, J. Gama, and G. Zhang. 2019. “Learning under concept drift: A review.” IEEE Transactions on Knowledge and Data Engineering 31(12): 2346–2363. DOI: https://doi.org/10.1109/TKDE.2018.2876857.
  24. Moreno-Torres, J.G., T. Raeder, R. Alaiz-Rodríguez, N.V. Chawla, and F. Herrera. 2012. “A unifying view on dataset shift in classification.” Pattern Recognition 45(1): 521–530. DOI: https://doi.org/10.1016/j.patcog.2011.06.019.
  25. O’Connor, B., R. Balasubramanyan, B.R. Routledge, and N.A. Smith. 2010. “From tweets to polls: Linking text sentiment to public opinion time series.” In Proceedings of the 4th International AAAI Conference on Weblogs and Social Media (ICWSM), May 23 – May 26, edited by M.A. Hearst: 122–129, Washington, D.C., U.S.A. Available at: https://www.aaai.org/ocs/index.php/ICWSM/ICWSM10/paper/view/1536/1842. DOI: https://doi.org/10.1609/icwsm.v4i1.14031.
  26. OECD. 2011. Quality Framework for OECD Statistical Activities. Available at: https://www.oecd.org/sdd/qualityframeworkforoecdstatisticalactivities.htm (accessed April 2020).
  27. Schlimmer, J.C., and R.H. Granger. 1986. “Incremental learning from noisy data.” Machine Learning 1(3): 317–354. DOI: https://doi.org/10.1007/BF00116895.
  28. Scholtus, S., and A. van Delden. 2020. The accuracy of estimators based on a binary classifier. Discussion Paper 202006, Statistics Netherlands, The Hague. Available at: https://www.cbs.nl/-/media/_pdf/2020/06/classification-errors-binary.pdf.
  29. Schwartz, J.E. 1985. “The neglected problem of measurement error in categorical data.” Sociological Methods & Research 13(4): 435–466. DOI: https://doi.org/10.1177/0049124185013004001.
  30. Tenenbein, A. 1970. “A double sampling scheme for estimating from binomial data with misclassifications.” Journal of the American Statistical Association 65(331): 1350–1361. DOI: https://doi.org/10.1080/01621459.1970.10481170.
  31. Van Delden, A., S. Scholtus, and J. Burger. 2016. “Accuracy of Mixed-Source Statistics as Affected by Classification Errors.” Journal of Official Statistics 32(3): 619–642. DOI: https://doi.org/10.1515/jos-2016-0032.
  32. Webb, G.I., R. Hyde, H. Cao, H.L. Nguyen, and F. Petitjean. 2016. “Characterizing concept drift.” Data Mining and Knowledge Discovery 30(4): 964–994. DOI: https://doi.org/10.1007/s10618-015-0448-4.
  33. Widmer, G., and M. Kubat. 1996. “Learning in the presence of concept drift and hidden contexts.” Machine Learning 23(1): 69–101. DOI: https://doi.org/10.1023/A:1018046501280.
Language: English
Page range: 485–508
Submitted on: Dec 1, 2020 | Accepted on: Jun 1, 2021 | Published on: Jun 14, 2022
Published by: Sciendo
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2022 Q.A. Meertens, C.G.H. Diks, H.J. van den Herik, F.W. Takes, published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.