Improving the Output Quality of Official Statistics Based on Machine Learning Algorithms

Q.A. Meertens; C.G.H. Diks; H.J. van den Herik; F.W. Takes

doi:10.2478/jos-2022-0023

References

Beck, M., F. Dumpert, and J. Feuerhake. 2018. Machine learning in official statistics. arXiv:1812.10422. DOI: https://doi.org/10.48550/arXiv.1812.10422.
Braaksma, B., and C. Zeelenberg. 2015. “Re-make/Re-model: Should big data change the modelling paradigm in official statistics?” Statistical Journal of the IAOS 31(2): 193–202. DOI: https://doi.org/10.3233/sji-150892.
Breiman, L. 2001. “Statistical modeling: The two cultures.” Statistical Science 16(3): 199–231. DOI: https://doi.org/10.1214/ss/1009213726.
Bross, I.D.J. 1954. “Misclassification in 2 × 2 tables.” Biometrics 10(4): 478–486. DOI: https://doi.org/10.2307/3001619.
Buelens, B., P.-P. de Wolf, and C. Zeelenberg. 2016. “Model based estimation at Statistics Netherlands.” In European Conference on Quality in Official Statistics, Madrid, Spain. Available at: https://www.ine.es/q2016/docs/q2016Final00196.pdf.
Buonaccorsi, J.P. 2010. Measurement Error: Models, Methods, and Applications. Chapman & Hall/CRC, Boca Raton, Florida. DOI: https://doi.org/10.1201/9781420066586.
Buskirk, T.D., and S. Kolenikov. 2015. Finding respondents in the forest: A comparison of logistic regression and random forest models for response propensity weighting and stratification. Available at: https://surveyinsights.org/?p=5108 (accessed April 2020).
Costa, H., D. Almeida, F. Vala, F. Marcelino, and M. Caetano. 2018. “Land cover mapping from remotely sensed and auxiliary data for harmonized official statistics.” ISPRS International Journal of Geo-Information 7(4): 157. DOI: https://doi.org/10.3390/ijgi7040157.
Curier, R.L., T.J.A. de Jong, K. Strauch, K. Cramer, N. Rosenski, C. Schartner, M. Debusschere, H. Ziemons, D. Iren, and S. Bromuri. 2018. Monitoring spatial sustainable development: Semi-automated analysis of satellite and aerial images for energy transition and sustainability indicators. arXiv:1810.04881. DOI: https://doi.org/10.48550/arXiv.1810.04881.
Daas, P.J.H., and S. van der Doef. 2020. “Detecting innovative companies via their website.” Statistical Journal of the IAOS 36(4): 1239–1251. DOI: https://doi.org/10.3233/SJI-200627.
De Broe, S.M.M.G., P. Struijs, P.J.H. Daas, A. van Delden, J. Burger, J.A. van den Brakel, K.O. ten Bosch, C. Zeelenberg, and W.F.H. Ypma. 2020. Updating the paradigm of official statistics. CBDS Working Paper 02-20, Statistics Netherlands, The Hague/Heerlen.
European Commission. 2009. Regulation of European Statistics. Available at: https://eur-lex.europa.eu/legal-content/EN/ALL/?uri=CELEX%3A32009R0223 (accessed April 2020).
Eurostat. 2017. European Statistics Code of Practice. Available at: https://ec.europa.eu/eurostat/web/ (accessed April 2020).
Forman, G. 2005. “Counting positives accurately despite inaccurate classification.” In Machine Learning: ECML 2005, Lecture Notes in Computer Science, edited by J. Gama, R. Camacho, P.B. Brazdil, A.M. Jorge, and L. Torgo: 564–575, Berlin, Heidelberg, Springer. DOI: https://doi.org/10.1007/11564096_55.
Gama, J., I. Žliobaitė, A. Bifet, M. Pechenizkiy, and A. Bouchachia. 2014. “A survey on concept drift adaptation.” ACM Computing Surveys 46(4): 1–37. DOI: https://doi.org/10.1145/2523813.
Goldenberg, I., and G.I. Webb. 2019. “Survey of distance measures for quantifying concept drift and shift in numeric data.” Knowledge and Information Systems 60(2): 591–615. DOI: https://doi.org/10.1007/s10115-018-1257-z.
González, P., A. Castaño, N.V. Chawla, and J.J. Del Coz. 2017. “A review on quantification learning.” ACM Computing Surveys 50(5): 74:1–74:40. DOI: https://doi.org/10.1145/3117807.
Helmbold, D.P., and P.M. Long. 1994. “Tracking drifting concepts by minimizing disagreements.” Machine Learning 14(1): 27–45. DOI: https://doi.org/10.1007/BF00993161.
Kenett, R.S., and G. Shmueli. 2016. “From quality to information quality in official statistics.” Journal of Official Statistics 32(4): 867–885. DOI: https://doi.org/10.1515/jos-2016-0045.
Kloos, K., Q.A. Meertens, S. Scholtus, and J.D. Karch. 2020. “Comparing correction methods to reduce misclassification bias.” In BNAIC/BENELEARN 2020, edited by L. Cao, W.A. Kosters, and J. Lijffijt: 103–129, Leiden. DOI: https://doi.org/10.1007/978-3-030-76640-5_5.
Kuha, J., and C.J. Skinner. 1997. “Categorical data analysis and misclassification.” In Survey Measurement and Process Quality, edited by L.E. Lyberg, P.P. Biemer, M. Collins, E.D. de Leeuw, C. Dippo, N. Schwarz, and D. Trewin: 633–670. Wiley, New York. DOI: https://doi.org/10.1002/9781118490013.
Liu, M. 2020. “Using machine learning models to predict attrition in a survey panel.” In Big Data Meets Survey Science, edited by C.A. Hill, P.P. Biemer, T.D. Buskirk, L. Japec, A. Kirchner, S. Kolenikov, and L.E. Lyberg: 415–433. John Wiley & Sons. DOI: https://doi.org/10.1002/9781118976357.ch14.
Lu, J., A. Liu, F. Dong, F. Gu, J. Gama, and G. Zhang. 2019. “Learning under concept drift: A review.” IEEE Transactions on Knowledge and Data Engineering 31(12): 2346–2363. DOI: https://doi.org/10.1109/TKDE.2018.2876857.
Moreno-Torres, J.G., T. Raeder, R. Alaiz-Rodríguez, N.V. Chawla, and F. Herrera. 2012. “A unifying view on dataset shift in classification.” Pattern Recognition 45(1): 521–530. DOI: https://doi.org/10.1016/j.patcog.2011.06.019.
O’Connor, B., R. Balasubramanyan, B.R. Routledge, and N.A. Smith. 2010. “From tweets to polls: Linking text sentiment to public opinion time series.” In Proceedings of the 4th International AAAI Conference on Weblogs and Social Media (ICWSM), May 23 – May 26, edited by M.A. Hearst: 122–129, Washington, D.C., U.S.A. Available at: https://www.aaai.org/ocs/index.php/ICWSM/ICWSM10/paper/view/1536/1842. DOI: https://doi.org/10.1609/icwsm.v4i1.14031.
OECD. 2011. Quality Framework for OECD Statistical Activities. Available at: https://www.oecd.org/sdd/qualityframeworkforoecdstatisticalactivities.htm (accessed April 2020).
Schlimmer, J.C., and R.H. Granger. 1986. “Incremental learning from noisy data.” Machine Learning 1(3): 317–354. DOI: https://doi.org/10.1007/BF00116895.
Scholtus, S., and A. van Delden. 2020. The accuracy of estimators based on a binary classifier. Discussion Paper 202006, Statistics Netherlands, The Hague. Available at: https://www.cbs.nl/-/media/_pdf/2020/06/classification-errors-binary.pdf.
Schwartz, J.E. 1985. “The neglected problem of measurement error in categorical data.” Sociological Methods & Research 13(4): 435–466. DOI: https://doi.org/10.1177/0049124185013004001.
Tenenbein, A. 1970. “A double sampling scheme for estimating from binomial data with misclassifications.” Journal of the American Statistical Association 65(331): 1350–1361. DOI: https://doi.org/10.1080/01621459.1970.10481170.
Van Delden, A., S. Scholtus, and J. Burger. 2016. “Accuracy of mixed-source statistics as affected by classification errors.” Journal of Official Statistics 32(3): 619–642. DOI: https://doi.org/10.1515/jos-2016-0032.
Webb, G.I., R. Hyde, H. Cao, H.L. Nguyen, and F. Petitjean. 2016. “Characterizing concept drift.” Data Mining and Knowledge Discovery 30(4): 964–994. DOI: https://doi.org/10.1007/s10618-015-0448-4.
Widmer, G., and M. Kubat. 1996. “Learning in the presence of concept drift and hidden contexts.” Machine Learning 23(1): 69–101. DOI: https://doi.org/10.1023/A:1018046501280.