Revisiting Strategies for Fitting Logistic Regression for Positive and Unlabeled Data

Wawrzeńczyk, Adam; Mielniczuk, Jan

doi:10.34768/amcs-2022-0022

Abstract

Positive-unlabeled (PU) learning, in which only some of the positive examples are labeled and all remaining observations are unlabeled, is an important problem motivated by the occurrence of this type of partial observability in many applications. The present paper reconsiders recent advances in parametric modeling of PU data based on empirical likelihood maximization and argues that they can be significantly improved. The proposed approach relies on the fact that the likelihood for the logistic fit with an unknown labeling frequency can be expressed as the sum of a convex and a concave function, with the decomposition given explicitly. This allows methods such as the concave-convex procedure (CCCP) or its variant, the disciplined convex-concave procedure (DCCP), to be applied. Analyzing real data sets, we show that using the DCCP to solve the optimization problem yields significant improvements in the estimation of both the posterior probability and the labeling frequency over the best available competitors.
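
The convex-plus-concave structure mentioned in the abstract can be illustrated with a short derivation. This is only a sketch under assumptions that the abstract does not state: labels are taken to be selected completely at random, so that P(s = 1 | x) = c σ(βᵀx) with labeling frequency c in (0, 1), and the reparametrization b = log(1 - c) is introduced here purely for illustration. The decomposition below is one valid choice and may differ from the explicit one given in the paper.

```latex
% Assumed model (not stated in the abstract):
%   P(s_i = 1 \mid x_i) = c\,\sigma(z_i), \quad z_i = \beta^\top x_i, \quad \sigma(z) = 1/(1+e^{-z}).
% s_i = 1 marks a labeled (positive) example, s_i = 0 an unlabeled one.
\begin{align*}
-\ell(\beta, c)
  &= -\sum_{i:\,s_i=1} \log\bigl(c\,\sigma(z_i)\bigr)
     -\sum_{i:\,s_i=0} \log\bigl(1 - c\,\sigma(z_i)\bigr), \\[4pt]
1 - c\,\sigma(z)
  &= \frac{1 + (1-c)\,e^{z}}{1 + e^{z}}
   = \frac{1 + e^{z+b}}{1 + e^{z}},
  \qquad b = \log(1-c) < 0, \\[4pt]
-\ell(\beta, b)
  &= \underbrace{\sum_{i:\,s_i=1}\Bigl[-\log\bigl(1-e^{b}\bigr) + \log\bigl(1+e^{-z_i}\bigr)\Bigr]
     + \sum_{i:\,s_i=0}\log\bigl(1+e^{z_i}\bigr)}_{u(\beta,\,b)\ \text{convex}}
     \;\underbrace{-\sum_{i:\,s_i=0}\log\bigl(1+e^{z_i+b}\bigr)}_{-v(\beta,\,b)\ \text{concave}}.
\end{align*}
% Each term of u and v is convex in (\beta, b): softplus of an affine argument,
% or -\log(1-e^{b}), whose second derivative e^{b}/(1-e^{b})^{2} is positive.
% A CCCP iteration linearizes v at the current iterate \theta^{(t)} = (\beta^{(t)}, b^{(t)}):
\begin{equation*}
\theta^{(t+1)} \in \arg\min_{\theta = (\beta,\,b),\ b<0}
  \; u(\theta) - \nabla v\bigl(\theta^{(t)}\bigr)^{\top}\theta .
\end{equation*}
```

Each CCCP subproblem is convex, and because v is convex its linearization never overestimates it, so the objective -ℓ does not increase from one iterate to the next. The labeling frequency is recovered afterwards as c = 1 - exp(b).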

