Revisiting Strategies for Fitting Logistic Regression for Positive and Unlabeled Data

Open Access | Jul 2022

Abstract

Positive unlabeled (PU) learning is an important problem motivated by the occurrence of this type of partial observability in many applications. The present paper reconsiders recent advances in parametric modeling of PU data based on empirical likelihood maximization and argues that they can be significantly improved. The proposed approach is based on the fact that the likelihood for the logistic fit and an unknown labeling frequency can be expressed as the sum of a convex and a concave function, which is explicitly given. This allows methods such as the concave-convex procedure (CCCP) or its variant, the disciplined convex-concave procedure (DCCP), to be applied. We show by analyzing real data sets that, by using the DCCP to solve the optimization problem, we obtain significant improvements in the posterior probability and the label frequency estimation over the best available competitors.
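The convex-plus-concave structure mentioned in the abstract can be illustrated with a small numerical check. The sketch below is not taken from the paper: it assumes the common PU model in which an observation is labeled with probability c * sigmoid(eta), where eta is the linear predictor and c the label frequency, and it uses one possible reparametrization, a = log(1 - c), chosen here for illustration. The function names `neg_log_lik` and `dc_parts` are hypothetical. Under these assumptions the negative log-likelihood splits into a part that is convex in (eta, a) and a part that is concave in (eta, a), which is the form that CCCP-type methods require.

```python
import math


def softplus(t):
    # Numerically stable log(1 + exp(t)); convex in t.
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def sigmoid(t):
    # 1 / (1 + exp(-t)), written via softplus for stability.
    return math.exp(-softplus(-t))


def neg_log_lik(etas, labels, c):
    # Negative log-likelihood under the assumed model P(s = 1 | x) = c * sigmoid(eta).
    total = 0.0
    for eta, s in zip(etas, labels):
        p = c * sigmoid(eta)
        total -= math.log(p) if s == 1 else math.log(1.0 - p)
    return total


def dc_parts(etas, labels, c):
    # Assumed reparametrization (illustrative, not from the paper): a = log(1 - c) < 0.
    a = math.log(1.0 - c)
    convex = 0.0   # sum of terms convex in (eta, a)
    concave = 0.0  # sum of terms concave in (eta, a)
    for eta, s in zip(etas, labels):
        if s == 1:
            # -log c = -log(1 - exp(a)) and -log sigmoid(eta) = softplus(-eta)
            convex += -math.log(1.0 - math.exp(a)) + softplus(-eta)
        else:
            # -log(1 - c * sigmoid(eta)) = softplus(eta) - softplus(eta + a)
            convex += softplus(eta)
            concave -= softplus(eta + a)
    return convex, concave


# Toy linear predictors and labels, made up for this check.
etas = [-1.2, 0.3, 2.0, 0.7]
labels = [0, 1, 1, 0]
convex, concave = dc_parts(etas, labels, c=0.4)
gap = abs(convex + concave - neg_log_lik(etas, labels, c=0.4))
```

Here `gap` should be zero up to floating-point error, confirming that the two parts add up to the original objective. A CCCP-style iteration would then linearize the concave part at the current iterate and minimize the resulting convex surrogate; whether this matches the explicit decomposition used in the paper is not established by this sketch.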

DOI: https://doi.org/10.34768/amcs-2022-0022 | Journal eISSN: 2083-8492 | Journal ISSN: 1641-876X
Language: English
Page range: 299 - 309
Submitted on: Nov 5, 2021
Accepted on: Feb 10, 2022
Published on: Jul 4, 2022
Published by: Sciendo
In partnership with: Paradigm Publishing Services
Publication frequency: 4 times per year

© 2022 Adam Wawrzeńczyk, Jan Mielniczuk, published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.