Targeted Data Augmentation for Improving Model Robustness

Agnieszka Mikołajczyk-Bareła; Maria Ferlin; Michał Grochowski

doi:10.61822/amcs-2025-0011



International Journal of Applied Mathematics and Computer Science

Volume 35 (2025): Issue 1 (March 2025)


Open Access

Published: Apr 2025

Abstract

This paper proposes a new and effective bias mitigation method called targeted data augmentation (TDA). Removing biases from data is often tedious and challenging, and it does not always mitigate their effect on the model. We therefore propose an alternative approach: deliberately inserting biases during training to improve model robustness. To validate the proposed method, we applied TDA to two representative and diverse datasets: a clinical skin lesion dataset and a dataset of male and female faces. We identified and manually annotated existing instrument and sampling biases in these datasets, focusing on black frames and ruler marks in the skin lesion dataset and on glasses in the face dataset. Using the counterfactual bias insertion (CBI) method, we confirmed that these biases strongly affect model performance. By randomly inserting the identified biases into training samples, we demonstrated that TDA reduced bias measures by a factor of two to more than 50, with only a negligible increase in the error rate. We performed our research on three model families: EfficientNet, DenseNet and Vision Transformer.
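The idea of randomly inserting an identified bias into training samples, and of checking its effect by comparing predictions before and after insertion, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the border thickness, the insertion probability p, and the "fraction of switched predictions" measure are all assumptions chosen for illustration, and a black frame is drawn as a simple zero-valued border.

```python
import numpy as np


def insert_black_frame(image, thickness=4):
    """Return a copy of `image` (H, W, C) with a black border drawn on it.

    Hypothetical stand-in for a real bias-insertion routine.
    """
    if thickness <= 0:
        raise ValueError("thickness must be positive")
    out = image.copy()
    out[:thickness, :] = 0   # top rows
    out[-thickness:, :] = 0  # bottom rows
    out[:, :thickness] = 0   # left columns
    out[:, -thickness:] = 0  # right columns
    return out


def targeted_augment(batch, insert_bias, p=0.5, rng=None):
    """Apply `insert_bias` to each sample independently with probability p.

    Returns the augmented batch and a boolean mask of modified samples.
    """
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(len(batch)) < p
    out = np.stack([insert_bias(x) if m else x for x, m in zip(batch, mask)])
    return out, mask


def switched_fraction(predict, images, insert_bias):
    """Fraction of samples whose predicted class changes after bias insertion.

    Assumed bias measure for this sketch; `predict` maps a batch of images
    to an array of class labels.
    """
    before = predict(images)
    after = predict(np.stack([insert_bias(x) for x in images]))
    return float(np.mean(before != after))
```

In a training loop, `targeted_augment` would be applied to each batch before the forward pass, so the inserted artifact appears in all classes and stops being predictive of the label. `switched_fraction` can then be evaluated on held-out images for models trained with and without the augmentation.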



DOI: https://doi.org/10.61822/amcs-2025-0011 | Journal eISSN: 2083-8492 | Journal ISSN: 1641-876X


Language: English

Page range: 143 - 155

Submitted on: May 22, 2024

Accepted on: Nov 12, 2024

Published on: Apr 1, 2025

Published by: University of Zielona Góra

In partnership with: Paradigm Publishing Services

Publication frequency: 4 issues per year


© 2025 Agnieszka Mikołajczyk-Bareła, Maria Ferlin, Michał Grochowski, published by University of Zielona Góra
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.
