Evaluation of speaker de-identification based on voice gender and age conversion

Open Access | May 2018

References

[1] S. Ribaric, A. Ariyaeeinia and N. Pavesic, “De-identification for privacy protection in multimedia content: A survey”, Signal Processing: Image Communication, 2016, 47, 131–151. DOI: 10.1016/j.image.2016.05.020
[2] A. Sayadian and F. Mozaffari, “A novel method for voice conversion based on non-parallel corpus”, International Journal of Speech Technology, 2017, 20, (3), 587–592. DOI: 10.1007/s10772-017-9430-4
[3] H. Valbret, E. Moulines and J. P. Tubach, “Voice transformation using PSOLA technique”, Speech Communication, 1992, 11, (2-3), 175–187. DOI: 10.1016/0167-6393(92)90012-V
[4] Q. Jin, A. R. Toth, T. Schultz et al, “Voice convergin: Speaker de-identification by voice transformation”, Proc. 2009 IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP 2009), Taipei, Taiwan, April 2009, pp. 3909–3912. DOI: 10.1109/ICASSP.2009.4960482
[5] T. Justin, V. Štruc, S. Dobrišek et al, “Speaker de-identification using diphone recognition and speech synthesis”, Proc. 11th IEEE Int. Conf. and Workshops Automatic Face and Gesture Recognition (FG 2015), Ljubljana, Slovenia, May 2015, pp. 1–7. DOI: 10.1109/FG.2015.7285021
[6] M. Faundez-Zanuy, E. Sesa-Nogueras and S. Marinozzi, “Speaker identification experiments under gender de-identification”, Proc. 49th Annual IEEE Int. Carnahan Conf. Security Technology (ICCST 2015), Taipei, Taiwan, September 2015, pp. 309–314. DOI: 10.1109/CCST.2015.7389702
[7] C. Magarinos, P. Lopez-Otero, L. Docio-Fernandez et al, “Reversible speaker de-identification using pre-trained transformation functions”, Computer Speech and Language, 2017, 46, 36–52. DOI: 10.1016/j.csl.2017.05.001
[8] M. Abou-Zleikha, Z.-H. Tan, M. G. Christensen et al, “A discriminative approach for speaker selection in speaker de-identification systems”, Proc. 23rd European Signal Processing Conf. (EUSIPCO 2015), Nice, France, August 2015, pp. 2102–2106. DOI: 10.1109/EUSIPCO.2015.7362755
[9] R. Vích, J. Přibil and Z. Smékal, “New cepstral zero-pole vocal tract models for TTS synthesis”, Proc. IEEE Region 8 EUROCON’2001, vol. 2, Section S22-Speech Compression and DSP, Bratislava, Slovakia, July 2001, pp. 458–62. DOI: 10.1109/EURCON.2001.938161
[10] D. A. Reynolds and R. C. Rose, “Robust text-independent speaker identification using Gaussian mixture speaker models”, IEEE Transactions on Speech and Audio Processing, 1995, 3, 72–83. DOI: 10.1109/89.365379
[11] F. Burkhardt, A. Paeschke, M. Rolfes et al, “A database of German emotional speech”, Proc. 9th European Conf. Speech Communication and Technology (INTERSPEECH 2005), Lisbon, Portugal, September 2005, pp. 1517–1520. DOI: 10.21437/Interspeech.2005-446
[12] P. Klosowski, A. Dustor and J. Izydorczyk, “Speaker verification performance evaluation based on open source speech processing software and TIMIT speech corpus”, P. Gaj et al, Communications in Computer and Information Science 522 (Springer International Publishing Switzerland, 2015), pp. 400–409. DOI: 10.1007/978-3-319-19419-6_38
[13] M. Fleischer, S. Pinkert, W. Mattheus et al, “Formant frequencies and bandwidths of the vocal tract transfer function are affected by the mechanical impedance of the vocal tract wall”, Biomechanics and Modeling in Mechanobiology, 2015, 14, (4), 719–733. DOI: 10.1007/s10237-014-0632-2, PMCID: PMC4490178, PMID: 25416844
[14] M. P. Gelfer and Q. E. Bennett, “Speaking fundamental frequency and vowel formant frequencies: Effects on perception of gender”, Journal of Voice, 2013, 27, (5), 556–566. DOI: 10.1016/j.jvoice.2012.11.008, PMID: 23415148
[15] K. Pisanski, B. C. Jones, B. Fink et al, “Voice parameters predict sex-specific body morphology in men and women”, Animal Behaviour, 2016, 112, 13–32. DOI: 10.1016/j.anbehav.2015.11.008
[16] U. Reubold, J. Harrington and F. Kleber, “Vocal aging effects on F0 and the first formant: A longitudinal analysis in adult speakers”, Speech Communication, 2010, 52, (7-8), 638–651. DOI: 10.1016/j.specom.2010.02.012
[17] C. M. Bishop, “Pattern Recognition and Machine Learning”, Springer.
[18] G. Muhammad and K. Alghathbar, “Environment recognition for digital audio forensics using MPEG-7 and mel cepstral features”, Journal of Electrical Engineering, 2011, 62, (4), 199–205. DOI: 10.2478/v10187-011-0032-0
[19] J. Přibil and A. Přibilová, “GMM-based evaluation of emotional style transformation in Czech and Slovak”, Cognitive Computation, 2014, 6, (4), 928–939. DOI: 10.1007/s12559-014-9283-y
[20] J. Přibil and A. Přibilová, “Comparison of text-independent original speaker recognition from emotionally converted speech”, A. Esposito et al, Smart Innovation, Systems and Technologies 2016, 48, pp. 137–149. DOI: 10.1007/978-3-319-28109-4_14
[21] J. Přibil, A. Přibilová and J. Matoušek, “GMM-based speaker age and gender classification in Czech and Slovak”, Journal of Electrical Engineering, 2017, 68, (1), 3–12. DOI: 10.1515/jee-2017-0001
[22] B. Božilovic, B. M. Todorovic and M. Obradovic, “Text independent speaker recognition using two-dimensional information entropy”, Journal of Electrical Engineering, 2015, 66, (3), 169–173. DOI: 10.2478/jee-2015-0027
[23] I. T. Nabney, “Netlab Pattern Analysis Toolbox, Release 3”, http://www.aston.ac.uk/eas/research/groups/ncrg/resources/netlab/downloads, accessed 2 October 2015.

DOI: https://doi.org/10.2478/jee-2018-0017 | Journal eISSN: 1339-309X | Journal ISSN: 1335-3632
Language: English
Page range: 138 - 147
Submitted on: Nov 14, 2017
Published on: May 30, 2018
In partnership with: Paradigm Publishing Services
Publication frequency: 6 issues per year

© 2018 Jiří Přibil, Anna Přibilová, Jindřich Matoušek, published by Slovak University of Technology in Bratislava
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.