Have a personal or library account? Click to login
Automatic statistical evaluation of quality of unit selection speech synthesis with different prosody manipulations Cover

Automatic statistical evaluation of quality of unit selection speech synthesis with different prosody manipulations

Open Access
|May 2020

References

  1. [1] A. Zelenik and Z. Kacic, “Multi-Resolution Feature Extraction Algorithm in Emotional Speech Recognition”, Elektronika ir Elektrotechnika, vol. 21, no. 5, pp. 54–58, 2015, DOI: 10.5755/j01.eee.21.5.13328.10.5755/j01.eee.21.5.13328
  2. [2] M. Grůber and J. Matoušek, “Listening-Test-Based Annotation of Communicative Functions for Expressive Speech Synthesis”, P. Sojka, A. Horak, I. Kopecek, K. Pala (eds.): Text, Speech, and Dialogue (TSD) 2010, LNCS, vol. 6231, pp. 283–290, Springer 2010.
  3. [3] P. C. Loizou, “Speech Quality Assessment”, W. Tao, et al.(eds): Multimedia Analysis, Processing and Communications. Studies Computational Intelligence, vol. 346, pp. 623–654, Springer, Berlin, Heidelberg, 2011, DOI:10.1007/978-3-642-19551-8_23.10.1007/978-3-642-19551-8_23
  4. [4] H. Ye and S. Young, “High Quality Voice Morphing”, ICASSP 2004 Proceedings. IEEE International Conference on Acoustics, Speech, and Signal Processing, 17-21 May 2004, Montreal, Canada, DOI:10.1109/ICASSP.2004.1325909.10.1109/ICASSP.2004.1325909
  5. [5] M. Adiban, B. BabaAli and S. Shehnepoor, “Statistical Feature Embedding for Heart Sound Classification”, Journal of Electrical Engineering, vol. 70, no. 4, pp. 259–272, 2019, DOI: 10.2478/jee-2019-0056.10.2478/jee-2019-0056
  6. [6] B. Boilović, B. M. Todorović and M. Obradović, “Text-Independent Speaker Recognition using Two-Dimensional Information Entropy”, Journal of Electrical Engineering, vol. 66, no. 3, pp. 169–173, 2015, DOI: 10.1515/jee-2015-0027.
  7. [7] C. Y. Lee and Z. J. Lee, “A Novel Algorithm Applied to Classify Unbalanced Data”, Applied Soft Computing, vol. 12, pp. 2481–2485, 2012, DOI: 10.1016/j.asoc.2012.03.051.10.1016/j.asoc.2012.03.051
  8. [8] R. Vích, J. Nouza and M. Vondra, “Automatic Speech Recognition Used for Intelligibility Assessment of Text-to-Speech Systems”, A. Esposito et al. (eds.): Verbal and Nonverbal Features of Human-Human and Human-Machine Interaction, LNCS, vol. 5042, pp. 136–148, Springer 2008.
  9. [9] M. Cerňak, M. Rusko and M. Trnka, “Diagnostic Evaluation of Synthetic Speech using Speech Recognition”, Procs. of the 16th International Congress on Sound and Vibration (ICSV16), Kraków, Poland, 5-9 July, p. 6, 2009, https://pdfs.semanticscholar.org/502b/f1d8bfb0cc90cd3defcc9d479d9a97b23b66.pdf.
  10. [10] S. Möller, and J. Heimansberg, “Estimation of TTS Quality Telephone Environments Using a Reference-free Quality Prediction Model”, Second ISCA/DEGA Tutorial and Research Workshop on Perceptual Quality of Systems, Berlin, Germany, September 2006, pp. 56–60, ISCA Archive, http://www.isca-speech.org/archive_open/pqs2006.
  11. [11] D.-Y. Huang, “Prediction of Perceived Sound Quality of Synthetic Speech”, Procs. of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2011 Xi’an, China, October 18-21, 2011, p. 6, http://www.apsipa.org/proceedings2011/pdf/APSIPA100.pdf.
  12. [12] S. Möller et al, “Comparison of Approaches for Instrumentally Predicting the Quality of Text-To-Speech Systems”, 2010, INTERSPEECH- 2010, pp. 1325–1328, https://www.isca-speech.org/archive/archive_papers/interspeech_2010/i10_1325.pdf.10.21437/Interspeech.2010-413
  13. [13] F. Hinterleitner et al, “Predicting the Quality of Synthesized Speech using Reference-Based Prediction Measures”, Studientexte zur Sprachkommunikation: Elektronische Sprachsignalver-arbeitung, Session: Sprachsynthese-Evaluation und Prosodie, 2011, pp. 99–106, TUDpress, Dresden, http://www.essv.de/paper.php?id=14.
  14. [14] J. P. H. van Santen, “Segmental Duration and Speech Timing”, Y. Sagisaka, N.Campbell, N.Higuchi (eds.): Computing Prosody, Springer, New York, NY, pp. 225–248, 1997.10.1007/978-1-4612-2258-3_15
  15. [15] C. M. Bishop, “Pattern Recognition and Machine Learning”, Springer, 2006.
  16. [16] V. Rodellar-Biarge, D. Palacios-Alonso, V. Nieto-Lluis, and P. Gomez-Vilda, “Towards the search of detection speech-relevant features for stress”, Expert Systems, vol. 32, no.6, pp. 710-718, 2015.DOI: 10.1111/exsy.12109.10.1111/exsy.12109
  17. [17] A. J. Hunt and A. W. Black, “Unit Selection a Concatenative Speech Synthesis System using a Large Speech Database”, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Atlanta (Georgia, USA), pp. 373–376, 1996, DOI: 10.1109/ICASSP.1996.541110.10.1109/ICASSP.1996.541110
  18. [18] J. Kala and J. Matoušek, “Very Fast Unit Selection using Viterbi Search with Zero-Concatenation-Cost Chains”, Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2014), Florence, Italy, pp. 2569–2573, 2014.
  19. [19] M. Jůzová, D. Tihelka and R. Skarnitzl, “Last Syllable Unit Penalization Unit Selection TTS”, K. Ekstein and V. Matousek (eds.): Text, Speech, and Dialogue (TSD 2017), LNAI vol. 10415, pp. 317–325, 2017, DOI: 10.1007/978-3-319-64206-2 36.10.1007/978-3-319-64206-2
  20. [20] D. Tihelka, Z. Hanzlíček, M. Jůzová, J. Vít, J. Matoušek and M. Grůber, “Current State of Text-to-Speech System ARTIC: A Decade of Research on the Field of Speech Technologies”, P. Sojka, A.Horák, I.Kopeček, and K. Pala (eds): Text, Speech, and Dialogue (TSD 2018), LNAI 11107, pp. 369–378, 2018, DOI: doi.org/10.1007/978-3-030-00794-2_40.
  21. [21] Z. Hanzlíček, J. Vít, and D. Tihelka, “WaveNet-Based Speech Synthesis Applied to Czech – A Comparison with the Traditional Synthesis Methods”, P. Sojka, A.Horák, I.Kopeček, and K. Pala (eds): Text, Speech, and Dialogue (TSD 2018), LNAI 11107, pp. 445–452, 2018, DOI: 10.1007/978-3-030-00794-2_48.10.1007/978-3-030-00794-2_48
  22. [22] J. Vít, Z. Hanzlíček and J. Matoušek, “Czech Speech Synthesis with Generative Neural Vocoder”, K. Ekštein (ed.): Text, Speech, and Dialogue (TSD 2019), LNAI 11697, pp. 307–315, 2019, DOI: 10.1007/978-3-030-27947-9_26.10.1007/978-3-030-27947-9_26
  23. [23] J. Matoušek, D. Tihelka and J. Psutka, “New Slovak Unit-Selection Speech Synthesis ARTIC TTS System”, Proceedings of the International Multiconference of Engineers and Computer Scientists (IMECS), San Francisco, USA, 2011.
DOI: https://doi.org/10.2478/jee-2020-0012 | Journal eISSN: 1339-309X | Journal ISSN: 1335-3632
Language: English
Page range: 78 - 86
Submitted on: Oct 1, 2019
|
Published on: May 13, 2020
In partnership with: Paradigm Publishing Services
Publication frequency: 6 issues per year

© 2020 Jiří Přibil, Anna Přibilová, Jindřich Matoušek, published by Slovak University of Technology in Bratislava
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.