Automatic statistical evaluation of quality of unit selection speech synthesis with different prosody manipulations

Jiří Přibil; Anna Přibilová; Jindřich Matoušek

doi:10.2478/jee-2020-0012

Journal of Electrical Engineering

Volume 71 (2020): Issue 2 (April 2020)

By: Jiří Přibil, Anna Přibilová and Jindřich Matoušek

Open Access

May 2020

[1] A. Zelenik and Z. Kacic, “Multi-Resolution Feature Extraction Algorithm in Emotional Speech Recognition”, Elektronika ir Elektrotechnika, vol. 21, no. 5, pp. 54–58, 2015, DOI: 10.5755/j01.eee.21.5.13328.
[2] M. Grůber and J. Matoušek, “Listening-Test-Based Annotation of Communicative Functions for Expressive Speech Synthesis”, P. Sojka, A. Horak, I. Kopecek, K. Pala (eds.): Text, Speech, and Dialogue (TSD) 2010, LNCS, vol. 6231, pp. 283–290, Springer 2010.
[3] P. C. Loizou, “Speech Quality Assessment”, W. Tao, et al. (eds.): Multimedia Analysis, Processing and Communications, Studies in Computational Intelligence, vol. 346, pp. 623–654, Springer, Berlin, Heidelberg, 2011, DOI: 10.1007/978-3-642-19551-8_23.
[4] H. Ye and S. Young, “High Quality Voice Morphing”, ICASSP 2004 Proceedings. IEEE International Conference on Acoustics, Speech, and Signal Processing, 17-21 May 2004, Montreal, Canada, DOI: 10.1109/ICASSP.2004.1325909.
[5] M. Adiban, B. BabaAli and S. Shehnepoor, “Statistical Feature Embedding for Heart Sound Classification”, Journal of Electrical Engineering, vol. 70, no. 4, pp. 259–272, 2019, DOI: 10.2478/jee-2019-0056.
[6] B. Boilović, B. M. Todorović and M. Obradović, “Text-Independent Speaker Recognition using Two-Dimensional Information Entropy”, Journal of Electrical Engineering, vol. 66, no. 3, pp. 169–173, 2015, DOI: 10.1515/jee-2015-0027.
[7] C. Y. Lee and Z. J. Lee, “A Novel Algorithm Applied to Classify Unbalanced Data”, Applied Soft Computing, vol. 12, pp. 2481–2485, 2012, DOI: 10.1016/j.asoc.2012.03.051.
[8] R. Vích, J. Nouza and M. Vondra, “Automatic Speech Recognition Used for Intelligibility Assessment of Text-to-Speech Systems”, A. Esposito et al. (eds.): Verbal and Nonverbal Features of Human-Human and Human-Machine Interaction, LNCS, vol. 5042, pp. 136–148, Springer 2008.
[9] M. Cerňak, M. Rusko and M. Trnka, “Diagnostic Evaluation of Synthetic Speech using Speech Recognition”, Procs. of the 16th International Congress on Sound and Vibration (ICSV16), Kraków, Poland, 5-9 July, p. 6, 2009, https://pdfs.semanticscholar.org/502b/f1d8bfb0cc90cd3defcc9d479d9a97b23b66.pdf.
[10] S. Möller and J. Heimansberg, “Estimation of TTS Quality in Telephone Environments Using a Reference-free Quality Prediction Model”, Second ISCA/DEGA Tutorial and Research Workshop on Perceptual Quality of Systems, Berlin, Germany, September 2006, pp. 56–60, ISCA Archive, http://www.isca-speech.org/archive_open/pqs2006.
[11] D.-Y. Huang, “Prediction of Perceived Sound Quality of Synthetic Speech”, Procs. of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2011 Xi’an, China, October 18-21, 2011, p. 6, http://www.apsipa.org/proceedings2011/pdf/APSIPA100.pdf.
[12] S. Möller et al., “Comparison of Approaches for Instrumentally Predicting the Quality of Text-To-Speech Systems”, INTERSPEECH 2010, pp. 1325–1328, 2010, https://www.isca-speech.org/archive/archive_papers/interspeech_2010/i10_1325.pdf, DOI: 10.21437/Interspeech.2010-413.
[13] F. Hinterleitner et al., “Predicting the Quality of Synthesized Speech using Reference-Based Prediction Measures”, Studientexte zur Sprachkommunikation: Elektronische Sprachsignalverarbeitung, Session: Sprachsynthese-Evaluation und Prosodie, 2011, pp. 99–106, TUDpress, Dresden, http://www.essv.de/paper.php?id=14.
[14] J. P. H. van Santen, “Segmental Duration and Speech Timing”, Y. Sagisaka, N. Campbell, N. Higuchi (eds.): Computing Prosody, Springer, New York, NY, pp. 225–248, 1997, DOI: 10.1007/978-1-4612-2258-3_15.
[15] C. M. Bishop, “Pattern Recognition and Machine Learning”, Springer, 2006.
[16] V. Rodellar-Biarge, D. Palacios-Alonso, V. Nieto-Lluis, and P. Gomez-Vilda, “Towards the search of detection in speech-relevant features for stress”, Expert Systems, vol. 32, no. 6, pp. 710–718, 2015, DOI: 10.1111/exsy.12109.
[17] A. J. Hunt and A. W. Black, “Unit Selection in a Concatenative Speech Synthesis System using a Large Speech Database”, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Atlanta (Georgia, USA), pp. 373–376, 1996, DOI: 10.1109/ICASSP.1996.541110.
[18] J. Kala and J. Matoušek, “Very Fast Unit Selection using Viterbi Search with Zero-Concatenation-Cost Chains”, Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2014), Florence, Italy, pp. 2569–2573, 2014.
[19] M. Jůzová, D. Tihelka and R. Skarnitzl, “Last Syllable Unit Penalization in Unit Selection TTS”, K. Ekstein and V. Matousek (eds.): Text, Speech, and Dialogue (TSD 2017), LNAI vol. 10415, pp. 317–325, 2017, DOI: 10.1007/978-3-319-64206-2_36.
[20] D. Tihelka, Z. Hanzlíček, M. Jůzová, J. Vít, J. Matoušek and M. Grůber, “Current State of Text-to-Speech System ARTIC: A Decade of Research on the Field of Speech Technologies”, P. Sojka, A. Horák, I. Kopeček, and K. Pala (eds.): Text, Speech, and Dialogue (TSD 2018), LNAI 11107, pp. 369–378, 2018, DOI: 10.1007/978-3-030-00794-2_40.
[21] Z. Hanzlíček, J. Vít, and D. Tihelka, “WaveNet-Based Speech Synthesis Applied to Czech – A Comparison with the Traditional Synthesis Methods”, P. Sojka, A. Horák, I. Kopeček, and K. Pala (eds.): Text, Speech, and Dialogue (TSD 2018), LNAI 11107, pp. 445–452, 2018, DOI: 10.1007/978-3-030-00794-2_48.
[22] J. Vít, Z. Hanzlíček and J. Matoušek, “Czech Speech Synthesis with Generative Neural Vocoder”, K. Ekštein (ed.): Text, Speech, and Dialogue (TSD 2019), LNAI 11697, pp. 307–315, 2019, DOI: 10.1007/978-3-030-27947-9_26.
[23] J. Matoušek, D. Tihelka and J. Psutka, “New Slovak Unit-Selection Speech Synthesis in ARTIC TTS System”, Proceedings of the International Multiconference of Engineers and Computer Scientists (IMECS), San Francisco, USA, 2011.

DOI: https://doi.org/10.2478/jee-2020-0012 | Journal eISSN: 1339-309X | Journal ISSN: 1335-3632

Language: English

Page range: 78 - 86

Submitted on: Oct 1, 2019

Published on: May 13, 2020

Published by: Slovak University of Technology in Bratislava

In partnership with: Paradigm Publishing Services

Keywords: listening test, objective and subjective evaluation, quality of synthetic speech, statistical analysis

Related subjects: Engineering; Introductions and overviews; Engineering, other

© 2020 Jiří Přibil, Anna Přibilová, Jindřich Matoušek, published by Slovak University of Technology in Bratislava
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.
