Machine Learning–based Analysis of English Lateral Allophones

Magdalena Piotrowska; Gražina Korvel; Bożena Kostek; Tomasz Ciszewski; Andrzej Czyżewski

doi:10.2478/amcs-2019-0029

International Journal of Applied Mathematics and Computer Science

Volume 29 (2019): Issue 2 (June 2019)

Open Access


DOI: https://doi.org/10.2478/amcs-2019-0029 | Journal eISSN: 2083-8492 | Journal ISSN: 1641-876X

Language: English

Page range: 393 - 405

Published on: Jul 4, 2019

Published by: University of Zielona Góra

In partnership with: Paradigm Publishing Services

Publication frequency: 4 issues per year

Keywords: allophones, audio features, artificial neural networks (ANNs), k-nearest neighbor (kNN), self-organizing map (SOM)

Related subjects: Mathematics, Applied mathematics

© 2019 Magdalena Piotrowska, Gražina Korvel, Bożena Kostek, Tomasz Ciszewski, Andrzej Czyżewski, published by University of Zielona Góra
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
