References
- Yehia, H. C., Kuratate, T., & Vatikiotis-Bateson, E. (2002). Linking facial animation, head motion and speech acoustics. Journal of Phonetics, 30(3), 555–568.
- Greenwood, D., Laycock, S., & Matthews, I. (2017). Predicting head pose from speech with a conditional variational autoencoder. Interspeech 2017, 3991–3995.
- Czap, L., & Kilik, R. (2015). Automatic gesture generation. Production Systems and Information Engineering, 7, 5–14.
- Zhou, Y., Han, X., Shechtman, E., Echevarria, J., Kalogerakis, E., & Li, D. (2020). MakeItTalk: speaker-aware talking-head animation. ACM Transactions on Graphics (TOG), 39(6), 1–15.
- Kim, H., Garrido, P., Tewari, A., Xu, W., Thies, J., Niessner, M., ... & Theobalt, C. (2018). Deep video portraits. ACM Transactions on Graphics (TOG), 37(4), 1–14.
- Cheng, Y., & Church, G. M. (2000). Biclustering of expression data. ISMB, 8, 93–103.
- Getz, G., Levine, E., & Domany, E. (2000). Coupled two-way clustering analysis of gene microarray data. Proceedings of the National Academy of Sciences, 97(22), 12079–12084.
- Deng, Z., Narayanan, S., Busso, C., & Neumann, U. (2004). Audio-based head motion synthesis for avatar-based telepresence systems. Proceedings of the 2004 ACM SIGMM Workshop on Effective Telepresence, 24–30.
- Grimm, M., Neumann, U., Busso, C., Deng, Z., & Narayanan, S. (2005). Natural head motion synthesis driven by acoustic prosodic features. Computer Animation and Virtual Worlds, 16(3–4), 283–290.
- Grimm, M., Neumann, U., Busso, C., Deng, Z., & Narayanan, S. (2007). Rigid head motion in expressive speech animation: Analysis and synthesis. IEEE Transactions on Audio, Speech, and Language Processing, 15(3), 1075–1086.
- Matthews, I., Laycock, S., & Greenwood, D. (2018). Joint learning of facial expression and head pose from speech. Interspeech 2018, 2484–2488.
- Hofer, G., & Shimodaira, H. (2007). Automatic head motion prediction from speech data. Interspeech 2007, 722–725.
- Ji, X., et al. (2022). EAMM: One-shot emotional talking face via audio-based emotion-aware motion model. ACM SIGGRAPH 2022 Conference Proceedings.
- Lu, Y., Chai, J., & Cao, X. (2021). Live speech portraits: real-time photorealistic talking-head animation. ACM Transactions on Graphics (TOG), 40(6), 1–17.
- Ben Youssef, A., Shimodaira, H., & Braude, D. A. (2013). Articulatory features for speech-driven head motion synthesis. Proceedings of Interspeech, Lyon, France.
- Baudat, G., & Anouar, F. (2000). Generalized discriminant analysis using a kernel approach. Neural Computation, 12(10), 2385–2404.
- Liu, X., Yin, J., Feng, Z., Dong, J., & Wang, L. (2007). Orthogonal neighborhood preserving embedding for face recognition. 2007 IEEE International Conference on Image Processing (ICIP 2007), 1, 133–136.
- Roweis, S. T., et al. (2002). Automatic alignment of local representations. Advances in Neural Information Processing Systems, 15, 841–848.
- Tibshirani, R., Walther, G., & Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(2), 411–423.
- Davies, D. L., & Bouldin, D. W. (1979). A cluster separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1(2), 224–227.
- Caliński, T., & Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statistics-Theory and Methods, 3(1), 1–27.