References
- Z Akata, S Reed, D Walter, H Lee, “Evaluation of output embeddings for fine-grained image classification,” pattern recognition,” 2015.
- X He, Y Peng, “Fine-grained image classification via combining vision and language,” Computer Vision and Pattern Recognition, 2017.
- Maron, AL Ratan, “Multiple-instance learning for natural scene classification,” ICML, 1998.
- W Geng, F Han, J Lin, L Zhu, J Bai, S Wang, “Fine-grained grocery product recognition by one-shot learning,” Proceedings of the 26th ACM international conference on Multimedia,” 2018.
- S Albawi, TA Mohammed, “Understanding of a convolutional neural network,” IEEE, 2017.
- SE Umbaugh, “Digital image processing and analysis: human and compute vision applications with CVIPtools,” Amazon book, 2010.
- Q Ye, D Doermann, “Text detection and recognition in imagery: A survey,” IEEE transactions on pattern analysis, 2014.
- L Neumann, J Matas, “A method for text localization and recognition in real-world images,” Asian conference on computer vision, 2010.
- A Coates, B Carpenter, C Case, “Text detection and character recognition in scene images with unsupervised feature learning,” IEEE, 2011.
- M Jaderberg, A Vedaldi, A Zisserman, “Deep features for text spotting,” European conference on computer, 2014.
- C Yao, X Bai, W Liu, “A unified framework for multioriented text detection and recognition,” IEEE Transactions on Image Processing, 2014
- P Shivakumara, A Dutta, CL Tan, U Pal, “Multi-oriented scene text detection in video based on wavelet and angle projection boundary growing,” Multimedia tools and applications, 2014.
- Z Zhang, C Zhang, W Shen, C Yao, “Multi-oriented text detection with fully convolutional networks,” pattern recognition, 2016.
- Y Zhu, C Yao, X Bai, “Scene text detection and recognition: Recent advances and future trends,” Frontiers of Computer Science, 2016.
- B Zhao, J Feng, X Wu, S Yan, “segmentation,” International Journal of Automation, 2017.
- N Zhang, J Donahue, R Girshick, T Darrell, “Part-based R-CNNs for fine-grained category detection,” European conference, 2014.
- E Gavves, B Fernando, CGM Snoek, “Fine-grained categorization by alignments,” IEEE 2013.
- P Baraldi, M Compare, S Sauco, E Zio, “Ensemble neural network-based particle filtering for prognostics,” Mechanical Systems and Signal, 2013.
- F Fan, Y Feng, “D Zhao Multi-grained attention network for aspect-level sentiment classification,” conference on empirical methods, 2018.
- OM Parkhi, A Vedaldi, A Zisserman, “Cats and dogs,” IEEE conference, 2012.
- G Lowe, “Sift-the scale invariant feature transform,” Int. J 2004.
- N Dalal, B Triggs, “Histograms of oriented gradients for human detection,” IEEE computer society conference, 2005.
- J Van De Weijer, C Schmid, J Verbeek, “Learning color names for real-world applications,” IEEE Transactions, 2009.
- T Berg, PN Belhumeur, “Poof: Part-based one-vs.-one features for fine-grained categorization, face verification, and attribute estimation,” Proceedings of the IEEE, 2013.
- KC Kamal, Z Yin, B Li, B Ma, “Transfer learning for fine-grained crop disease classification based on leaf images,” IEEE, 2019.
- V Badrinarayanan, A Kendall, “Segnet: A deep convolutional encoder-decoder architecture for image segmentation,” IEEE transactions on, 2017.
- P Rodríguez, D Velazquez, G Cucurull, “Pay attention to the activations: a modular attention mechanism for fine-grained image recognition,” IEEE Transactions, 2019.
- A Mafla, S Dey, AF Biten, L Gomez, “Fine-grained image classification and retrieval by combining visual and locally pooled textual features,” WACV, 2020.
- X Bai, M Yang, P Lyu, Y Xu, J Luo, “Integrating scene text and visual appearance for fine-grained image classification,” IEEE Access, 2018.
- K Cho, A Courville, Y Bengio, “Describing multimedia content using attention-based encoder-decoder networks,” IEEE Transactions on Multimedia, 2015.
- PK Atrey, MA Hossain, A El Saddik, MS Kankanhalli, “Multimodal fusion for multimedia analysis: a survey,” Multimedia systems, 2010.
- X Yang, P Molchanov, J Kautz, “Multilayer and multimodal fusion of deep neural networks for video classification,” Proceedings of the 24th ACM, 2016.
- H Liu, Y Wu, F Sun, B Fang, “Weakly paired multimodal fusion for object recognition,” IEEE, 2017.
- N Audebert, C Herold, K Slimani, C Vidal, “Multimodal deep networks for text and image-based document classification,” Joint European Conference, 2019.
- P Maragos, A Potamianos, P Gros, “Multimodal processing and interaction: audio, video, text,” IEEE 2008.
- J Deng, W Dong, R Socher, LJ Li, K Li, “ImageNet,” IEEE, 2009.
- Karen Simonyan, Andrew Zisserman, “Very deep convolutional networks for large-scale image recognition,” Department of Engineering Science, University of Oxford, 2015.
- A Karnawat, K More, T Rade, B Rane, M Mulik, “A Survey on Easy OCR Techniques used to build Systems for Visually Impaired People,” ITB, 2016.
- R Smith, “An overview of the Tesseract OCR engine,” Ninth international conference on document analysis, 2007.
- KW Church, “Word2Vec,” Natural Language Engineering, 2017.