Using Text and Visual Cues for Fine-Grained Classification

Zaryab Shaker; Xiao Feng; Muhammad Adeel Ahmed Tahir

doi:10.21307/ijanmc-2021-026

International Journal of Advanced Network, Monitoring and Controls

Volume 6 (2021): Issue 3 (January 2021)

Open Access

Feb 2021


DOI: https://doi.org/10.21307/ijanmc-2021-026 | Journal eISSN: 2470-8038

Language: English

Page range: 42 - 49

Published on: Feb 22, 2021

Published by: Xi’an Technological University

In partnership with: Paradigm Publishing Services

Publication frequency: 4 issues per year

Keywords: Scene Text, Product Text, Fine-Grained Classification, Convolution Neural Network, Attention, Product Search

Related subjects: Computer sciences; Computer sciences, other

© 2021 Zaryab Shaker, Xiao Feng, Muhammad Adeel Ahmed Tahir, published by Xi’an Technological University
This work is licensed under the Creative Commons Attribution 4.0 License.
