Contextualized Vision Transformers (CVT): Adaptive Spectral Embedding and Feature Gating for Precise Text-Graphics Classification

Open Access
Mar 2026

Abstract

Accurately classifying images into graphics-only, text-only, and mixed-content categories is a critical prerequisite for building efficient, content-aware processing pipelines. This initial triage avoids applying computationally expensive operations, such as Optical Character Recognition (OCR), to graphical data that contains no text. To address this challenge, we introduce the Contextualized Vision Transformer (CVT), a novel architecture designed specifically for this fine-grained classification task. The CVT addresses the limitations of standard Vision Transformers (ViTs) through three complementary components. First, a Learnable Patch Decomposition (LPD) strategy efficiently extracts patch embeddings. Second, to model complex spatial arrangements, an Adaptive Spectral Embedding (ASE) module replaces static positional encodings with a dynamic, learnable representation. Third, a Contextual Feature Gating (CFG) module enhances feature discriminability by adaptively recalibrating patch-level features, selectively amplifying salient text or graphic regions. Comprehensive K-fold cross-validation demonstrates the robustness and generalizability of the proposed model, and the CVT achieves statistically significant improvements in accuracy, precision, and recall over state-of-the-art Vision Transformer baselines. These results show that the architecture offers a fast and accurate solution to content triage in large-scale visual processing systems.
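The abstract names the three components but not their equations. Purely as an illustration, the pure-Python sketch below shows one plausible way a patch-embedding step, a learnable positional table, and a per-patch contextual gate could fit together. Everything here is an assumption rather than the CVT's published formulation: the functions `patchify`, `linear`, `gate_patches`, and `embed` are hypothetical; patches are split on a fixed grid and only the projection is learnable; the "spectral" aspect of ASE is not modeled (a plain per-patch offset table stands in for it); and the gate is a sigmoid of the token plus a mean-pooled context vector.

```python
import math
import random

Matrix = list[list[float]]


def patchify(image: Matrix, patch: int) -> Matrix:
    """Split an H x W grayscale image into non-overlapping patch x patch
    blocks and flatten each block row-major (one row per patch)."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image must tile exactly"
    patches = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            patches.append([image[r + i][c + j]
                            for i in range(patch) for j in range(patch)])
    return patches


def linear(x: list[float], weight: Matrix) -> list[float]:
    """Matrix-vector product; weight has shape (out_dim, in_dim)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in weight]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gate_patches(tokens: Matrix, u: list[float], v: list[float],
                 b: float) -> tuple[Matrix, list[float]]:
    """Hypothetical contextual gating: each token gets a scalar gate in (0, 1)
    from its own features (weights u) and the mean token (weights v)."""
    n, d = len(tokens), len(tokens[0])
    context = [sum(t[k] for t in tokens) / n for k in range(d)]
    ctx_term = sum(vk * ck for vk, ck in zip(v, context))
    gates = [sigmoid(sum(uk * tk for uk, tk in zip(u, t)) + ctx_term + b)
             for t in tokens]
    # Scale each token by its gate: salient patches keep more of their signal.
    gated = [[g * tk for tk in t] for g, t in zip(gates, tokens)]
    return gated, gates


def embed(image: Matrix, patch: int, proj: Matrix, pos: Matrix,
          u: list[float], v: list[float], b: float) -> tuple[Matrix, list[float]]:
    """Patchify, project each patch, add a learnable positional offset
    (a stand-in for ASE), then apply the contextual gate."""
    patches = patchify(image, patch)
    tokens = [[e + p for e, p in zip(linear(x, proj), pos_i)]
              for x, pos_i in zip(patches, pos)]
    return gate_patches(tokens, u, v, b)


if __name__ == "__main__":
    random.seed(0)
    img = [[float((r + c) % 2) for c in range(8)] for r in range(8)]  # 8x8 checkerboard
    proj = [[random.gauss(0.0, 0.1) for _ in range(16)] for _ in range(3)]  # 4x4 patch -> dim 3
    pos = [[random.gauss(0.0, 0.02) for _ in range(3)] for _ in range(4)]   # one row per patch
    u = [random.gauss(0.0, 1.0) for _ in range(3)]
    v = [random.gauss(0.0, 1.0) for _ in range(3)]
    _, gates = embed(img, 4, proj, pos, u, v, 0.0)
    print([round(g, 3) for g in gates])
```

In a trained model, `proj`, `pos`, `u`, `v`, and `b` would be learned jointly with the transformer encoder and classifier. Here they are only randomly initialized to show how the data flows between the stages.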

DOI: https://doi.org/10.2478/tmmp-2026-0003 | Journal eISSN: 1338-9750 | Journal ISSN: 1210-3195
Language: English
Submitted on: Oct 8, 2025
Accepted on: Nov 27, 2025
Published on: Mar 17, 2026
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Mridul Ghosh, Konrad Dürrbeck, Roland Fischer, Mária Ždímalová, Tonmoy Mete, published by Slovak Academy of Sciences, Mathematical Institute
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.

AHEAD OF PRINT