Efficient Hierarchical Temporal Audio-Video Cross-Attention Fusion Network for Audio-Enhanced Text-To-Video Retrieval

International Journal on Smart Sensing and Intelligent Systems

Volume 19 (2026): Issue 1 (January 2026)

By: R. Rashmi and H. K. Chethan

Open Access | Apr 2026

Authors

R. Rashmi

rrashmiphd213@gmail.com

Maharaja Institute of Technology Mysore, Srirangapatna, India

H. K. Chethan

Maharaja Institute of Technology Mysore, Srirangapatna, India

DOI: https://doi.org/10.2478/ijssis-2026-0009 | Journal eISSN: 1178-5608

Language: English

Submitted on: Jul 25, 2025

Published on: Apr 7, 2026

Published by: Macquarie University, Australia

In partnership with: Paradigm Publishing Services

Publication frequency: 1 issue per year

Keywords: feature pyramid transformer, audio spectrogram short-term memory transformer, temporal RoBERTa graph network, multi-head scaled dot random boosting forest, multimedia retrieval optimization

Related subjects: Introductions and overviews; Engineering, other

© 2026 R. Rashmi, H. K. Chethan, published by Macquarie University, Australia
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
