
Protein Function Prediction with Pretrained Transformers: Performance, Pitfalls, and Practical Guidance

By: Kushal Raj Roy
Open Access | Apr 2026

Abstract

Transformer-based protein language models (PLMs) learn meaningful representations from millions of unlabeled sequences, capturing evolutionary patterns and functional relationships. Recent advances include ESM-2's systematic scaling to 15 billion parameters, structure-aware vocabularies (SaProt), and multimodal foundation models (ESM-3, 98B parameters). PLMs achieve state-of-the-art performance across tasks: Gene Ontology prediction (F-max 0.64–0.68), enzyme classification (81% accuracy), and variant effect prediction (Spearman ρ 0.52–0.55). Attention maps in deep layers correlate 44–63% with 3D residue contacts despite the models receiving no structural supervision. This review synthesizes recent PLM developments, benchmarks, and practical applications, providing guidance for experimental biologists on model selection and validation strategies.

Language: English
Page range: 35 - 45
Published on: Apr 30, 2026
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2026 Kushal Raj Roy, published by European Biotechnology Thematic Network Association
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.