Protein Function Prediction with Pretrained Transformers: Performance, Pitfalls, and Practical Guidance

By: Kushal Raj Roy
Open Access | Apr 2026

References

  1. Z. Lin, H. Akin, R. Rao, B. Hie, Z. Zhu, W. Lu, N. Smetanin, R. Verkuil, O. Kabeli, Y. Shmueli, A. dos Santos Costa, M. Fazel-Zarandi, T. Sercu, S. Candido and A. Rives, Evolutionary-scale prediction of atomic-level protein structure with a language model, Science, 2023, 379, 1123–1130.
  2. J. Su, C. Han, Y. Zhou, J. Shan, X. Zhou and F. Yuan, SaProt: Protein language modeling with structure-aware vocabulary, Proc. Int. Conf. Learn. Represent., 2024.
  3. T. Hayes, R. Rao, H. Akin, N. J. Sofroniew, D. Oktay, Z. Lin, R. Verkuil, V. Q. Tran, J. Deaton, M. Wiggert, R. Badkundri, I. Shafkat, J. Gong, A. Derry, R. S. Molina, N. Thomas, A. Khan, C. Mishra, C. Kim, L. J. Bartie, M. Nemeth, P. D. Hsu, T. Sercu, S. Candido and A. Rives, Simulating 500 million years of evolution with a language model, Science, 2025, 387, 850–858.
  4. T. Yu, H. Cui, J. C. Li, Y. Luo, G. Jiang and H. Zhao, Enzyme function prediction using contrastive learning, Science, 2023, 379, 1358–1363.
  5. P. Notin, N. Rollins, Y. Gal, C. Sander and D. Marks, Machine learning for functional protein design, Nat. Biotechnol., 2024, 42, 216–228.
  6. UniProt Consortium, UniProt: the universal protein knowledgebase in 2023, Nucleic Acids Res., 2023, 51, D523–D531.
  7. R. Schmirler, M. Heinzinger and B. Rost, Fine-tuning protein language models boosts predictions across diverse tasks, Nat. Commun., 2024, 15, 7407.
  8. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser and I. Polosukhin, Attention is all you need, Adv. Neural Inf. Process. Syst., 2017, 30, 5998–6008.
  9. M. Heinzinger, K. Weissenow, J. G. Sanchez, A. Henkel, M. Mirdita, M. Steinegger and B. Rost, Bilingual language model for protein sequence and structure, NAR Genomics Bioinformatics, 2024, 6, lqae021.
  10. J. Vig, A. Madani, L. R. Varshney, C. Xiong, R. Socher and N. Rajani, BERTology meets biology: interpreting attention in protein language models, Proc. Int. Conf. Learn. Represent., 2021.
  11. J. Jumper, R. Evans, A. Pritzel, T. Green, M. Figurnov, O. Ronneberger, K. Tunyasuvunakool, R. Bates, A. Žídek, A. Potapenko, A. Bridgland, C. Meyer, S. A. A. Kohl, A. J. Ballard, A. Cowie, B. Romera-Paredes, S. Nikolov, R. Jain, J. Adler, T. Back, S. Petersen, D. Reiman, E. Clancy, M. Zielinski, M. Steinegger, M. Pacholska, T. Berghammer, S. Bodenstein, D. Silver, O. Vinyals, A. W. Senior, K. Kavukcuoglu, P. Kohli and D. Hassabis, Highly accurate protein structure prediction with AlphaFold, Nature, 2021, 596, 583–589.
  12. J. Abramson, J. Adler, J. Dunger, R. Evans, T. Green, A. Pritzel, O. Ronneberger, L. Willmore, A. J. Ballard, J. Bambrick, S. W. Bodenstein, D. A. Evans, C. C. Hung, M. O’Neill, D. Reiman, K. Tunyasuvunakool, Z. Wu, A. Žemgulytė, E. Arvaniti, C. Beattie, O. Bertolli, A. Bridgland, A. Cherepanov, M. Congreve, A. I. Cowen-Rivers, A. Cowie, M. Figurnov, F. B. Fuchs, H. Gladman, R. Jain, Y. A. Khan, C. M. R. Low, K. Perlin, A. Potapenko, P. Savy, S. Singh, A. Stecula, A. Thillaisundaram, C. Tong, S. Yakneen, E. D. Zhong, M. Zielinski, A. Žídek, V. Bapst, P. Kohli, M. Jaderberg, D. Hassabis and J. M. Jumper, Accurate structure prediction of biomolecular interactions with AlphaFold 3, Nature, 2024, 630, 493–500.
  13. A. Rives, J. Meier, T. Sercu, S. Goyal, Z. Lin, J. Liu, D. Guo, M. Ott, C. L. Zitnick, J. Ma and R. Fergus, Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences, Proc. Natl. Acad. Sci. U. S. A., 2021, 118, e2016239118.
  14. A. Elnaggar, M. Heinzinger, C. Dallago, G. Rehawi, Y. Wang, L. Jones, T. Gibbs, T. Feher, C. Angerer, M. Steinegger, D. Bhowmik and B. Rost, ProtTrans: Toward understanding the language of life through self-supervised learning, IEEE Trans. Pattern Anal. Mach. Intell., 2022, 44, 7112–7127.
  15. J. Meier, R. Rao, R. Verkuil, J. Liu, T. Sercu and A. Rives, Language models enable zero-shot prediction of the effects of mutations on protein function, Adv. Neural Inf. Process. Syst., 2021, 34, 29287–29303.
  16. J. L. Watson, D. Juergens, N. R. Bennett, B. L. Trippe, J. Yim, H. E. Eisenach, W. Ahern, A. J. Borst, R. J. Ragotte, L. F. Milles, B. I. M. Wicky, N. Hanikel, S. J. Pellock, A. Courbet, W. Sheffler, J. Wang, P. Venkatesh, I. Sappington, S. V. Torres, A. Lauko, V. De Bortoli, E. Mathieu, R. Barzilay, T. S. Jaakkola, F. DiMaio, M. Baek and D. Baker, De novo design of protein structure and function with RFdiffusion, Nature, 2023, 620, 1089–1100.
  17. A. Madani, B. Krause, E. R. Greene, S. Subramanian, B. P. Mohr, J. M. Holton, J. L. Olmos Jr, C. Xiong, Z. Z. Sun, R. Socher, J. S. Fraser and N. Naik, Large language models generate functional protein sequences across diverse families, Nat. Biotechnol., 2023, 41, 1099–1106.
  18. Gene Ontology Consortium, The Gene Ontology knowledgebase in 2023, Genetics, 2023, 224, iyad031.
  19. K. Guo, Y. Zhou, X. Guo, A. Gao, J. Li, D. Chen, H. Guo, Z. Ma, Q. Liang and M. Jiang, Integrating protein structure and deep learning for protein function prediction in the CAFA5 challenge, bioRxiv, 2024, DOI: 10.1101/2024.02.05.578892.
Language: English
Page range: 35–45
Published on: Apr 30, 2026
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2026 Kushal Raj Roy, published by European Biotechnology Thematic Network Association
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.