References
- Prometheus Authors. Prometheus: From metrics to insight; 2012. https://prometheus.io/
- Grafana Labs. Grafana: The open observability platform; 2014. https://grafana.com/
- Galstad E. Nagios: The industry standard in IT infrastructure monitoring. https://www.nagios.org/
- Evans RT, Browne JC, Barth WL. Comprehensive resource use monitoring for HPC systems with TACC Stats. In: Proceedings of the First International Workshop on HPC User Support Tools. IEEE; 2014. pp. 13–21. DOI: 10.1109/HUST.2014.7
- Palmer JT, Gallo SM, Furlani TR, Jones MD, DeLeon RL, White JP, Simakov N, Patra AK, Sperhac J, Yearke T, Rathsam R, Inber M, Guillen O, Cornelius CD. Open XDMoD: A tool for the comprehensive management of high-performance computing resources. Computing in Science & Engineering. 2015;17(4):52–62. DOI: 10.1109/MCSE.2015.68
- Agelastos A, Allan B, Brandt J, Cassella P, Enos J, Fullop J, Gentile A, Monk S, Naksinehaboon N, Ogden J, Rajan M, Showerman M, Stevenson J, Taerat N, Tucker T. The Lightweight Distributed Metric Service: A scalable infrastructure for continuous monitoring of large scale computing systems and applications. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE; 2014. pp. 154–165. DOI: 10.1109/SC.2014.18
- Tuncer O, Ates E, Zhang Y, Turk A, Brandt J, Leung VJ, Egele M, Coskun AK. Diagnosing performance variations in HPC applications using machine learning. In: High Performance Computing, Lecture Notes in Computer Science, vol. 10266. Springer; 2017. pp. 355–373. DOI: 10.1007/978-3-319-58667-0_19
- Vilhena DA, Antonelli A. A network approach for identifying and delimiting biogeographical regions. Nature Communications. 2015;6:6848. DOI: 10.1038/ncomms7848
- Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research. 2011;12:2825–2830. http://jmlr.org/papers/v12/pedregosa11a.html
- Yoo AB, Jette MA, Grondona M. SLURM: Simple Linux Utility for Resource Management. In: Job Scheduling Strategies for Parallel Processing, Lecture Notes in Computer Science, vol. 2862. Springer; 2003. pp. 44–60. DOI: 10.1007/10968987_3
