Inferring Biomedical Networks Using Multivariate Information Theory: Open-Source Code and Tutorial

Das, Madhumita; Das, Bishwajit; Majumder, Ishaan; Majumder, Durjoy

doi:10.2478/cait-2025-0040

Abstract

In Systems Biology, gene expression data are crucial for modeling the circuitry of biological systems. While clustering and soft computing techniques are commonly used for classification, Information Theory-based entropy functions, particularly multivariate entropy, remain underutilized for deriving biological inferences. With the advent of high-throughput data acquisition systems, more quantitative data are now available, which increases the relevance of Information Theory-based applications. At the same time, this growth creates a demand for a user-friendly, automated analytical framework. This work presents an automated computational framework for the systematic exploration of molecular data, designed to facilitate the construction of biological process-based networks. Algorithms based on multivariate Information Theory have been implemented on three platforms: one proprietary (MATLAB) and two open-source (GNU Octave and Python). All implementations are ready to use, so researchers can analyze their data on the platform of their choice. The algorithms have been successfully tested on published gene expression datasets.
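The following minimal Python sketch makes the entropy-based quantities mentioned above concrete. It is not the authors' released code. It shows how Shannon entropy, pairwise mutual information, multi-information (total correlation) and three-way interaction information can be computed from discretized expression profiles. It assumes equal-width binning and plug-in (observed frequency) probability estimates. The helper names `discretize`, `entropy`, `mutual_information`, `total_correlation` and `interaction_information` are hypothetical. The published MATLAB, Octave and Python tools may use different discretization schemes or estimators, such as maximum-entropy-based ones.

```python
import math
from collections import Counter


def discretize(values, n_bins=3):
    """Map continuous expression values to equal-width bin indices 0..n_bins-1.

    Assumption: equal-width binning (e.g. low/medium/high expression); the
    published tools may discretize differently.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant profile carries no information
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value in the last bin rather than bin n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def entropy(*variables):
    """Joint Shannon entropy in bits of one or more equal-length discrete sequences,
    using plug-in (observed frequency) probabilities."""
    samples = list(zip(*variables))
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def total_correlation(*variables):
    """Multi-information: sum of marginal entropies minus the joint entropy."""
    return sum(entropy(v) for v in variables) - entropy(*variables)


def interaction_information(x, y, z):
    """I(X;Y;Z) = I(X;Y) - I(X;Y|Z), where
    I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    conditional_mi = entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)
    return mutual_information(x, y) - conditional_mi


# Synthetic toy pattern (not from any dataset): Z = X XOR Y
x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
z = [a ^ b for a, b in zip(x, y)]
print(mutual_information(x, y))          # 0.0: X and Y are independent
print(interaction_information(x, y, z))  # -1.0: purely synergistic triplet
```

This sketch uses the convention I(X;Y;Z) = I(X;Y) - I(X;Y|Z). Under that convention, a negative value marks a synergistic triplet, as in the XOR toy where no single pair is informative on its own. A positive value marks redundancy. This is one common convention; some texts flip the sign. For real expression data, each gene's profile would first be passed through `discretize` (or a better-justified estimator) before the entropy terms are computed.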