Jug: Software for Parallel Reproducible Computation in Python

Open Access | Oct 2017

DOI: https://doi.org/10.5334/jors.161 | Journal eISSN: 2049-9647
Language: English
Submitted on: Jan 8, 2017 | Accepted on: Oct 6, 2017 | Published on: Oct 27, 2017
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2017 Luis Pedro Coelho, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.