Code generation approaches for parallel geometric multigrid solvers

References

  1. [1] M. Adams, P. Colella, D. T. Graves, J. N. Johnson, N. D. Keen, T. J. Ligocki, D. F. Martin, P. W. McCorquodale, D. Modiano, P. Schwartz, T. Sternberg, and B. van Straalen. Chombo software package for AMR applications - design document. Technical Report LBNL-6616E, Lawrence Berkeley National Laboratory, Jan 2015.
  2. [2] S. Balay, W. D. Gropp, L. C. McInnes, and B. F. Smith. Efficient management of parallelism in object oriented numerical software libraries. In Modern Software Tools in Scientific Computing, pages 163–202. Birkhäuser Press, 1997. doi: 10.1007/978-1-4612-1986-6_8
  3. [3] W. Bangerth, R. Hartmann, and G. Kanschat. deal.II – a general purpose object oriented finite element library. ACM Trans. Math. Softw., 33(4):24/1–24/27, 2007. doi: 10.1145/1268776.1268779
  4. [4] P. Bastian, C. Engwer, D. Göddeke, O. Iliev, O. Ippisch, M. Ohlberger, S. Turek, J. Fahlke, S. Kaulmann, S. Müthing, and D. Ribbrock. EXA-DUNE: Flexible PDE solvers, numerical methods and applications. In Euro-Par 2014: Parallel Processing Workshops, volume 8806 of Lecture Notes in Computer Science, pages 530–541. Springer, 2014. doi: 10.1007/978-3-319-14313-2_45
  5. [5] M. Bauer, F. Schornbaum, C. Godenschwager, M. Markl, D. Anderl, H. Köstler, and U. Rüde. A Python extension for the massively parallel multiphysics simulation framework waLBerla. International Journal of Parallel, Emergent and Distributed Systems, 31(6):529–542, 2016. doi: 10.1080/17445760.2015.1118478
  6. [6] B. Bergen, T. Gradl, F. Hülsemann, and U. Rüde. A massively parallel multigrid method for finite elements. Computing in Science and Engineering, 8(6):56–62, 2006. doi: 10.1109/MCSE.2006.102
  7. [7] B. Bergen and F. Hülsemann. Hierarchical hybrid grids: data structures and core algorithms for multigrid. Numer. Linear Algebra Appl., 11:279–291, 2004. doi: 10.1002/nla.382
  8. [8] M. Blatt, A. Burchardt, A. Dedner, C. Engwer, J. Fahlke, B. Flemisch, C. Gersbacher, C. Gräser, F. Gruber, C. Grüninger, D. Kempf, R. Klöfkorn, T. Malkmus, S. Müthing, M. Nolte, M. Piatkowski, and O. Sander. The distributed and unified numerics environment, version 2.4. Archive of Numerical Software, 4(100):13–29, 2016.
  9. [9] M. Bolten, F. Franchetti, P. H. J. Kelly, C. Lengauer, and M. Mohr. Algebraic description and automatic generation of multigrid methods in SPIRAL. Concurrency and Computation: Practice and Experience, 29(17):4105:1–4105:11, 2017. Special Issue on Advanced Stencil-Code Engineering. doi: 10.1002/cpe.4105
  10. [10] T. Brandvik and G. Pullan. SBLOCK: A framework for efficient stencil-based PDE solvers on multi-core platforms. In 2010 10th IEEE International Conference on Computer and Information Technology, pages 1181–1188, Jun 2010. doi: 10.1109/CIT.2010.214
  11. [11] M. Christen, O. Schenk, and H. Burkhart. PATUS: A code generation and autotuning framework for parallel iterative stencil computations on modern microarchitectures. In 2011 IEEE International Parallel & Distributed Processing Symposium, pages 676–687, May 2011. doi: 10.1109/IPDPS.2011.70
  12. [12] C. Coarfa, Y. Dotsenko, J. Mellor-Crummey, F. Cantonnet, T. El-Ghazawi, A. Mohanti, Y. Yao, and D. Chavarría-Miranda. An evaluation of global address space languages: Co-array Fortran and Unified Parallel C. In Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '05, pages 36–47, New York, NY, USA, 2005. ACM. doi: 10.1145/1065944.1065950
  13. [13] Z. DeVito, N. Joubert, F. Palacios, S. Oakley, M. Medina, M. Barrientos, E. Elsen, F. Ham, A. Aiken, K. Duraisamy, E. Darve, J. Alonso, and P. Hanrahan. Liszt: A domain specific language for building portable mesh-based PDE solvers. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC), pages 1–12. ACM, 2011. doi: 10.1145/2063384.2063396
  14. [14] H. C. Edwards, C. R. Trott, and D. Sunderland. Kokkos: Enabling manycore performance portability through polymorphic memory access patterns. Journal of Parallel and Distributed Computing, 74(12):3202–3216, 2014. Special issue on Domain-Specific Languages and High-Level Frameworks for High-Performance Computing. doi: 10.1016/j.jpdc.2014.07.003
  15. [15] R. D. Falgout, J. E. Jones, and U. M. Yang. The design and implementation of hypre, a library of parallel high performance preconditioners. In Numerical Solution of Partial Differential Equations on Parallel Computers, pages 267–294, Berlin, Heidelberg, 2006. Springer. doi: 10.1007/3-540-31619-1_8
  16. [16] M. Frigo, C. E. Leiserson, and K. H. Randall. The implementation of the Cilk-5 multithreaded language. SIGPLAN Not., 33(5):212–223, May 1998. doi: 10.1145/277652.277725
  17. [17] K. Fürlinger, C. Glass, A. Knüpfer, J. Tao, D. Hünich, K. Idrees, M. Maiterth, Y. Mhedheb, and H. Zhou. DASH: Data structures and algorithms with support for hierarchical locality. In Euro-Par 2014 Workshops (Porto, Portugal), pages 542–552, 2014. doi: 10.1007/978-3-319-14313-2_46
  18. [18] B. Gmeiner, T. Gradl, H. Köstler, and U. Rüde. Highly parallel geometric multigrid algorithm for hierarchical hybrid grids. In K. Binder, G. Münster, and M. Kremer, editors, NIC Symposium 2012, volume 45 of Publication series of the John von Neumann Institute for Computing, pages 323–330, Jülich, Germany, 2012.
  19. [19] B. Gmeiner, M. Huber, L. John, U. Rüde, and B. Wohlmuth. A quantitative performance study for Stokes solvers at the extreme scale. J. Comput. Sci., 17(3):509–521, 2016. doi: 10.1016/j.jocs.2016.06.006
  20. [20] B. Gmeiner, H. Köstler, M. Stürmer, and U. Rüde. Parallel multigrid on hierarchical hybrid grids: a performance study on current high performance computing clusters. Concurrency and Computation: Practice and Experience, 26(1):217–240, 2014. doi: 10.1002/cpe.2968
  21. [21] B. Gmeiner, U. Rüde, H. Stengel, C. Waluga, and B. Wohlmuth. Performance and scalability of hierarchical hybrid multigrid solvers for Stokes systems. SIAM J. Sci. Comput., 37(2):C143–C168, 2015. doi: 10.1137/130941353
  22. [22] B. Gmeiner, U. Rüde, H. Stengel, C. Waluga, and B. Wohlmuth. Towards textbook efficiency for parallel multigrid. Numer. Math. Theory Methods Appl., 8:22–46, 2015. doi: 10.4208/nmtma.2015.w10si
  23. [23] T. Gysi, T. Grosser, and T. Hoefler. MODESTO: Data-centric analytic optimization of complex stencil programs on heterogeneous architectures. In Proceedings of the 29th ACM International Conference on Supercomputing, ICS '15, pages 177–186, New York, NY, USA, 2015. ACM. doi: 10.1145/2751205.2751223
  24. [24] T. Gysi, C. Osuna, O. Fuhrer, M. Bianco, and T. C. Schulthess. STELLA: A domain-specific tool for structured grid methods in weather and climate models. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), pages 41:1–41:12. ACM, Nov 2015. doi: 10.1145/2807591.2807627
  25. [25] M. Heisig. Petalisp: A Common Lisp library for data parallel programming. In 11th European Lisp Symposium, page 4, 2018.
  26. [26] M. Heisig and H. Köstler. Petalisp: run time code generation for operations on strided arrays. In Proceedings of the 5th ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming, pages 11–17. ACM, 2018. doi: 10.1145/3219753.3219755
  27. [27] M. A. Heroux, R. A. Bartlett, V. E. Howle, R. J. Hoekstra, J. J. Hu, T. G. Kolda, R. B. Lehoucq, K. R. Long, R. P. Pawlowski, E. T. Phipps, A. G. Salinger, H. K. Thornquist, R. S. Tuminaro, J. M. Willenbring, A. Williams, and K. S. Stanley. An overview of the Trilinos project. ACM Trans. Math. Softw., 31(3):397–423, 2005. doi: 10.1145/1089014.1089021
  28. [28] L. V. Kale and S. Krishnan. CHARM++: A portable concurrent object oriented system based on C++. SIGPLAN Notices, 28(10):91–108, Oct 1993. doi: 10.1145/167962.165874
  29. [29] N. Kohl, D. Thönnes, D. Drzisga, D. Bartuschat, and U. Rüde. The HyTeG finite-element software framework for scalable multigrid solvers. International Journal of Parallel, Emergent and Distributed Systems, 0(0):1–20, 2018.
  30. [30] H. Köstler, C. Schmitt, S. Kuckuk, F. Hannig, J. Teich, and U. Rüde. A Scala prototype to generate multigrid solver implementations for different problems and target multi-core platforms. Int. J. of Computational Science and Engineering, 14(2):150–163, 2017. doi: 10.1504/IJCSE.2017.082879
  31. [31] H. Köstler, M. Stürmer, and T. Pohl. Performance engineering to achieve real-time high dynamic range imaging. Journal of Real-Time Image Processing, pages 1–13, 2013. doi: 10.1007/s11554-012-0312-3
  32. [32] S. Kronawitter, S. Kuckuk, H. Köstler, and C. Lengauer. Automatic data layout transformations in the ExaStencils code generator. Parallel Processing Letters, 28(03):1850009, 2018. doi: 10.1142/S0129626418500093
  33. [33] S. Kronawitter, S. Kuckuk, H. Köstler, and C. Lengauer. Automatic data layout transformations in the ExaStencils code generator. Parallel Processing Letters, 28(03):1850009, 2018. doi: 10.1142/S0129626418500093
  34. [34] S. Kronawitter, S. Kuckuk, and C. Lengauer. Redundancy elimination in the ExaStencils code generator. In Algorithms and Architectures for Parallel Processing, pages 159–173, Cham, 2016. Springer International Publishing. doi: 10.1007/978-3-319-49956-7_13
  35. [35] S. Kuckuk, G. Haase, D. A. Vasco, and H. Köstler. Towards generating efficient flow solvers with the ExaStencils approach. Concurrency and Computation: Practice and Experience, 29(17):4062:1–4062:17, 2017. Special Issue on Advanced Stencil-Code Engineering. doi: 10.1002/cpe.4062
  36. [36] S. Kuckuk and H. Köstler. Automatic generation of massively parallel codes from ExaSlang. Computation, 4(3):27:1–27:20, 2016. Special Issue on High Performance Computing (HPC) Software Design. doi: 10.3390/computation4030027
  37. [37] S. Kuckuk and H. Köstler. Whole program generation of massively parallel shallow water equation solvers. In 2018 IEEE International Conference on Cluster Computing (CLUSTER), pages 78–87, Sept 2018. doi: 10.1109/CLUSTER.2018.00020
  38. [38] S. Kuckuk and H. Köstler. Automatic generation of massively parallel codes from ExaSlang. Computation, 4(3):27:1–27:20, 2016. Special Issue on High Performance Computing (HPC) Software Design. doi: 10.3390/computation4030027
  39. [39] S. Kuckuk, L. Leitenmaier, C. Schmitt, D. Schönwetter, H. Köstler, and D. Fey. Towards virtual hardware prototyping for generated geometric multigrid solvers. Technical Report CS 2017-01, Technische Fakultät, 2017.
  40. [40] C. Lengauer, S. Apel, M. Bolten, A. Größlinger, F. Hannig, H. Köstler, U. Rüde, J. Teich, A. Grebhahn, S. Kronawitter, et al. ExaStencils: Advanced stencil-code engineering. In European Conference on Parallel Processing, pages 553–564. Springer, 2014. doi: 10.1007/978-3-319-14313-2_47
  41. [41] C. Lengauer, S. Apel, M. Bolten, A. Größlinger, F. Hannig, H. Köstler, U. Rüde, J. Teich, A. Grebhahn, S. Kronawitter, S. Kuckuk, H. Rittich, and C. Schmitt. ExaStencils: Advanced stencil-code engineering. In L. Lopes et al., editors, Euro-Par 2014: Parallel Processing Workshops, volume 8806 of Lecture Notes in Computer Science (LNCS), pages 553–564. Springer, 2014. doi: 10.1007/978-3-319-14313-2_47
  42. [42] A. Logg, K.-A. Mardal, and G. N. Wells. Automated Solution of Differential Equations by the Finite Element Method, volume 84 of Lecture Notes in Computational Science and Engineering (LNCSE). Springer, 2012. doi: 10.1007/978-3-642-23099-8
  43. [43] N. Maruyama, K. Sato, T. Nomura, and S. Matsuoka. Physis: An implicitly parallel programming model for stencil computations on large-scale GPU-accelerated supercomputers. In SC '11: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1–12, Nov 2011. doi: 10.1145/2063384.2063398
  44. [44] G. R. Mudalige, I. Reguly, M. B. Giles, C. Bertolli, and P. H. J. Kelly. OP2: An active library framework for solving unstructured mesh-based applications on multi-core and many-core architectures. In Proc. Innovative Parallel Computing (InPar), San Jose, California, May 2012. IEEE. doi: 10.1109/InPar.2012.6339594
  45. [45] G. Ofenbeck, T. Rompf, and M. Püschel. Staging for generic programming in space and time. SIGPLAN Not., 52(12):15–28, Oct 2017. doi: 10.1145/3170492.3136060
  46. [46] M. Püschel, F. Franchetti, and Y. Voronenko. Spiral. In Encyclopedia of Parallel Computing, volume 4, pages 1920–1933. Springer, 2011.
  47. [47] F. Rathgeber, D. A. Ham, L. Mitchell, M. Lange, F. Luporini, A. T. T. McRae, G.-T. Bercea, G. R. Markall, and P. H. J. Kelly. Firedrake: Automating the finite element method by composing abstractions. ACM Trans. on Mathematical Software (TOMS), 43(3):24:1–24:27, 2016. doi: 10.1145/2998441
  48. [48] P. Rawat, M. Kong, T. Henretty, J. Holewinski, K. Stock, L.-N. Pouchet, J. Ramanujam, A. Rountev, and P. Sadayappan. SDSLc: A multi-target domain-specific compiler for stencil computations. In Proc. 5th Int'l Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC), pages 6:1–6:10. ACM, Nov 2015. doi: 10.1145/2830018.2830025
  49. [49] C. Schmitt, S. Kuckuk, F. Hannig, H. Köstler, and J. Teich. ExaSlang: A domain-specific language for highly scalable multigrid solvers. In Proc. 4th Int'l Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC), pages 42–51. IEEE Computer Society, Nov. 2014. doi: 10.1109/WOLFHPC.2014.11
  50. [50] C. Schmitt, M. Schmid, F. Hannig, J. Teich, S. Kuckuk, and H. Köstler. Generation of multigrid-based numerical solvers for FPGA accelerators. In Proc. 2nd Int’l Workshop on High-Performance Stencil Computations (HiStencils), pages 9–15, Jan. 2015.
  51. [51] C. Schmitt, M. Schmid, S. Kuckuk, H. Köstler, J. Teich, and F. Hannig. Reconfigurable hardware generation of multigrid solvers with conjugate gradient coarse-grid solution. Parallel Processing Letters, 28(04):1850016, 2018. doi: 10.1142/S0129626418500160
  52. [52] J. Schmitt, H. Köstler, J. Eitzinger, and R. Membarth. Unified code generation for the parallel computation of pairwise interactions using partial evaluation. In 2018 17th International Symposium on Parallel and Distributed Computing (ISPDC), pages 17–24. IEEE, 2018. doi: 10.1109/ISPDC2018.2018.00012
  53. [53] Y. Tang, R. A. Chowdhury, B. C. Kuszmaul, C.-K. Luk, and C. E. Leiserson. The Pochoir stencil compiler. In Proceedings of the Twenty-Third Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 117–128. ACM, 2011. doi: 10.1145/1989493.1989508
  54. [54] U. Trottenberg, C. Oosterlee, and A. Schüller. Multigrid. Academic Press, San Diego, CA, USA, 2001.
  55. [55] A. Vogel, S. Reiter, M. Rupp, A. Nägel, and G. Wittum. UG 4: A novel flexible software system for simulating PDE based models on high performance computers. Computing and Visualization in Science, 16(4):165–179, 2013. doi: 10.1007/s00791-014-0232-9
  56. [56] T. Weinzierl. The Peano software - parallel, automaton-based, dynamically adaptive grid traversals. ACM Transactions on Mathematical Software (TOMS), 45(2):14, 2019. doi: 10.1145/3319797
DOI: https://doi.org/10.2478/auom-2020-0038 | Journal eISSN: 1844-0835 | Journal ISSN: 1224-1784
Language: English
Page range: 123 - 152
Submitted on: Jul 10, 2019
Accepted on: Dec 16, 2019
Published on: Dec 28, 2020
Published by: Ovidius University of Constanta

© 2020 Harald Köstler, Marco Heisig, Nils Kohl, Sebastian Kuckuk, Martin Bauer, Ulrich Rüde, published by Ovidius University of Constanta
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.