Code generation approaches for parallel geometric multigrid solvers

References

  1. [1] M. Adams, P. Colella, D. T. Graves, J. N. Johnson, N. D. Keen, T. J. Ligocki, D. F. Martin, P. W. McCorquodale, D. Modiano, P. Schwartz, T. Sternberg, and B. van Straalen. Chombo software package for AMR applications - design document. Technical Report LBNL-6616E, Lawrence Berkeley National Laboratory, Jan 2015.
  2. [2] S. Balay, W. D. Gropp, L. C. McInnes, and B. F. Smith. Efficient management of parallelism in object oriented numerical software libraries. In Modern Software Tools in Scientific Computing, pages 163–202. Birkhäuser Press, 1997. doi: 10.1007/978-1-4612-1986-6_8
  3. [3] W. Bangerth, R. Hartmann, and G. Kanschat. deal.II – a general purpose object oriented finite element library. ACM Trans. Math. Softw., 33(4):24/1–24/27, 2007. doi: 10.1145/1268776.1268779
  4. [4] P. Bastian, C. Engwer, D. Göddeke, O. Iliev, O. Ippisch, M. Ohlberger, S. Turek, J. Fahlke, S. Kaulmann, S. Müthing, and D. Ribbrock. EXA-DUNE: Flexible PDE solvers, numerical methods and applications. In Euro-Par 2014: Parallel Processing Workshops, volume 8806 of Lecture Notes in Computer Science, pages 530–541. Springer, 2014. doi: 10.1007/978-3-319-14313-2_45
  5. [5] M. Bauer, F. Schornbaum, C. Godenschwager, M. Markl, D. Anderl, H. Köstler, and U. Rüde. A Python extension for the massively parallel multiphysics simulation framework waLBerla. International Journal of Parallel, Emergent and Distributed Systems, 31(6):529–542, 2016. doi: 10.1080/17445760.2015.1118478
  6. [6] B. Bergen, T. Gradl, F. Hülsemann, and U. Rüde. A massively parallel multigrid method for finite elements. Computing in Science and Engineering, 8(6):56–62, 2006. doi: 10.1109/MCSE.2006.102
  7. [7] B. Bergen and F. Hülsemann. Hierarchical hybrid grids: data structures and core algorithms for multigrid. Numer. Linear Algebra Appl., 11:279–291, 2004. doi: 10.1002/nla.382
  8. [8] M. Blatt, A. Burchardt, A. Dedner, C. Engwer, J. Fahlke, B. Flemisch, C. Gersbacher, C. Gräser, F. Gruber, C. Grüninger, D. Kempf, R. Klöfkorn, T. Malkmus, S. Müthing, M. Nolte, M. Piatkowski, and O. Sander. The distributed and unified numerics environment, version 2.4. Archive of Numerical Software, 4(100):13–29, 2016.
  9. [9] M. Bolten, F. Franchetti, P. H. J. Kelly, C. Lengauer, and M. Mohr. Algebraic description and automatic generation of multigrid methods in SPIRAL. Concurrency and Computation: Practice and Experience, 29(17):4105:1–4105:11, 2017. Special Issue on Advanced Stencil-Code Engineering. doi: 10.1002/cpe.4105
  10. [10] T. Brandvik and G. Pullan. SBLOCK: A framework for efficient stencil-based PDE solvers on multi-core platforms. In 2010 10th IEEE International Conference on Computer and Information Technology, pages 1181–1188, Jun 2010. doi: 10.1109/CIT.2010.214
  11. [11] M. Christen, O. Schenk, and H. Burkhart. PATUS: A code generation and autotuning framework for parallel iterative stencil computations on modern microarchitectures. In 2011 IEEE International Parallel & Distributed Processing Symposium, pages 676–687, May 2011. doi: 10.1109/IPDPS.2011.70
  12. [12] C. Coarfa, Y. Dotsenko, J. Mellor-Crummey, F. Cantonnet, T. El-Ghazawi, A. Mohanti, Y. Yao, and D. Chavarría-Miranda. An evaluation of global address space languages: Co-array Fortran and Unified Parallel C. In Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '05, pages 36–47, New York, NY, USA, 2005. ACM. doi: 10.1145/1065944.1065950
  13. [13] Z. DeVito, N. Joubert, F. Palacios, S. Oakley, M. Medina, M. Barrientos, E. Elsen, F. Ham, A. Aiken, K. Duraisamy, E. Darve, J. Alonso, and P. Hanrahan. Liszt: A domain specific language for building portable mesh-based PDE solvers. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC), pages 1–12. ACM, 2011. doi: 10.1145/2063384.2063396
  14. [14] H. C. Edwards, C. R. Trott, and D. Sunderland. Kokkos: Enabling manycore performance portability through polymorphic memory access patterns. Journal of Parallel and Distributed Computing, 74(12):3202–3216, 2014. Special issue on Domain-Specific Languages and High-Level Frameworks for High-Performance Computing. doi: 10.1016/j.jpdc.2014.07.003
  15. [15] R. D. Falgout, J. E. Jones, and U. M. Yang. The design and implementation of hypre, a library of parallel high performance preconditioners. In Numerical Solution of Partial Differential Equations on Parallel Computers, pages 267–294, Berlin, Heidelberg, 2006. Springer. doi: 10.1007/3-540-31619-1_8
  16. [16] M. Frigo, C. E. Leiserson, and K. H. Randall. The implementation of the Cilk-5 multithreaded language. SIGPLAN Not., 33(5):212–223, May 1998. doi: 10.1145/277652.277725
  17. [17] K. Fürlinger, C. Glass, A. Knüpfer, J. Tao, D. Hünich, K. Idrees, M. Maiterth, Y. Mhedheb, and H. Zhou. DASH: Data structures and algorithms with support for hierarchical locality. In Euro-Par 2014 Workshops (Porto, Portugal), pages 542–552, 2014. doi: 10.1007/978-3-319-14313-2_46
  18. [18] B. Gmeiner, T. Gradl, H. Köstler, and U. Rüde. Highly parallel geometric multigrid algorithm for hierarchical hybrid grids. In K. Binder, G. Münster, and M. Kremer, editors, NIC Symposium 2012, volume 45 of Publication series of the John von Neumann Institute for Computing, pages 323–330, Jülich, Germany, 2012.
  19. [19] B. Gmeiner, M. Huber, L. John, U. Rüde, and B. Wohlmuth. A quantitative performance study for Stokes solvers at the extreme scale. J. Comput. Sci., 17(3):509–521, 2016. doi: 10.1016/j.jocs.2016.06.006
  20. [20] B. Gmeiner, H. Köstler, M. Stürmer, and U. Rüde. Parallel multigrid on hierarchical hybrid grids: a performance study on current high performance computing clusters. Concurrency and Computation: Practice and Experience, 26(1):217–240, 2014. doi: 10.1002/cpe.2968
  21. [21] B. Gmeiner, U. Rüde, H. Stengel, C. Waluga, and B. Wohlmuth. Performance and scalability of hierarchical hybrid multigrid solvers for Stokes systems. SIAM J. Sci. Comput., 37(2):C143–C168, 2015. doi: 10.1137/130941353
  22. [22] B. Gmeiner, U. Rüde, H. Stengel, C. Waluga, and B. Wohlmuth. Towards textbook efficiency for parallel multigrid. Numer. Math. Theory Methods Appl., 8:22–46, 2015. doi: 10.4208/nmtma.2015.w10si
  23. [23] T. Gysi, T. Grosser, and T. Hoefler. MODESTO: Data-centric analytic optimization of complex stencil programs on heterogeneous architectures. In Proceedings of the 29th ACM International Conference on Supercomputing, ICS '15, pages 177–186, New York, NY, USA, 2015. ACM. doi: 10.1145/2751205.2751223
  24. [24] T. Gysi, C. Osuna, O. Fuhrer, M. Bianco, and T. C. Schulthess. STELLA: A domain-specific tool for structured grid methods in weather and climate models. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), pages 41:1–41:12. ACM, Nov 2015. doi: 10.1145/2807591.2807627
  25. [25] M. Heisig. Petalisp: A Common Lisp library for data parallel programming. In 11th European Lisp Symposium, page 4, 2018.
  26. [26] M. Heisig and H. Köstler. Petalisp: run time code generation for operations on strided arrays. In Proceedings of the 5th ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming, pages 11–17. ACM, 2018. doi: 10.1145/3219753.3219755
  27. [27] M. A. Heroux, R. A. Bartlett, V. E. Howle, R. J. Hoekstra, J. J. Hu, T. G. Kolda, R. B. Lehoucq, K. R. Long, R. P. Pawlowski, E. T. Phipps, A. G. Salinger, H. K. Thornquist, R. S. Tuminaro, J. M. Willenbring, A. Williams, and K. S. Stanley. An overview of the Trilinos project. ACM Trans. Math. Softw., 31(3):397–423, 2005. doi: 10.1145/1089014.1089021
  28. [28] L. V. Kale and S. Krishnan. CHARM++: A portable concurrent object oriented system based on C++. SIGPLAN Notices, 28(10):91–108, Oct 1993. doi: 10.1145/167962.165874
  29. [29] N. Kohl, D. Thönnes, D. Drzisga, D. Bartuschat, and U. Rüde. The HyTeG finite-element software framework for scalable multigrid solvers. International Journal of Parallel, Emergent and Distributed Systems, 0(0):1–20, 2018.
  30. [30] H. Köstler, C. Schmitt, S. Kuckuk, F. Hannig, J. Teich, and U. Rüde. A Scala prototype to generate multigrid solver implementations for different problems and target multi-core platforms. Int. J. of Computational Science and Engineering, 14(2):150–163, 2017. doi: 10.1504/IJCSE.2017.082879
  31. [31] H. Köstler, M. Stürmer, and T. Pohl. Performance engineering to achieve real-time high dynamic range imaging. Journal of Real-Time Image Processing, pages 1–13, 2013. doi: 10.1007/s11554-012-0312-3
  32. [32] S. Kronawitter, S. Kuckuk, H. Köstler, and C. Lengauer. Automatic data layout transformations in the ExaStencils code generator. Parallel Processing Letters, 28(03):1850009, 2018. doi: 10.1142/S0129626418500093
  33. [33] S. Kronawitter, S. Kuckuk, H. Köstler, and C. Lengauer. Automatic data layout transformations in the ExaStencils code generator. Parallel Processing Letters, 28(03):1850009, 2018. doi: 10.1142/S0129626418500093
  34. [34] S. Kronawitter, S. Kuckuk, and C. Lengauer. Redundancy elimination in the ExaStencils code generator. In Algorithms and Architectures for Parallel Processing, pages 159–173, Cham, 2016. Springer International Publishing. doi: 10.1007/978-3-319-49956-7_13
  35. [35] S. Kuckuk, G. Haase, D. A. Vasco, and H. Köstler. Towards generating efficient flow solvers with the ExaStencils approach. Concurrency and Computation: Practice and Experience, 29(17):4062:1–4062:17, 2017. Special Issue on Advanced Stencil-Code Engineering. doi: 10.1002/cpe.4062
  36. [36] S. Kuckuk and H. Köstler. Automatic generation of massively parallel codes from ExaSlang. Computation, 4(3):27:1–27:20, 2016. Special Issue on High Performance Computing (HPC) Software Design. doi: 10.3390/computation4030027
  37. [37] S. Kuckuk and H. Köstler. Whole program generation of massively parallel shallow water equation solvers. In 2018 IEEE International Conference on Cluster Computing (CLUSTER), pages 78–87, Sept 2018. doi: 10.1109/CLUSTER.2018.00020
  38. [38] S. Kuckuk and H. Köstler. Automatic generation of massively parallel codes from ExaSlang. Computation, 4(3):27:1–27:20, 2016. Special Issue on High Performance Computing (HPC) Software Design. doi: 10.3390/computation4030027
  39. [39] S. Kuckuk, L. Leitenmaier, C. Schmitt, D. Schönwetter, H. Köstler, and D. Fey. Towards virtual hardware prototyping for generated geometric multigrid solvers. Technical Report CS 2017-01, Technische Fakultät, 2017.
  40. [40] C. Lengauer, S. Apel, M. Bolten, A. Größlinger, F. Hannig, H. Köstler, U. Rüde, J. Teich, A. Grebhahn, S. Kronawitter, et al. ExaStencils: Advanced stencil-code engineering. In European Conference on Parallel Processing, pages 553–564. Springer, 2014. doi: 10.1007/978-3-319-14313-2_47
  41. [41] C. Lengauer, S. Apel, M. Bolten, A. Größlinger, F. Hannig, H. Köstler, U. Rüde, J. Teich, A. Grebhahn, S. Kronawitter, S. Kuckuk, H. Rittich, and C. Schmitt. ExaStencils: Advanced stencil-code engineering. In L. Lopes et al., editors, Euro-Par 2014: Parallel Processing Workshops, volume 8806 of Lecture Notes in Computer Science (LNCS), pages 553–564. Springer, 2014. doi: 10.1007/978-3-319-14313-2_47
  42. [42] A. Logg, K.-A. Mardal, and G. N. Wells. Automated Solution of Differential Equations by the Finite Element Method, volume 84 of Lecture Notes in Computational Science and Engineering (LNCSE). Springer, 2012. doi: 10.1007/978-3-642-23099-8
  43. [43] N. Maruyama, K. Sato, T. Nomura, and S. Matsuoka. Physis: An implicitly parallel programming model for stencil computations on large-scale GPU-accelerated supercomputers. In SC '11: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1–12, Nov 2011. doi: 10.1145/2063384.2063398
  44. [44] G. R. Mudalige, I. Reguly, M. B. Giles, C. Bertolli, and P. H. J. Kelly. OP2: An active library framework for solving unstructured mesh-based applications on multi-core and many-core architectures. In Proc. Innovative Parallel Computing (InPar), San Jose, California, May 2012. IEEE. doi: 10.1109/InPar.2012.6339594
  45. [45] G. Ofenbeck, T. Rompf, and M. Püschel. Staging for generic programming in space and time. SIGPLAN Not., 52(12):15–28, Oct 2017. doi: 10.1145/3170492.3136060
  46. [46] M. Püschel, F. Franchetti, and Y. Voronenko. Spiral. In Encyclopedia of Parallel Computing, volume 4, pages 1920–1933. Springer, 2011.
  47. [47] F. Rathgeber, D. A. Ham, L. Mitchell, M. Lange, F. Luporini, A. T. T. McRae, G.-T. Bercea, G. R. Markall, and P. H. J. Kelly. Firedrake: Automating the finite element method by composing abstractions. ACM Trans. on Mathematical Software (TOMS), 43(3):24:1–24:27, 2016. doi: 10.1145/2998441
  48. [48] P. Rawat, M. Kong, T. Henretty, J. Holewinski, K. Stock, L.-N. Pouchet, J. Ramanujam, A. Rountev, and P. Sadayappan. SDSLc: A multi-target domain-specific compiler for stencil computations. In Proc. 5th Int'l Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC), pages 6:1–6:10. ACM, Nov 2015. doi: 10.1145/2830018.2830025
  49. [49] C. Schmitt, S. Kuckuk, F. Hannig, H. Köstler, and J. Teich. ExaSlang: A domain-specific language for highly scalable multigrid solvers. In Proc. 4th Int'l Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC), pages 42–51. IEEE Computer Society, Nov. 2014. doi: 10.1109/WOLFHPC.2014.11
  50. [50] C. Schmitt, M. Schmid, F. Hannig, J. Teich, S. Kuckuk, and H. Köstler. Generation of multigrid-based numerical solvers for FPGA accelerators. In Proc. 2nd Int’l Workshop on High-Performance Stencil Computations (HiStencils), pages 9–15, Jan. 2015.
  51. [51] C. Schmitt, M. Schmid, S. Kuckuk, H. Köstler, J. Teich, and F. Hannig. Reconfigurable hardware generation of multigrid solvers with conjugate gradient coarse-grid solution. Parallel Processing Letters, 28(04):1850016, 2018. doi: 10.1142/S0129626418500160
  52. [52] J. Schmitt, H. Köstler, J. Eitzinger, and R. Membarth. Unified code generation for the parallel computation of pairwise interactions using partial evaluation. In 2018 17th International Symposium on Parallel and Distributed Computing (ISPDC), pages 17–24. IEEE, 2018. doi: 10.1109/ISPDC2018.2018.00012
  53. [53] Y. Tang, R. A. Chowdhury, B. C. Kuszmaul, C.-K. Luk, and C. E. Leiserson. The Pochoir stencil compiler. In Proceedings of the Twenty-Third Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 117–128. ACM, 2011. doi: 10.1145/1989493.1989508
  54. [54] U. Trottenberg, C. Oosterlee, and A. Schüller. Multigrid. Academic Press, San Diego, CA, USA, 2001.
  55. [55] A. Vogel, S. Reiter, M. Rupp, A. Nägel, and G. Wittum. UG 4: A novel flexible software system for simulating PDE based models on high performance computers. Computing and Visualization in Science, 16(4):165–179, 2013. doi: 10.1007/s00791-014-0232-9
  56. [56] T. Weinzierl. The Peano software - parallel, automaton-based, dynamically adaptive grid traversals. ACM Transactions on Mathematical Software (TOMS), 45(2):14, 2019. doi: 10.1145/3319797
DOI: https://doi.org/10.2478/auom-2020-0038 | Journal eISSN: 1844-0835 | Journal ISSN: 1224-1784
Language: English
Page range: 123 - 152
Submitted on: Jul 10, 2019
Accepted on: Dec 16, 2019
Published on: Dec 28, 2020
Published by: Ovidius University of Constanta

© 2020 Harald Köstler, Marco Heisig, Nils Kohl, Sebastian Kuckuk, Martin Bauer, Ulrich Rüde, published by Ovidius University of Constanta
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.