Tiling arbitrarily nested loops by means of the transitive

Włodzimierz Bielecki; Marek Pałkowski

doi:10.1515/amcs-2016-0065

International Journal of Applied Mathematics and Computer Science

Volume 26 (2016): Issue 4 (December 2016)

By: Włodzimierz Bielecki and Marek Pałkowski

Open Access | Dec 2016


DOI: https://doi.org/10.1515/amcs-2016-0065 | Journal eISSN: 2083-8492 | Journal ISSN: 1641-876X

Language: English

Page range: 919 - 939

Submitted on: Nov 3, 2015

Accepted on: Aug 9, 2016

Published on: Dec 30, 2016

Published by: University of Zielona Góra

In partnership with: Paradigm Publishing Services

Publication frequency: 4 issues per year

Keywords: tiling, transitive closure, source-to-source compiler, polyhedral model, iteration space slicing

Related subjects: Mathematics, Applied mathematics

© 2016 Włodzimierz Bielecki, Marek Pałkowski, published by University of Zielona Góra
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.