The Parallel Tiled WZ Factorization Algorithm for Multicore Architectures

Beata Bylina; Jarosław Bylina

doi:10.2478/amcs-2019-0030



International Journal of Applied Mathematics and Computer Science

Volume 29 (2019): Issue 2 (June 2019)

By: Beata Bylina and Jarosław Bylina

Open Access | Jul 2019

Abstract

The aim of this paper is to investigate dense linear algebra algorithms on shared memory multicore architectures. We present the design and implementation of a parallel tiled WZ factorization algorithm that can fully exploit such architectures. Three parallel implementations of the algorithm are studied. The first relies only on multithreaded BLAS (Basic Linear Algebra Subprograms) operations. The second, in addition to BLAS operations, uses the OpenMP standard to exploit loop-level parallelism. The third, in addition to BLAS operations, employs the OpenMP task directive with the depend clause. We report the computational performance and the speedup of the parallel tiled WZ factorization algorithm on shared memory multicore architectures for dense, square, diagonally dominant matrices. We then compare our parallel implementations with the corresponding LU factorization from a vendor-implemented LAPACK library. We also analyze the numerical accuracy. Two of our implementations achieve a speedup close to the theoretical maximum implied by Amdahl’s law.
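For readers unfamiliar with the two ideas named in the abstract, the following is a minimal sketch, not the tiled parallel algorithm of the paper. It assumes the classical sequential (non-tiled) WZ elimination, in which step k uses rows k and n-1-k together to zero columns k and n-1-k in the rows between them, and the textbook form of Amdahl's law. The function names `wz_factorize` and `amdahl_speedup` are hypothetical and chosen only for illustration.

```python
import numpy as np


def wz_factorize(a):
    """Sequential, non-tiled WZ factorization sketch: returns (W, Z) with A = W @ Z.

    Assumes every 2x2 pivot block is nonsingular, which holds for the
    strictly diagonally dominant matrices considered in the abstract.
    """
    z = np.array(a, dtype=float)   # working copy, becomes Z
    n = z.shape[0]
    w = np.eye(n)                  # W has a unit diagonal
    for k in range((n - 1) // 2):
        k2 = n - 1 - k             # row/column mirrored from the other end
        det = z[k, k] * z[k2, k2] - z[k2, k] * z[k, k2]
        if det == 0.0:
            raise ZeroDivisionError("singular 2x2 pivot block at step %d" % k)
        rows = slice(k + 1, k2)    # rows strictly between k and k2
        col_k = z[rows, k].copy()
        col_k2 = z[rows, k2].copy()
        # Solve a 2x2 system per row (Cramer's rule) for the two multipliers.
        w[rows, k] = (z[k2, k2] * col_k - z[k2, k] * col_k2) / det
        w[rows, k2] = (z[k, k] * col_k2 - z[k, k2] * col_k) / det
        # Subtract multiples of rows k and k2 from the inner rows.
        cols = slice(k, k2 + 1)
        z[rows, cols] -= (np.outer(w[rows, k], z[k, cols])
                          + np.outer(w[rows, k2], z[k2, cols]))
        z[rows, k] = 0.0           # eliminated entries are zero by construction
        z[rows, k2] = 0.0
    return w, z


def amdahl_speedup(p, f):
    """Amdahl's law: speedup on p processors when a fraction f of the work is parallel."""
    return 1.0 / ((1.0 - f) + f / p)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.random((6, 6)) + 6.0 * np.eye(6)   # strictly diagonally dominant
    w, z = wz_factorize(a)
    print("reconstruction error:", np.max(np.abs(w @ z - a)))
    print("speedup bound, p=4, f=0.5:", amdahl_speedup(4, 0.5))
```

The sketch updates only columns k through n-1-k at each step, because earlier steps have already zeroed the outer columns of the rows involved. A tiled variant would apply the same elimination to blocks rather than scalars, which is where BLAS calls and OpenMP parallelism come in; that part is not shown here.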



DOI: https://doi.org/10.2478/amcs-2019-0030 | Journal eISSN: 2083-8492 | Journal ISSN: 1641-876X


Language: English

Page range: 407–419

Submitted on: Sep 8, 2018

Accepted on: Mar 2, 2019

Published on: Jul 4, 2019

Published by: University of Zielona Góra

In partnership with: Paradigm Publishing Services

Publication frequency: 4 issues per year

Keywords: tiled algorithm, WZ factorization, solution of linear systems, Amdahl’s law, high performance computing, multicore architectures

Related subjects: Mathematics, Applied mathematics

© 2019 Beata Bylina, Jarosław Bylina, published by University of Zielona Góra
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
