Search Results

There are 10,986 results for content related to: Programming many-core architectures - a case study: dense matrix computations on the Intel single-chip cloud computer processor

  1. Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting

    Concurrency and Computation: Practice and Experience

    Volume 26, Issue 7, May 2014, Pages: 1408–1431, Jack Dongarra, Mathieu Faverge, Hatem Ltaief and Piotr Luszczek

    Article first published online : 18 SEP 2013, DOI: 10.1002/cpe.3110

  2. Scheduling algorithms-by-blocks on small clusters

    Concurrency and Computation: Practice and Experience

    Volume 25, Issue 3, 10 March 2013, Pages: 367–384, Francisco D. Igual, Gregorio Quintana-Ortí and Robert van de Geijn

    Article first published online : 28 MAR 2012, DOI: 10.1002/cpe.2842

  3. Analysis of dynamically scheduled tile algorithms for dense linear algebra on multicore architectures

    Concurrency and Computation: Practice and Experience

    Volume 24, Issue 3, 10 March 2012, Pages: 305–321, Azzam Haidar, Hatem Ltaief, Asim YarKhan and Jack Dongarra

    Article first published online : 22 AUG 2011, DOI: 10.1002/cpe.1829

  4. A survey of recent developments in parallel implementations of Gaussian elimination

    Concurrency and Computation: Practice and Experience

    Volume 27, Issue 5, 10 April 2015, Pages: 1292–1309, Simplice Donfack, Jack Dongarra, Mathieu Faverge, Mark Gates, Jakub Kurzak, Piotr Luszczek and Ichitaro Yamazaki

    Article first published online : 2 JUN 2014, DOI: 10.1002/cpe.3306

  5. Enhancing performance and energy consumption of runtime schedulers for dense linear algebra

    Concurrency and Computation: Practice and Experience

    Volume 26, Issue 15, October 2014, Pages: 2591–2611, Pedro Alonso, Manuel F. Dolz, Francisco D. Igual, Rafael Mayo and Enrique S. Quintana-Ortí

    Article first published online : 25 JUN 2014, DOI: 10.1002/cpe.3317

  6. Matrix inversion on CPU–GPU platforms with applications in control theory

    Concurrency and Computation: Practice and Experience

    Volume 25, Issue 8, 10 June 2013, Pages: 1170–1182, Peter Benner, Pablo Ezzatti, Enrique S. Quintana-Ortí and Alfredo Remón

    Article first published online : 10 OCT 2012, DOI: 10.1002/cpe.2933

  7. Modeling power and energy consumption of dense matrix factorizations on multicore processors

    Concurrency and Computation: Practice and Experience

    Volume 26, Issue 17, 10 December 2014, Pages: 2743–2757, Pedro Alonso, Manuel F. Dolz, Rafael Mayo and Enrique S. Quintana-Ortí

    Article first published online : 11 OCT 2013, DOI: 10.1002/cpe.3162

  8. Achieving energy efficiency during collective communications

    Concurrency and Computation: Practice and Experience

    Volume 25, Issue 15, October 2013, Pages: 2140–2156, Vaibhav Sundriyal, Masha Sosonkina and Zhao Zhang

    Article first published online : 13 SEP 2012, DOI: 10.1002/cpe.2911

  9. The design and implementation of the parallel out-of-core ScaLAPACK LU, QR, and Cholesky factorization routines

    Concurrency: Practice and Experience

    Volume 12, Issue 15, 25 December 2000, Pages: 1481–1493, Eduardo D'Azevedo and Jack Dongarra

    Article first published online : 2 JAN 2001, DOI: 10.1002/1096-9128(20001225)12:15<1481::AID-CPE540>3.0.CO;2-V

  10. Design and performance characterization of electronic structure calculations on massively parallel supercomputers: a case study of GPAW on the Blue Gene/P architecture

    Concurrency and Computation: Practice and Experience

    Volume 27, Issue 1, January 2015, Pages: 69–93, N.A. Romero, C. Glinsvad, A.H. Larsen, J. Enkovaara, S. Shende, V.A. Morozov and J.J. Mortensen

    Article first published online : 27 DEC 2013, DOI: 10.1002/cpe.3199

  11. The LINPACK Benchmark: past, present and future

    Concurrency and Computation: Practice and Experience

    Volume 15, Issue 9, 10 August 2003, Pages: 803–820, Jack J. Dongarra, Piotr Luszczek and Antoine Petitet

    Article first published online : 14 JUL 2003, DOI: 10.1002/cpe.728

  12. Low-level PGAS computing on many-core processors with TSHMEM

    Concurrency and Computation: Practice and Experience

    Volume 27, Issue 17, 10 December 2015, Pages: 5288–5310, Bryant C. Lam, Alan D. George, Herman Lam and Vikas Aggarwal

    Article first published online : 25 JUN 2015, DOI: 10.1002/cpe.3569

  13. Performance analysis of distributed symmetric sparse matrix vector multiplication algorithm for multi-core architectures

    Concurrency and Computation: Practice and Experience

    Volume 27, Issue 17, 10 December 2015, Pages: 5019–5036, Dossay Oryspayev, Hasan Metin Aktulga, Masha Sosonkina, Pieter Maris and James P. Vary

    Article first published online : 14 JUL 2015, DOI: 10.1002/cpe.3499

  14. Parallelizing dense and banded linear algebra libraries using SMPSs

    Concurrency and Computation: Practice and Experience

    Volume 21, Issue 18, 25 December 2009, Pages: 2438–2456, Rosa M. Badia, José R. Herrero, Jesús Labarta, Josep M. Pérez, Enrique S. Quintana-Ortí and Gregorio Quintana-Ortí

    Article first published online : 22 JUL 2009, DOI: 10.1002/cpe.1463

  15. Scheduling dense linear algebra operations on multicore processors

    Concurrency and Computation: Practice and Experience

    Volume 22, Issue 1, January 2010, Pages: 15–44, Jakub Kurzak, Hatem Ltaief, Jack Dongarra and Rosa M. Badia

    Article first published online : 11 AUG 2009, DOI: 10.1002/cpe.1467

  16. A scalable approach to solving dense linear algebra problems on hybrid CPU-GPU systems

    Concurrency and Computation: Practice and Experience

    Volume 27, Issue 14, 25 September 2015, Pages: 3702–3723, Fengguang Song and Jack Dongarra

    Article first published online : 1 OCT 2014, DOI: 10.1002/cpe.3403

  17. UPCBLAS: a library for parallel matrix computations in Unified Parallel C

    Concurrency and Computation: Practice and Experience

    Volume 24, Issue 14, 25 September 2012, Pages: 1645–1667, Jorge González-Domínguez, María J. Martín, Guillermo L. Taboada, Juan Touriño, Ramón Doallo, Damián A. Mallón and Brian Wibecan

    Article first published online : 17 JAN 2012, DOI: 10.1002/cpe.1914

  18. A distributed packed storage for large dense parallel in-core calculations

    Concurrency and Computation: Practice and Experience

    Volume 19, Issue 4, 25 March 2007, Pages: 483–502, Marc Baboulin, Luc Giraud, Serge Gratton and Julien Langou

    Article first published online : 28 SEP 2006, DOI: 10.1002/cpe.1119

  19. Parallel implementation of BLAS: general techniques for Level 3 BLAS

    Concurrency: Practice and Experience

    Volume 9, Issue 9, September 1997, Pages: 837–857, Almadena Chtchelkanova, John Gunnels, Greg Morrow, James Overfelt and Robert A. van de Geijn

    Article first published online : 4 DEC 1998, DOI: 10.1002/(SICI)1096-9128(199709)9:9<837::AID-CPE267>3.0.CO;2-2

  20. MPI and UPC broadcast, scatter and gather algorithms in Xeon Phi

    Concurrency and Computation: Practice and Experience

    Damián A. Mallón, Guillermo L. Taboada and Lars Koesterke

    Article first published online : 28 MAY 2015, DOI: 10.1002/cpe.3552