Output details
11 - Computer Science and Informatics
University of Bedfordshire
Cache-oblivious matrix algorithms in the age of multicores and many cores
We highlight the issue of upcoming wider single-instruction, multiple-data (SIMD) units as well as steadily increasing core counts on contemporary and future processor architectures. Our matrix multiplication and LU decomposition code TifaMMy has been ported and tuned on four architectures: SGI's UltraViolet distributed shared-memory machine, Intel's Sandy Bridge Xeon architecture, AMD's Bulldozer architecture, and Intel's Xeon Phi architecture. We also comment on the feasibility of graphics processing units. Results are discussed and compared with the vendors' architecture-specific, optimised libraries, namely the Math Kernel Library (MKL) and the AMD Core Math Library (ACML). TifaMMy executes with equally efficient performance on all four architectures, underlining its generic and cache-oblivious properties.
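A cache-oblivious algorithm gets good locality at every level of the memory hierarchy without being told any cache sizes, typically by recursively subdividing the problem until the pieces fit in whatever cache exists. The minimal sketch below illustrates that general idea for matrix multiplication by always halving the largest of the three dimensions. It is not TifaMMy's implementation, whose element ordering and kernels are not described here. The cutoff `BASE` and the function names `rec_matmul` and `matmul` are hypothetical. Pure Python is far too slow for real use, so the point is only the recursive access pattern.

```python
BASE = 16  # hypothetical cutoff below which a plain triple loop is used


def rec_matmul(A, B, C, i0, j0, k0, n, m, p):
    """Accumulate C[i0:i0+n][j0:j0+p] += A[i0:i0+n][k0:k0+m] * B[k0:k0+m][j0:j0+p]
    by recursively halving the largest of the three block dimensions."""
    if n <= BASE and m <= BASE and p <= BASE:
        # Small block: the working set now fits in any reasonable cache.
        for i in range(i0, i0 + n):
            Ai, Ci = A[i], C[i]
            for k in range(k0, k0 + m):
                a, Bk = Ai[k], B[k]
                for j in range(j0, j0 + p):
                    Ci[j] += a * Bk[j]
        return
    if n >= m and n >= p:
        # Split the rows of A and C.
        h = n // 2
        rec_matmul(A, B, C, i0, j0, k0, h, m, p)
        rec_matmul(A, B, C, i0 + h, j0, k0, n - h, m, p)
    elif p >= m:
        # Split the columns of B and C.
        h = p // 2
        rec_matmul(A, B, C, i0, j0, k0, n, m, h)
        rec_matmul(A, B, C, i0, j0 + h, k0, n, m, p - h)
    else:
        # Split the shared inner dimension; both halves add into the same C block.
        h = m // 2
        rec_matmul(A, B, C, i0, j0, k0, n, h, p)
        rec_matmul(A, B, C, i0, j0, k0 + h, n, m - h, p)


def matmul(A, B):
    """Return A * B for lists of lists, using the recursive scheme above."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    rec_matmul(A, B, C, 0, 0, 0, n, m, p)
    return C
```

For example, `matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`. Splitting the largest dimension keeps the blocks roughly square, which is what makes the cache behaviour independent of the concrete cache size.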