
Empirical Analysis of Cache-Efficient In-place Matrix Transposition on Multicore Processors

Riaz Ahmed, Lalitsen Sharma

Abstract


This paper experimentally evaluates the performance and scalability of cache-efficient in-place matrix transposition algorithms. Two such algorithms, cache-aware and cache-oblivious, are analyzed on multicore machines. We propose low-level optimizations for both algorithms, based on an efficient iterative kernel, and identify the best configuration for each: the best tile size for the cache-aware algorithm and the best recursion threshold for the cache-oblivious algorithm are determined experimentally. Performance, measured as transposition rate, is evaluated on quad-core and dual-core machines, and performance bottlenecks are identified by measuring the algorithms' cache misses. The results are also compared with the naive method of matrix transposition. The experiments show that the cache-efficient algorithms outperformed the naive algorithm for all matrix sizes and thread counts tested. The cache-oblivious algorithm achieved the best mean performance among all implementations. Overall, the cache-aware and cache-oblivious algorithms are efficient on multicore processors: their performance and scalability were the best in our experiments.
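To make the three approaches named in the abstract concrete, the following is a minimal single-threaded sketch of naive, tiled (cache-aware), and recursive (cache-oblivious) in-place transposition of a square matrix stored row-major in a flat list. It is not the authors' implementation: the function names, the default `tile` and `threshold` values, and the use of Python are assumptions made for illustration, and the sketch shows only the traversal order, not the multicore parallelization or the measured performance.

```python
def transpose_naive(a, n):
    """Naive in-place transpose: swap each element above the diagonal with its mirror."""
    for i in range(n):
        for j in range(i + 1, n):
            a[i * n + j], a[j * n + i] = a[j * n + i], a[i * n + j]


def transpose_tiled(a, n, tile=32):
    """Cache-aware sketch: visit the upper triangle in tile x tile blocks.

    `tile` is a hypothetical default; the abstract says the best value is found experimentally.
    """
    for bi in range(0, n, tile):
        for bj in range(bi, n, tile):
            for i in range(bi, min(bi + tile, n)):
                # In a diagonal block, start just past the diagonal so no pair is swapped twice.
                j_start = i + 1 if bi == bj else bj
                for j in range(j_start, min(bj + tile, n)):
                    a[i * n + j], a[j * n + i] = a[j * n + i], a[i * n + j]


def transpose_recursive(a, n, threshold=16):
    """Cache-oblivious sketch: divide and conquer, with an iterative kernel below `threshold`.

    `threshold` (assumed >= 1) is a hypothetical default for the recursion cutoff.
    """

    def swap_block(r0, r1, c0, c1):
        # Swap the block rows [r0, r1) x cols [c0, c1), which lies strictly above
        # the diagonal, with its mirror image below the diagonal.
        rows, cols = r1 - r0, c1 - c0
        if rows <= threshold and cols <= threshold:
            for i in range(r0, r1):
                for j in range(c0, c1):
                    a[i * n + j], a[j * n + i] = a[j * n + i], a[i * n + j]
        elif rows >= cols:
            rm = (r0 + r1) // 2
            swap_block(r0, rm, c0, c1)
            swap_block(rm, r1, c0, c1)
        else:
            cm = (c0 + c1) // 2
            swap_block(r0, r1, c0, cm)
            swap_block(r0, r1, cm, c1)

    def transpose_diag(lo, hi):
        # Transpose the square sub-matrix [lo, hi) x [lo, hi) that sits on the diagonal.
        if hi - lo <= threshold:
            for i in range(lo, hi):
                for j in range(i + 1, hi):
                    a[i * n + j], a[j * n + i] = a[j * n + i], a[i * n + j]
        else:
            mid = (lo + hi) // 2
            transpose_diag(lo, mid)
            transpose_diag(mid, hi)
            swap_block(lo, mid, mid, hi)

    transpose_diag(0, n)
```

All three functions perform the same set of swaps and differ only in the order in which elements are touched, which is what determines cache behavior in a compiled implementation. In an interpreted language the ordering has little measurable effect, so the sketch should be read as a description of the access pattern rather than as a benchmark.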

Keywords: Cache-aware, cache-oblivious, matrix transposition, multicore, scalability

Cite this Article

Riaz Ahmed, Lalitsen Sharma. Empirical Analysis of Cache-Efficient In-place Matrix Transposition on Multicore Processors. Recent Trends in Parallel Computing. 2019; 6(2): 1–15p.




