Open Access Open Access  Restricted Access Subscription or Fee Access

Dynamic Data Flow Based Spatial Sorting Method for GPUs: Software Based Autonomous Parallelization

Ankush Rai


The direct communication between distant parts of a computer system for a mesh or grid based scalable parallel system isn't cost effective. Thus, we have presented and evaluated the performances of spatial sorting method for multi-core GPUs (Graphic Processing Unit) computing. It is a parallel sorting algorithm for autonomous threading of workload which is considerably faster than Thrust Merge sorting, which is known to be the best comparison based sorting algorithm for GPUs. The study shows that the proposed algorithm is more efficient than randomized sample sort, when the memory operations for parallelization of stack units are taken into consideration. Additionally, the proposed algorithm provides an advantage over traditional sorting methods in way that it facilitate dynamic data flow and therefore, its operating time is devoid of the input data dependent fluctuations. Our technique proves its potential extensions of adaptability to more complex parallel algorithms.

Keywords: Autonomous parallelization, threading, sorting, GPUs, parallel algorithm, dynamic data flow

Full Text:



Bilardi G., Nicolau A. Adaptive Bitonic Sorting. An Optimal Parallel Algorithm for Shared-memory Machines. SIAM J Comput. 1989; 18(2): 216–228p.

Cederman D., Tsigas P. A Practical Quick Sort Algorithm for Graphics Processors. In Proc. European Symposium on Algorithms (ESA) 2008, 5193 of LNCS, 246–258p.

Garland M. Private Communication. nVIDIA Corporation, 2009.

Govindaraju N., Gray J., Kumar R., et al. GPU Tera Sort: High Performance Graphics Co-processor Sorting for Large Database Management. In Proc. International Conference on Management of Data (SIGMOD), Chicago, Illinois, USA, 2006, 325–336p.

GPGPU.ORG. General-purpose Computation on Graphics Hardware. Available at:

Greb A., Zachmann G. GPU-ABiSort: Optimal Parallel Sorting on Stream Architectures. In Proc. Int’l Parallel and Distributed Processing Symposium (IPDPS), 2006.

Leischner N., Osipov V., Sanders P. GPU Sample Sort. In Proc. Int’l Parallel and Distributed Processing Symposium (IPDPS), to appear, 2010. (Currently Available at:

Satish N., Harris M., Garland M. Designing Efficient Sorting Algorithms for Many Core GPUs. In Proc. Int’l Parallel and Distributed Processing Symposium (IPDPS), 2009.

Shi H., Schaeffer J. Parallel Sorting by Regular Sampling. J. Par. Dist. Comp. 1992; 14: 362–372p.

Sintorn E., Assarsson U. Fast Parallel GPU-sorting using a Hybrid Algorithm. J. Parallel Distri. Comp. 2008; 68(10): 1381–1388p.

Lindholm E., Nickolls J., Oberman S., et al. Nvidia Tesla: A Unified Graphics and Computing Architecture. IEEE Micro, 2008; 28(2): 39–55p.

nVIDIA Corporation. Gtx 260 GPU Specification. Available at: geforce gtx 260 us.html.

nVIDIA Corporation. Gtx 285 GPU Specification. Available at: geforce gtx 285m us.html.

nVIDIA Corporation. Tesla c1060 GPU Specification. Available at: tesla c1060 us.html.


  • There are currently no refbacks.

This site has been shifted to