I was trying to think of a way to split a sorting task over many processors like for instance on a gpu, my rough idea is this...
First the many items in unsorted order are given out to the many processors an equal number to each processor... Each processor ...p,q,r... finds the lowest and highest item of those given to it L(x) and H(x) and once they've all completed that. every processor exchanges it's lowest item with the processor to the left if it is smaller than that processors largest item, and it's highest item with the processor to the right if it is greater than that processors smallest item (or keeps it if it's the first or last processor)... This process is repeated until no more items can be exchanged. By then every item in a processors list of items should be greater than every item in the processor to it's left and less than every item in the processor to it's rights list of items, then this can either be recursed upon or each processor can sort it's items and append them all to one master list...
A lot of whether it's feasible depends on architectural questions that I'll have to learn about as I learn how to program a gpu but I'm just making a note of this idea...