analytics

Friday, October 16, 2015

Median pivot quicksort

Starting with a list L of N sortable items like so:
[5,3,4,7,2,1,3,4,5,1,3,4]
Below I will refer to the elements as numbers but the same reasoning applies to any type of sortable items...

This algorithm first takes the first two items from that and makes a new list K and sorts them:
[5,3]
sorted:
K=[3,5]

Now we always calculate an M value which is just floor(length(L)/2) in the above it would be 1: Or working with pointers, it might move to the left or the right within the list as per the following...

There are two types of step, one for K an even number of elements and the other for an odd number of elements...
K has an even number of elements (2) so:
We look at the numbers a and b which are L(M-1) and L(M) and insert the next number from list L between them if it's value is between or equal to a and b, or onto the left of K if it is not in between them and less than a, or append to the right of K if it's not between a and b and more than b...

K = [3,4,5]  <- 4 inserted between 3 and 5

Now K has an odd number of elements so:
We note that M = floor(length(L)/2) = 1
We look at numbers a,b,c which are L(M-1), L(M), L(M+!) and insert the next number from L either onto the left of K, or between a and b, or between b and c, or onto the right of K depending on the value...

K = [3,4,5,7] 7 was greater than c so it got appended to the right of K

And then we repeat until we get K as:

[1, 1, 2, 3, 3, 3, 4, 4, 4, 5, 7, 5]

Since there are 12 items M = 6.This list has the property that every number to the left of L(M)  is less than or equal to L(M) and those to the right are greater than or equal L(M)... And L(M) is either the median for odd lengthed lists, or the number to the right of where it could be for even numbers. So then we do this algorithm recursively on the two halves...

**Proof**

This might be close to the proof but I think I can make it more accurate...

The 2 item list K you start with has an even number of elements so there can't be a median number, but with M=1, L(M) and L(M-1) are on both sides of where a median number could be, let's call these "medial" numbers as long as they have the property that every number to the right of L(M-1) is greater than or equal to L(M-1) and every number to the left of L(M) is less than or equal to L(M) . For an odd number of items L(M) should be the median number and of course as the median every number to the left of L(M) is less than or equal to L(M) and every number to the right of L(M) is greater than or equal  to L(M) ...

Now suppose that K has an even number of elements with L(M) and L(M-1) medial numbers .. X and Y are some sets of numbers, or possibly empty sets for the 2 item list, but both the same size..N is the number to the left of L(M) and M is L(M).
X, N,M, Y
And furthermore from the definition of a medial number every element to the left of M is less than or equal to M and every element to the right of N is greater than or equal to N.

There are three places a new number c can be added. If c is not between N and M and less than N than c gets appended to the left of X, and N becomes M2:
c, X, M2, M, Y
M2 has to be the median of all the numbers because X and Y are the same size and c,X and M,Y would be one more element than the size of X and Y each on both sides of M2, and c is less than M2 and M is greater than M2.
Second Case, c is placed between N and M and becomes M2
X, N, M2, M, Y
M2 is the new median of all the numbers because X and Y are the same size so X, N and M, Y is one more element on both sides of M2 than the size of X and Y..
X N, M2, Y, c
 symmetrically to the first case M2 now has an equal amount of elements on the left and right of it.

Now for K an odd number of numbers you have X a set of numbers, Y a set of numbers, both the same size, M the median and A and B the numbers to the left and right of M
X, A, M, B, Y
There are four places we can add c which give:
c, X, M2, M, B, Y with c less than A and  A becoming M2
X, A, M2, M, B, Y with c between A and M and c becoming M2
X, A, M, M2, B, Y with c between M and B and c becoming M2
X, A, M, M2, Y, c, with  c  greater than B with B becoming M2

I think now it should be clear that in all cases M and M2 are medial numbers , for consistency we can rename them so N is the medial number on the left and M is the medial number on the right to get the same case of even lengthed lists we dealt with earlier with new sets to the left and right X2 N M Y2 and X2 and Y2 still have to be the same size, as each is 1 element larger than X and Y were and furthermore, under the renaming, every element to the left of M is less than or equal to M and every element to the right of N is greater than or equal to N ...
So since we can always add an element to an odd lengthed list and get medial numbers and add an element to an even lengthed list and get a median, and we start with the base case of 2 medial numbers, it follows that no matter how many elements we add, we have medial numbers or a median.
So we get two halves after splitting at the median or medial numbers that we recursively deal with separately by the same method until every sublist is 2 or less elements long, and then we concatenate all these sublists to get a totally sorted list!

**run time complexity**


The run time complexity is always N*log(N) because it is like quicksort in that it halves the list every time but it is able to exactly halve the list each time so the worst case complexity is the same as the best case...
 There are N steps to halve all the sub-lists and then there are log(N) times you have to halve the list multiplying to N*log(N)

**Python Source Code**
import math
def mpsort(l):
    m = math.floor(len(l)/2)
    l = mp(l)
    k = []
    k2 = []
    for i in range(0, m):
        k.append(l[i])
    for j in range(m, len(l)):
        k2.append(l[j])
    if len(k) > 0 and len(k2) > 0:
        k = mpsort(k)
        k2 = mpsort(k2)
        l = k
        for e in k2:
            l.append(e)
    return l
def mp(l):
    k = []
    if len(l) == 1:
        return l
    if l[0] > l[1]:
        temp = l[0]
        l[0] = l[1]
        l[1] = temp
    if len(l) == 2:
        return l
    k = [l.pop(0), l.pop(0)]    
    while len(l) > 0:
        m = math.floor(len(k)/2)
        e = l.pop(0)
        if len(k) % 2 == 0:
            if e >= k[m-1] and e <= k[m]:
                k.insert(m, e)
            elif e < k[m]:
                k.insert(0, e)
            elif e > k[m-1]:
                k.append(e)
        else:
            if e >= k[m] and e <= k[m+1]:
                k.insert(m+1, e)
            elif e >= k[m-1] and e <= k[m]:
                k.insert(m, e)
            elif e < k[m]:
                k.insert(0, e)
            elif e > k[m]:
                k.append(e)
    return k
def main():
    l = [5,3,4,7,2,1,3,4,5,1,3,4]
    l = mpsort(l)
    print(l)
main()

No comments:

Post a Comment