
Merge Sort

We talked about how to write merge sort last time.
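Last time's code is not reproduced in these notes. As a stand-in, here is a minimal Python sketch of top-down merge sort. It assumes the structure described below: split the list in half, sort each half recursively, then merge the two sorted halves. The names `merge_sort` and `merge` are our own choices and are not necessarily the ones used in lecture.

```python
def merge(left: list, right: list) -> list:
    """Merge two sorted lists into one new sorted list."""
    result = []
    i = j = 0
    # Repeatedly move the smaller front element into the result.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # One list is exhausted; the rest of the other is already in order.
    result.extend(left[i:])
    result.extend(right[j:])
    return result


def merge_sort(items: list) -> list:
    """Return a new sorted copy of items."""
    # Lists of length 0 or 1 are already sorted.
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    return merge(merge_sort(items[:mid]), merge_sort(items[mid:]))


print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```

Using `<=` in `merge` takes from the left half when two elements are equal, so equal elements keep their original order (the sort is stable).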

Again, we would like to count the number of comparisons needed to sort an array of n elements. Notice that all the comparisons happen in the merge method. When we merge two sorted lists, every comparison between an element of one list and an element of the other puts one element into its correct position in the result. Once one of the lists runs out of elements, we copy the remaining elements of the other list into the last slots of the result without any further comparisons. Since at least the final element is always placed without a comparison, merging two lists with a total of n elements requires at most n-1 comparisons.
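To make the n-1 bound concrete, here is a small Python sketch of a merge that also counts its comparisons. The function name `merge_counting` is hypothetical, chosen for illustration. Each pass through the loop makes one comparison and places one element; leftover elements are copied with no comparisons at all.

```python
def merge_counting(left: list, right: list) -> tuple:
    """Merge two sorted lists, returning (merged list, number of comparisons)."""
    result = []
    comparisons = 0
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1  # one comparison places exactly one element
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # Leftover elements are copied without any further comparisons.
    result.extend(left[i:])
    result.extend(right[j:])
    return result, comparisons


# Interleaved lists force the maximum: 6 elements, 5 comparisons.
print(merge_counting([1, 3, 5], [2, 4, 6]))  # ([1, 2, 3, 4, 5, 6], 5)
# One list runs out early: only 3 comparisons.
print(merge_counting([1, 2, 3], [4, 5, 6]))  # ([1, 2, 3, 4, 5, 6], 3)
```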

Suppose we start with a list of n elements. Let T(n) be the number of comparisons needed to mergesort an array with n elements. As noted above, we break the list in half, mergesort each half, and then merge the two pieces. Thus the total number of comparisons is the number needed to mergesort each half plus the number needed to merge the two halves. By the remarks above, the final merge takes no more than n-1 comparisons, so $T(n) \le T(n/2) + T(n/2) + (n-1)$. For simplicity, we replace the n-1 comparisons for the merge by the slightly larger n, which makes the result easier to approximate. This gives $T(n) = 2\,T(n/2) + n$, and if we find a function that satisfies this equation, then we have an upper bound on the number of comparisons made during a mergesort.
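One way to see where this recurrence leads is to unroll it when n is a power of two, say $n = 2^k$. Each step halves the subproblem size and adds another n to the total, and after k steps the subproblems have size 1. Using the base case $T(1) = 0$ (a one-element array needs no comparisons), this sketch of the derivation gives:

```latex
\begin{aligned}
T(n) &= 2\,T(n/2) + n \\
     &= 2\bigl(2\,T(n/4) + n/2\bigr) + n = 4\,T(n/4) + 2n \\
     &= 8\,T(n/8) + 3n \\
     &\;\;\vdots \\
     &= 2^k\,T(n/2^k) + k\,n = n\,T(1) + k\,n = k\,n = n \log_2 n.
\end{aligned}
```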

Looking at our algorithm, no comparisons are necessary when the array has 0 or 1 elements, so $T(0) = T(1) = 0$. Let us see if we can solve the recurrence for small values of n. Because we repeatedly divide the number of elements in half, it is most convenient to use values of n that are powers of two. Here is a table of values:

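| n | T(n) |
|---|---|
| $1 = 2^0$ | $0$ |
| $2 = 2^1$ | $2 \cdot T(1) + 2 = 2 = 2 \cdot 1$ |
| $4 = 2^2$ | $2 \cdot T(2) + 4 = 8 = 4 \cdot 2$ |
| $8 = 2^3$ | $2 \cdot T(4) + 8 = 24 = 8 \cdot 3$ |
| $16 = 2^4$ | $2 \cdot T(8) + 16 = 64 = 16 \cdot 4$ |
| $32 = 2^5$ | $2 \cdot T(16) + 32 = 160 = 32 \cdot 5$ |
| ... | ... |
| $n = 2^k$ | $2 \cdot T(n/2) + n = n \cdot k$ |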
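The table can be reproduced by evaluating the recurrence directly. This Python sketch (the function is named `T` only to mirror the notation) handles powers of two only, as the table does, and prints each n next to T(n) and n·k.

```python
def T(n: int) -> int:
    """Evaluate T(n) = 2*T(n/2) + n with T(1) = 0, for n a power of two."""
    if n == 1:
        return 0
    return 2 * T(n // 2) + n


for k in range(6):
    n = 2 ** k
    print(n, T(n), n * k)  # the last two columns agree
```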

Notice that if $n = 2^k$, then $k = \log_2 n$. Thus $T(n) = n \log_2 n$. In fact, $n \log_2 n$ is an upper bound on the number of comparisons mergesort makes even when n is not a power of two. If we graph this function, we see that it grows much, much more slowly than a quadratic (for example, the one giving the number of comparisons for selection sort).
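The bound can also be checked empirically. The following Python sketch is a self-contained merge sort that counts every comparison, run on random inputs of sizes that are not powers of two. It prints the actual count beside $n \log_2 n$ and beside selection sort's $(n^2 - n)/2$. The function name `merge_sort_count` and the chosen sizes are illustrative assumptions.

```python
import math
import random


def merge_sort_count(items: list) -> tuple:
    """Return (sorted copy of items, number of comparisons made)."""
    if len(items) <= 1:
        return list(items), 0
    mid = len(items) // 2
    left, c_left = merge_sort_count(items[:mid])
    right, c_right = merge_sort_count(items[mid:])
    result, comparisons = [], c_left + c_right
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1  # count each comparison made while merging
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])
    result.extend(right[j:])
    return result, comparisons


# Check the n*log2(n) bound on random inputs whose sizes are not powers
# of two, and compare with selection sort's (n^2 - n)/2 comparisons.
random.seed(0)
for n in [3, 10, 100, 1000]:
    data = [random.random() for _ in range(n)]
    _, count = merge_sort_count(data)
    assert count <= n * math.log2(n)
    print(n, count, round(n * math.log2(n)), (n * n - n) // 2)
```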

This explains why, when we run the algorithms, the time for mergesort is almost insignificant compared to that for selection sort. The table below gives the approximate number of comparisons for these two sorting algorithms, along with those for the searching algorithms we looked at earlier this week.

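| Algorithm | Comparisons | n = 10 | n = 100 | n = 1000 | n = 1,000,000 |
|---|---|---|---|---|---|
| Linear search | $n/2$ | 5 | 50 | 500 | 500,000 |
| Binary search | $\log_2 n$ | 4 | 7 | 10 | 20 |
| Selection sort | $(n^2 - n)/2$ | 45 | 4,950 | 499,500 | 499,999,500,000 |
| Merge sort | $n \log_2 n$ | 40 | 700 | 10,000 | 20,000,000 |

(In this table, $\log_2 n$ is rounded up to the next whole number.)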
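The entries in this table follow directly from the formulas. The Python sketch below recomputes them; rounding $\log_2 n$ up with `math.ceil` is our assumption, inferred from the listed values (for example, $\log_2 10 \approx 3.32$ appears as 4). The function name `comparison_counts` is hypothetical.

```python
import math


def comparison_counts(n: int) -> dict:
    """Approximate comparison counts from the table, with log2 n rounded up."""
    log_n = math.ceil(math.log2(n))
    return {
        "linear search": n // 2,
        "binary search": log_n,
        "selection sort": (n * n - n) // 2,
        "merge sort": n * log_n,
    }


for n in [10, 100, 1000, 1_000_000]:
    print(n, comparison_counts(n))
```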

Also, notice that while binary search is much more efficient than linear search, especially for large arrays, it only works on a sorted array, and sorting is much more expensive than a single linear search. Therefore, it only makes sense to sort an array if we will actually be doing enough searches to pay back the cost of sorting.
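As a rough illustration of "enough searches", we can use the approximate counts from the table (sorting with merge sort costs $n \lceil \log_2 n \rceil$, each binary search costs $\lceil \log_2 n \rceil$, each linear search costs about $n/2$) to find the smallest number of searches s for which sorting first wins. This Python sketch is only an estimate under those assumptions: the function name `break_even_searches` is hypothetical, and it uses `n // 2`, which matches $n/2$ for the even sizes in the table.

```python
import math


def break_even_searches(n: int) -> int:
    """Smallest s with sort + s binary searches < s linear searches.

    Uses rough comparison counts: sort = n*ceil(log2 n),
    binary search = ceil(log2 n), linear search = n // 2.
    Assumes n // 2 > ceil(log2 n), true for the sizes in the table.
    """
    log_n = math.ceil(math.log2(n))
    sort_cost = n * log_n
    savings_per_search = n // 2 - log_n  # comparisons saved per binary search
    # Need s * savings_per_search > sort_cost; take the smallest such s.
    return sort_cost // savings_per_search + 1


for n in [10, 100, 1000, 1_000_000]:
    print(n, break_even_searches(n))  # 41, 17, 21, 41
```

Under these rough counts, a few dozen searches are enough to pay back the cost of sorting, even for a million elements.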

Demo: Searching and Sorting Demo.

