Cheraq.com

I innovate, then I am!

Divide & Conquer - Image Credit: iStockPhoto

Divide and Conquer is the new black!

| 0 comments

Definition

Divide & Conquer (D&C) is an important algorithm design paradigm based on multi-branched recursion [Wikipedia]. In simple words, D&C solves the problem by dividing it into smaller sub-problems. Usually this sub-problems are much more smaller than the original problems (like dividing it into half). If the sub-problem still is complicated, D&C divides that sub-problem into smaller sub-problems and continues this until the sub-problem is simple enough to be solved directly. At this stage, you can solve problem easily and usually in constant time (i.e. O(c)). Then the result of this sub-problem will be sent upwards and merged with other sub-problems result and forms the final result.

Below image shows how a D&C based algorithm works. At this example we want to sort a list of numbers. What we will do is, start to split the list into smaller sub-lists  until we have only one item in the list. Well a list with one item in it is already sorted!! So there is nothing special till here, but the key is how you will merge the result. That’s why this algorithm called Merge Sort. Check it out below. We will discuss it later.

Merge Sort - A D&C based algorithm sample

Figure 1: Merge Sort - A D&C based algorithm example

Key Steps

When you want to solve a problem using D&C, the first step is, finding a good diving procedure. You have to decide how you are going to divide your problem into sub-problems. Diving should be done in a way that it will simplify the problem and reduce the problem size to half or close to half.

Also you should decide when you will stop division and start to solve, which is desired problem size. Choosing a good stop point is crucial, because D&C usually leads to recursive procedures, and so many recursive calls can increase time/space complexity exponentially. It is possible to remove the recursion by using stack, but then again you will face a new problem which is determining proper stack size.

Another option you can consider when you are solving problems is: Parallel Programming. Since D&C divides problem into distinct sub-problems, those problems could be easily solved in parallel, either using multi-processor machines or cloud computing. For example, if you want to search for specific term in a database with size of Google database, it might take ages to find items and sort them, but using cloud (like Google App Engine environment), you can get result in less than one second. Next you will see some powerful algorithms based on D&C.

Binary Search

Binary Search is a fast searching algorithm that can find index of a value in a sorted array. For simplicity, lets say you have a sorted array that contains values in range [0..10] (see Figure 2). Now lets say we are searching for a number, (e.g. 2) in this list. What we can do is we can start from first number and check each and every number until we either find 2 or reach to the end of the list.  But like this if we want to search in a big list, it will take too much time. What we can do is, we can check the middle item (in this example, it is 5) with the value we are looking for (which is 2). if the value is smaller than middle value (2<5), then we can consider the lower sub-array to middle item for our next step of the search and we can ignore the other part. As you can see, because our array is sorted, we can do this kind of prediction. At this step we have reduced our problem size to half and now we need to search to half of the array for the desired number. If we continue this process, at last step we will have this two conditions:

  • Array size reduced to 1, and we can solve problem directly by checking the remaining number with sought value
  • Middle item is equals with number we were looking for (which is the answer)
Figure 2 shows some examples of binary search in a sorted array of integer numbers.

Figure 2: Binary Search in a sorted array of numbers

Below you can find python code for binary search. Most of the programming languages have built-in implementation for binary search and you usually don’t need to implement it again.

def binary_search(seq, t):
    min = 0; max = len(seq) - 1
    while 1:
        if max &lt; min:
            return -1
        m = (min + max) / 2
        if seq[m] &lt; t:
            min = m + 1
        elif seq[m] &gt; t:
            max = m - 1
        else:
            return m

As you can see in the code above, at each step, the list will be divided to half and then one half will be used for the next step, so your algorithm at worse case will be O(log2 n) or for simplicity O(log n). This means if you have a list which contains 1 billion items (1,000,000,000), using binary search at worse case you need to examine 30 items to find the result, but if you do normal search, you might need 1 billion examinations and at end you will find out it is not in the list.

There are some argue about Binary Search and some people are saying Binary Search is not a complete D&C based algorithm because it is dividing the problem into two sub-problems, but it is only considering one of them for next step; and based on this, they are offering different name for it such as decrease and conquer, etc. No matter what we call it, Binary Search is a ROCK STAR.  If you check Wikipedia for binary search, you will find out that implementing Binary Search seems easy initially, but is tricky. I hope the code I’ve published above is correct ;)

Although the basic idea of binary search is comparatively straightforward, the details can be surprisingly tricky… — Professor Donald Knuth

Merge Sort

Merge sort is another D&C based algorithm that we will have a look at it in this post. We have already discussed about this algorithm at the beginning of this post (see Figure 1), but below you can find the general idea behind it.

Let’s say we have a list (L) with size n and we want to sort it. Like any other D&C algorithm, first step is to divide the problem into smaller sub-problems. In Merge Sort, at each step, we should divide the list into to sub-lists with size almost equal to n/2. We will do this until we reach to a list with size equal to 1. At this point, we will stop dividing and start solving the problem. Before we start next step, let’s see what we have right now, we have n lists with size 1. Now we start to conquer the sub-problems and form final result. At each step we will merge two sorted lists, this can be done in O(n). At the worst case, the time complexity of merge sort will be O(n log2 n).

Another issue that should be considered about Merge Sort is it’s space complexity. An iterative but non-recursive version of merge sort requires 2n space, which means if you have a list with size 1GB, then you need 2GB space in order to perform merge sort. This is because of the fact that merging two lists with size n/2 requires n/2 + n/2 + n = 2n space (two lists with size n/2 and one list with size n to save merged result). There are also some implementations which requires only n/2 extra space to perform the search, but for huge arrays where space is a key factor, other algorithms such as heap sort might be better option.

Conclusion

D&C can be used to solve complicated problems in acceptable time/space complexity. This technique is the basis of efficient algorithms for all kinds of problems, such as searching, sorting,  multiplying large number, syntactic analysis and so on. Main idea behind D&C is to reduce the problem into smaller sub-problems that could be solved independently and their result could form the main problem’s result. Since D&C creates independent sub-problems, it can benefit from Parallel Programming and Cloud Computing more than other methodologies and that’s why D&C is the new Black!

Leave a Reply

Required fields are marked *.

*