You are reading the third segment of our algorithm design patterns series. So far we have covered the backtracking and divide-and-conquer techniques. In this article we cover the so-called greedy strategy. Greedy algorithms build a solution step by step, always making the locally best choice; they often, but not always, yield the optimal solution, and they tend to run very fast. Thus, it is well worth mastering greediness.

Greedy Strategy as an Algorithm Technique - Kruskal's Algorithm

In this section we are going to present a problem that greedy algorithms solve exactly: here the greedy approach always yields a globally optimal solution. First, let’s phrase the problem. We have n cities and we want to design a telephone infrastructure (network). We are given an array d[] that tells us between which pairs of cities a direct telephone line can be built, and at what cost. This array is our candidate set.

The problem involves searching for the cheapest way to connect all of the cities (not necessarily via direct phone lines). So, in short, if we think of the cities as nodes (vertices) of a weighted graph whose edges are the possible direct lines, the problem can be rephrased in the following way: find the minimum spanning tree of a weighted graph. The algorithm that solves this was published and proven by Joseph Kruskal (1956).

Kruskal’s algorithm is a typical greedy algorithm. Here's why: boiled down, it is all about finding a subset of the edges that forms a tree containing all of the nodes (vertices) while the total weight of those edges is minimized. If the graph is not connected, then Kruskal’s algorithm yields a minimum spanning forest instead.

Let’s explain the theory behind Kruskal’s work. Create a forest in which each node starts out as its own one-node tree. Create a set S containing all of the graph’s edges. Repeatedly remove from S an edge of minimum weight; if it connects two different trees, add it to the forest and combine those two trees into one; otherwise, discard the edge. In the end, the forest will have only one component—the minimum spanning tree of the source graph on which the algorithm was executed.

In short, this algorithm is greedy because it always tries the lowest-cost of the remaining edges. Remarkably, this local rule leads to a global optimum, and that can be proven mathematically. Thus, it has become a standard algorithm for minimum-cost spanning trees, along with others that are well known, such as Prim’s and Borůvka’s algorithms.
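Before diving into the article's own code, here is a compact sketch of the same idea in a more common modern formulation: sort the edges once by weight, then scan them in order, keeping a component label per node. This is my own illustrative sketch, not the article's code; the names (Edge, mst_cost) are invented for the example.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

struct Edge { int u, v, w; };  // endpoints and weight of one edge

// Returns the total MST cost for nodes 1..n, or -1 if the graph is
// not connected (in which case Kruskal yields only a spanning forest).
int mst_cost(int n, std::vector<Edge> edges) {
    std::sort(edges.begin(), edges.end(),
              [](const Edge &a, const Edge &b) { return a.w < b.w; });
    std::vector<int> comp(n + 1);
    std::iota(comp.begin(), comp.end(), 0);   // each node starts alone
    int cost = 0, taken = 0;
    for (const Edge &e : edges) {
        if (comp[e.u] == comp[e.v]) continue; // would close a cycle: skip
        int old = comp[e.u];                  // relabel one whole component
        for (int i = 1; i <= n; i++)
            if (comp[i] == old) comp[i] = comp[e.v];
        cost += e.w;
        if (++taken == n - 1) break;          // spanning tree is complete
    }
    return taken == n - 1 ? cost : -1;
}
```

Sorting up front replaces the repeated "find the cheapest remaining edge" scan that the article's version performs on every iteration.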

Now that we know what Kruskal’s algorithm is all about, let's write its code in C/C++. Check out the code snippets below. We are going to use one function and two procedures. The function is “select_best()”: it scans the candidate set and returns the locally best pick.

The two procedures are merge(), which folds an accepted candidate into the solution, and, of course, the main kruskal() routine itself.
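The snippets below use a `cities` record type that is not shown in this excerpt. A plausible definition, reconstructed from how the code accesses d[i].x, d[i].y, and d[i].c (the downloadable source may differ), is:

```cpp
// Assumed layout of one candidate edge; the field names match the
// snippets' usage, but this definition is a reconstruction.
struct cities {
    int x; // first city of the proposed direct line
    int y; // second city
    int c; // cost of building that line
};
```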

int select_best (cities *d, int *status, int m)
{
    // requires <climits> (or <limits.h>) for INT_MAX
    int cost = INT_MAX, candidate = 0; // 0 means no free candidate was found
    for (int i = 1; i <= m; i++)
    {
        if (status[i] == 1 && d[i].c < cost)
        { // cheapest edge that is still in the candidate set
            cost = d[i].c;
            candidate = i;
        }
    }
    return candidate;
}

The above code goes through the d[] array that contains the candidates. We are looking for the locally best pick, meaning the candidate whose cost is minimal at this moment; we also check that the candidate is still “free to use” (status[i] == 1).
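To see the selection step in isolation, here is a tiny hand-made candidate set run through select_best(). The struct definition and the sample costs are my own illustration; the indexing is 1-based, as in the article.

```cpp
#include <climits>

struct cities { int x, y, c; };  // assumed edge record: endpoints and cost

// Same routine as in the article: return the index of the cheapest
// candidate that is still free (status == 1), or 0 if none is left.
int select_best(cities *d, int *status, int m) {
    int cost = INT_MAX, candidate = 0;
    for (int i = 1; i <= m; i++)
        if (status[i] == 1 && d[i].c < cost) {
            cost = d[i].c;
            candidate = i;
        }
    return candidate;
}
```

With three free candidates of cost 7, 2, and 5, the routine returns the index of the cost-2 edge; once that edge's status is cleared, the next call returns the cost-5 one.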

void merge (int *routes, int n, int a, int b)
{
    int old = routes[a]; // snapshot: routes[a] itself gets relabeled below
    for (int i = 1; i <= n; i++)
        if (routes[i] == old)
            routes[i] = routes[b];
}

The above block of code is the merge procedure. As soon as we find a feasible route, we need to add it to the solution; that means relabeling the components in the routes[] array so that both cities end up in the same connected group. (Note that the old label of routes[a] must be saved before the loop, because the loop itself overwrites routes[a].)
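This relabeling merge costs O(n) per accepted edge. Production implementations of Kruskal's algorithm usually replace it with a disjoint-set (union-find) structure, where each union and membership test runs in near-constant amortized time. A minimal sketch, my addition rather than part of the article:

```cpp
#include <vector>

// Minimal disjoint-set (union-find) with path compression. find() runs
// in near-constant amortized time, versus the O(n) relabeling loop of
// the article's merge() procedure.
struct DisjointSet {
    std::vector<int> parent;
    explicit DisjointSet(int n) : parent(n + 1) {
        for (int i = 0; i <= n; i++) parent[i] = i; // each node is its own root
    }
    int find(int v) {
        if (parent[v] != v) parent[v] = find(parent[v]); // compress the path
        return parent[v];
    }
    bool unite(int a, int b) {        // returns false if already joined
        a = find(a); b = find(b);
        if (a == b) return false;
        parent[a] = b;
        return true;
    }
};
```

In Kruskal's main loop, `unite(x, y)` returning false plays the role of the feasibility test: it signals that the edge would close a cycle.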

Here’s the entire Kruskal’s Algorithm. It is thoroughly commented. Check it out!

void kruskal (cities *d, int *status, int *routes, int m, int n)
{
    for (int i = 1; i <= n; i++)
        routes[i] = i;
    // routes[] tracks the phone connections between the cities;
    // initially each city is connected only with itself
    for (int i = 1; i <= m; i++)
        status[i] = 1;
    // status of a candidate: 1 = still in the candidate set,
    // 2 = accepted into the solution, 0 = examined and discarded
    int sum = 0, checked = 0, done = 0;
    // sum accumulates the cost; checked counts the examined candidates;
    // done counts the telephone routes already built
    while (done < n-1 && checked < m)
    { // until all routes are made or no candidates are left
        int i = select_best (d, status, m);
        checked++;
        status[i] = 0;
        // here comes the feasibility test (one condition)
        if (routes[d[i].x] != routes[d[i].y])
        { // if it's feasible, then let's add it to the solution
            merge (routes, n, d[i].x, d[i].y);
            sum += d[i].c;
            done++;
            status[i] = 2;
        }
    }
    if (done == n-1)
    { // solution condition: a spanning tree always has exactly n-1 edges
        printf ("The cost of the minimum spanning tree: %d.\n", sum);
        for (int i = 1; i <= m; i++)
            if (status[i] == 2)
                printf ("(%d, %d) ", d[i].x, d[i].y);
        printf ("\n");
    }
    else
        printf ("No solution.\n");
}

Don’t forget that if you are struggling to get the above algorithm running, you can download the entire source code of this problem by clicking on the button below. I’ve coded it in Visual Studio 2005, but it should work with any C++ compiler. The code is not strictly ANSI C, but it doesn’t rely on classes and isn’t object-oriented either; it was written specifically to “make sense” and to be straightforward.

The time has come for us to compile and execute this code. Let’s get creative and generate a problem. Suppose we have six cities (n = 6). Consider the following diagram: in this example there are eight possible direct lines between the six cities. Our target is to find the minimum spanning tree (to have a connection—not necessarily direct—between each pair of cities).
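Since the diagram does not reproduce here, the sketch below wires an invented six-city, eight-line instance (my own numbers, not necessarily the article's) into the routines developed above, gathered into one self-contained program:

```cpp
#include <climits>
#include <cstdio>

struct cities { int x, y, c; };  // assumed edge record: endpoints, cost

static int select_best(cities *d, int *status, int m) {
    int cost = INT_MAX, candidate = 0;
    for (int i = 1; i <= m; i++)
        if (status[i] == 1 && d[i].c < cost) { cost = d[i].c; candidate = i; }
    return candidate;
}

static void merge(int *routes, int n, int a, int b) {
    int old = routes[a];             // snapshot before relabeling
    for (int i = 1; i <= n; i++)
        if (routes[i] == old) routes[i] = routes[b];
}

// The article's kruskal(), refactored to return the MST cost
// (-1 when the cities cannot all be connected).
static int kruskal_cost(cities *d, int *status, int *routes, int m, int n) {
    for (int i = 1; i <= n; i++) routes[i] = i;
    for (int i = 1; i <= m; i++) status[i] = 1;
    int sum = 0, checked = 0, done = 0;
    while (done < n - 1 && checked < m) {
        int i = select_best(d, status, m);
        checked++;
        status[i] = 0;
        if (routes[d[i].x] != routes[d[i].y]) { // feasible: no cycle
            merge(routes, n, d[i].x, d[i].y);
            sum += d[i].c;
            done++;
            status[i] = 2;
        }
    }
    return done == n - 1 ? sum : -1;
}

int demo(void) {
    // Invented instance: 6 cities, 8 candidate lines (1-based indexing).
    cities d[9] = {{0,0,0}, {1,2,3}, {1,3,1}, {2,3,3}, {2,4,6},
                   {3,5,4}, {4,5,2}, {4,6,5}, {5,6,7}};
    int status[9], routes[7];
    int cost = kruskal_cost(d, status, routes, 8, 6);
    printf("The cost of the minimum spanning tree: %d.\n", cost);
    for (int i = 1; i <= 8; i++)
        if (status[i] == 2) printf("(%d, %d) ", d[i].x, d[i].y);
    printf("\n");
    return cost;
}
```

On this instance the algorithm accepts five edges for a total cost of 15; swapping any accepted edge for a discarded one can only raise the total, which you can verify by hand.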

Let’s execute our nifty application. It runs blazingly fast, and the output is shown below.

Here is the visual diagram of the solution. If you have doubts, grab a piece of paper and a pen and try some of the alternative spanning trees -- but let me assure you that none of them is cheaper.

Have fun playing around with the code. There are numerous implementations of Kruskal’s algorithm; some approach it differently, others the same way. There is no general rule about how it must be implemented or how the code must be written. As long as it works and meets your requirements, it does the job.