Back to Basics: Sparse Coding?

ameya005:

A good introduction to Sparse Coding. Hope to do some stuff regarding this in the future.

Originally posted on the Serious Computer Vision Blog:

by Gooly (Li Yang Ku)

Gabor like filters

It’s always good to go back to the reason that lured you into computer vision once in a while. Mine was to understand the brain after I astonishingly realized that computers have no intelligence while I was studying EE in undergrad. In fact if they use the translation “computer” instead of  “electrical brain” in my mother language, I would probably be better off.

Anyway, I am currently revisiting some of the first few computer vision papers I read, and to tell the truth I still learn a lot from reading stuffs I read several times before, which you can also interpret it as I never actually understood a paper.

So back to the papers,

Simoncelli, Eero P., and Bruno A. Olshausen. “Natural image statistics and neural representation.” Annual review of neuroscience 24.1 (2001): 1193-1216.

Olshausen, Bruno A., and David J. Field. “Sparse coding with an…

View original 237 more words

VLAD- An extension of Bag of Words

Recently, I was a participant at TagMe- an image categorization competition conducted by Microsoft and Indian Institute of Science, Bangalore. The problem statement was to classify a set of given images into five classes: faces, shoes, flowers, buildings and vehicles. As it goes, it is not a trivial problem to solve. So, I decided to attempt my existing bag-of-words algorithm on that. It worked to an extent, I got an accuracy of 86% approximately with SIFT features and an RBF SVM for classification. In order to improve my score though, I decided to look at better methods of feature quantization. I had been looking at VLAD (Vector of Locally Aggregated Descriptors): A first order extension to BoW for my Leaf Recognition project.

So, I decided to attempt to use VLAD using OpenCV and implemented a small function based on the BoW API currently in OpenCV for VLAD. The results showed remarkable improvement with an accuracy of 96.5 % using SURF descriptors on teh validation dataset provided by the organizers.

What is VLAD

Recalling BoW, it involved simply counting the no. of descriptors associated with each cluster in a codebook(vocabulary) and creating a histogram for each set of descriptors from an image, thus representing the information in a an image in a compact vector. VLAD is an extension of this concept. We accumulate the residual of each descriptor with respect to its assigned cluster. In simpler terms, we match a descriptor to its closest cluster, then for each cluster, we store the sum of the differences of the descriptors assigned to the cluster and the centroid of the cluster. Let us have a look at the math behind VLAD..

Mathematical Formulation

As with bag of words, we first train a codebook from the descriptors from our training dataset, as C=\{c_1,c_2,...c_k\} where k is the no. of clusters in K-means. We then associate each d-dimensional local descriptor, x from an image with its nearest neighbour in the codebook.

The idea behind VLAD feature quantization is that, for each cluster centroid, c_i, we accumulate the difference x-c_i where for each x, c_i = NN(x)

Representing the VLAD vector for each image by v, we have,

v_{ij} =\sum_{x|x=NN(c_i)} {(x_j - c_{ij})}

where i=1,2,3...k and j=1,2,3..d

The vector v is subsequently normalized with its L_2 norm as v=\frac{v}{\|v\|_2}

Comparison with BoW

The primary advantage of VLAD over BoW is that we add more discriminative property in our feature vector by adding the difference of each descriptor from the mean in its voronoi cell. This first order statistic adds more information in our feature vector and hence gives us better discrimination for our classifier. This also points us to other improvements we can   adding higher order statistics to our feature encoding as well as looking at soft assignment,i,e. assigning each descriptor multiple centroids weighed by their distance from the descriptor.

Experiments

Here are a few of my results on the TagMe dataset.

results

Improvements to VLAD:

There are several extension possible for VLAD, primarily various normalization options. Arandjelov and Zissermann in their paper, All about VLAD, propose several normalization techniques, including intra normalization and power normalization alonging with a spatial extension – MultiVLAD. Delhumeau et al, propose several different normalization techniques as well as a modification to the VLAD pipeline to show improvements to almost state of the art.

Other references also stress on spatial pooling i.e. dividing your image into regions to get multiple VLAD vectors for each tile to better represent local features and spatial structure. A few also advise soft assignment, which refers to assignment of descriptors to multiple clusters, weighed by their distance from the cluster.

Code:

Here is a link to my code for TagMe. It was a quick has job for testing so it is not very clean though I am going to clean it up soon.

https://github.com/ameya005/VLAD-Implementation

also, a few references for those who want to read the papers I referred:

1.Jégou, H., Perronnin, F., Douze, M., Sánchez, J., Pérez, P., & Schmid, C. (2012). Aggregating local image descriptors into compact codes. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 34(9), 1704-1716.

2. Delhumeau, J., Gosselin, P. H., Jégou, H., & Pérez, P. (2013, October). Revisiting the VLAD image representation. In Proceedings of the 21st ACM international conference on Multimedia (pp. 653-656). ACM.

3. Arandjelovic, R., & Zisserman, A. (2013, June). All about VLAD. In Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on (pp. 1578-1585). IEEE.

Well, that concludes this post. Be on the lookout for more on image retrieval – improvements to VLAD and Fisher Vectors.

 

SuperPixels: SLIC and more

Hey guys,

I recently came across a very cool computer vision concept – SuperPixels. Superpixels are what you would call aggregations of pixels with common features – for example, color, illumination, absorbance, spatial location etc. It is perhaps the precursor to full object segmentation. Here, I am going to discuss a specific kind of superpixels called SLIC.

SLIC stands for Simple Linear Iterative Clustering. The name defines it all as it is just a simple representation of clustering using k-means. This was first proposed in the following paper:

Radhakrishna Achanta, Appu Shaji, Kevin Smith, Aure-lien Lucchi, Pascal Fua, and Sabine Susstrunk, SLIC Superpixels, EPFL Technical Report 149300, June 2010.

The basic idea of superpixels is that they contain a lot more high level information as compared to just pixels. Pixels give extremely low level and local information which many-a-times cannot be used without heavy duty processing. Instead, superpixels provide less localized information which makes more sense to have considering problems like segmentation of objects. A simple analogy would be to step back a bit and look at a slightly bigger picture.

To start of with, lets look at a few example images given in the paper:

slic1

slic2

As you can see, pixels are clustered according to their location and color. The different slices represent different sizes of superpixels. One of the major advantages of using SLIC  superpixels is that they show low undersegmentation and high boundary recall. It is one of the most efficient methods in the methods for getting superpixels.

The algorithm is very intuitive. We start off by converting the image to L*a*b* color-space. We then create feature points from each pixel consisting of the L*,a*,b* values and the x and y co-ordinates. These features are then clustered using k-means clustering

Now, one of the brilliant things of SLIC is that it takes the x-y co-ordinates into the picture while clustering. So, you get uniform and well defines superpixels. This can then be further used for segmentation. The distance function used for clustering is given by

So, we start off with K equally spaced clusters all across the image. These clusters are centred around the lowest gradient position in a 3×3 neighbourhood around the equally spaced points. We do this to avoid putting them at edges and to reduce the probability of choosing a noisy pixel. We calculate the image gradients as follows:

G(x,y) =  \| I(x+1,y) - I(x-1,y) \|^2 + \|I(x,y+1) - I(x-1,y)\|^2

where I (x,y) is the lab vector corresponding to the pixel and \|.\| is the L2 norm.
Each pixel in the image is then associated with the nearest cluster centre in the labxy feature space. After all pixels are associated with one or the other cluster, we again calculate a new cluster centre as the average value of the labxy features in each cluster. We then repeat the process of clustering and recalculating until we achieve convergence.

As a result, we will have nicely clustered groups of pixels, both in color as well as spatial domain. Now, there might be a few stray pixels that may have got clustered in the wrong clusters. Though this is pretty rare according to the authors of the paper and my experiments, we enforce a connectivity constraint on them in order to remove these outliers. The paper doesnot mention this in any great detail, but connectivity is an important condition to impose, especially if we are going to use SLIC superpixels for segmentation.

Thats it for the explanation of an amazing technique, folks! I will post my OpenCV code link in a few days as well as will discuss a few issues with the code and segmentation techniques using these in the next few posts.

P.S.:By the way, we will also be looking at Bag of Words in a little more detail in a few more days. Be on the look out for a barrage of posts.

Bag of Words – Object Recognition

Hey guys,

Its been a really long time since my last post. But this series of posts is going to be a really cool I hope.

Today we are going to discuss one of the most important problems in Computer Vision- Object Recognition. We humans tend to trivially recognize objects without consciously paying attention to the fact or even wondering how exactly do we achieve this. You look at a baseball flying towards your face, you recognize it as a baseball about to break your nose, and you duck! All in a matter of a few microseconds.

But the process that your brain undertakes in those few microseconds has eluded perfect implementation in computation for several years now. Object recognition is perhaps, rightly considered the primary problem in computer vision. But recent research advances have made strides in this matter.

I recently undertook a project in which I had to classify leaves into species they come from. And as it sounds, it’s not really a trivial problem. It took me a few days to figure out the first steps to such a process. And to start of with I decided to use the Bag-of Words model, a highly cited method for scene and object classification for the above problem.

To begin with, I found a really nice dataset to work with here: http://flavia.sourceforge.net/ . The dataset contains images for 32 species of leaves on plain white backgrounds which simplified my experiment. I am really grateful to them for providing such a comprehensive dataset for free on the web. (Kinda all for Open Access now.).

Bag of words is a basically a simplified representation of an image. Its actually a concept taken form Natural Language Processing where you represent documents as an unordered collection of words disregarding grammar. Translating this into CV jargon, it means that we simplify images by picking out features from an image and representing it as a collection of features. A good explanation of what features are can be found at my friend, Siddharth’s blog here.

To get more technical about the BoW- we construct a vocabulary of features. We then use this vocabulary to create histograms from features for each image and then use a simple machine learning algorithm like SVM or Naive Bayes for classification.

This is the algorithm I followed for BoW. I got a lot of help from Roy’s blog here.

1. We pick out features from all the images in our training dataset. I used SIFT (Scale Invariant Feature Transform).

2. We cluster these features using any clustering algorithm. I used K-Means. (Pretty fast in OpenCV)

3. We use the cluster as a vocabulary to construct histograms. We simply count the no. of features from each image belonging to each cluster. Then we normalize the histograms by dividing it with the no. of features. Therefore, each image in the dataset is represented by one histogram.

4. Then these histograms are passed to an SVM for training. I currently use a Radial Basis function multi-class SVM in OpenCV. Using OpenCV’s CvSVM::train_auto() function, we get parameters for the SVM using cross validation.

Now why does Bag of Words work? Why use it rather than simple feature matching? The answer to that question is simple: features provide just local information. Using the bag-of-words model we create a global representation of an object. Thus, we take a group of features, create a representation of the image in a simpler form and classify it.

That was for the pros of the algorithm. But there are a few cons associated with this model.

1. As evident, we cannot localize an object in an image using this model. That is to say, the problem of finding where the object of interest lies is still open and needs other methods.

2. We neglect grammar. In CV terms, it means we neglect the position of features relative to each other. Thus the concept of a global shape maybe lost.

As for our Leaf Recognizer, we are still working on improving the accuracy. We are almost at our goal! The following are some of the images we got as a result of the above algorithm:

Test_image_1236test_image_1242

Test_image_3176Test_image_3177test_image_1373test_image_1378

The Traveling Salesman Problem – Using Ant Algorithms

Hey guys,

As I had promised, this post will be about using the Ant algorithms I had discussed in the previous post to solve a complex computational problem. But, before we go on, let us have a look again at Ant Colony optimization.

Ant Colony Optimization is basically a group of algorithms used to find optimum paths in a graph. It does so by mimicking the foraging behavior of an ant colony.  As I had said before, ants use pheromones to communicate and thus the amount of pheromone on each path decides the optimality of the path. Thus, an ant is more likely to take a path with more pheromones on it. We use this concept and basically create ‘Agents’ that we call ‘Ants’. Each Ant is placed at a node and is the allowed to traverse the graph. At the end of it’s tour, we calculate the amount of pheromone on each edge of the path and then assign a probability value to it based on the amount of pheromone and the ‘visibility’ of the next node. Thus, by using a no. of ants and allowing them to traverse the graph repeatedly, we get the optimum path. Thus, we actually mimic the co-operative nature of ants in a colony to solve optimization problems.
Now, lets move on to the problem at hand. As the title suggests, today, we are going to solve one of the most famous an computationally complex problems in computer science – The Traveling Salesman Problem. The Traveling Salesman problem is modeled in the following way:

There are N cities randomly placed on a map. A salesman has to visit each city and return to the starting point such that he visits each city only once. The problem is to find the shortest path given pairwise distances between the cities.

This problem was first formulated in 1930 and is one of the most intensively studied problems in computer science and optimization. The problem is NP hard in nature. We will discuss NP hard and other such classifications in a future post. It has got several exact and heuristic solutions, so problems with large numbers of cities can be solved. Ant colony Algorithm is one of the methods of finding a solution.

The following source code has been mostly sourced from M.Tim Jones’ AI Application Programming (1st Edition) by Dreamtech. I have made slight modifications to the code and also written a python script for visualizing the problem.

Let us start with the code: ( The code is an adaptation the sourcecode found in Ch. 4 –  Ant Algorithms of AI Application Programming, 1st edition by M.Tim Jones. )

Let us first define the parameters required for the Ant Colony Algorithms.

#define MAX_CITIES 30
#define MAX_DIST 100
#define MAX_TOUR (MAX_CITIES * MAX_DIST)
#define MAX_ANTS 30

MAX_CITIES refers to the maximum number of cities that we have on the map. We also define the maximum pairwise distance between the cities to be 100. Thus, worst case scenario, modeled by MAX_TOUR represents the tour in which the ant has to travel through MAX_CITIES*MAX_DIST. Here, we also define the no. of ants to be 30 i.e. exactly equal to the no. of cities on the map. It has been determined experimentally that having the no. of ants equal to the no. of cities gives the optimum solution in less time than having more ants in the system.

Now, we create the environment fro the Traveling Salesman Problem (TSP).

//Initial Definiton of the problem
struct cityType{
int x,y;
};

struct antType{

int curCity, nextCity, pathIndex;
int tabu[MAX_CITIES];
int path[MAX_CITIES];
double tourLength;
};

Here, we have defined a struct cityType which contains the members – x and y. These represent the co-ordinates of the location of the city. We follow this by defining the struct antType which contains the members necessary for an ant to function. The integer values of curCity and nextCity represent the indices of the current city and the next city to be visited on the graph. The array, tabu refers to the list of cities already traversed by an Ant whereas path[] stores the path through which the Ant has traveled. The value tourLength is the distance through which the ant has traveled as of the tour.

Now that we have set up the basic setup of the problem as well as created the ants required for simulation, we will move on to defining the parameters by which the algorithm will converge to a solution.

#define ALPHA 1.0
#define BETA 5.0 //This parameter raises the weight of distance over pheromone
#define RHO 0.5 //Evapouration rate
#define QVAL 100
#define MAX_TOURS 20
#define MAX_TIME (MAX_TOURS * MAX_CITIES)
#define INIT_PHER (1.0/MAX_CITIES)

The parameters, ALPHA and BETA are the exponents of the heuristic functions defining the pheromone deposition on the edges. ALPHA represents the importance of the trail whereas BETA defines the relative importance of visibility. Visibility here is modeled by the inverse of distance between two nodes. RHO represents the factor which defines the evaporation rate of the pheromones on the edges. QVAL is the constant used in the formula

We also declare the no. of maximum tours that the ant undertakes as 20. We also have declared a maximum time constraint in case the algorithm is not able to converge to a solution. We then weigh the edges with pheromones with the value of (1/MAX_CITIES).

//runtime Structures and global variables

cityType cities[MAX_CITIES];
antType ants[MAX_ANTS];

double dist[MAX_CITIES][MAX_CITIES];

double phero[MAX_CITIES][MAX_CITIES];

double best=(double)MAX_TOUR;
int bestIndex;

Now, we declare an array of cityType structs which will contain be defining the nodes of our graph. We also create an array of Ants with the size equal to that of cities. The double dimensional array dist records the pairwise distances between the cities whereas the array phero records the pheromone levels on each edge between the cities. The variable best is used as a control variable for recording the best tour after every iteration.

void init()
{
	int from,to,ant;

	//creating cities

	for(from = 0; from < MAX_CITIES; from++)
	{
		//randomly place cities

		cities[from].x = rand()%MAX_DIST;

		cities[from].y = rand()%MAX_DIST;
		//printf("\n %d %d",cities[from].x, cities[from].y);
		for(to=0;to<MAX_CITIES;to++)
		{
			dist[from][to] = 0.0;
			phero[from][to] = INIT_PHER;
		}
	}

	//computing distance

	for(from = 0; from < MAX_CITIES; from++)
	{
		for( to =0; to < MAX_CITIES; to++)
		{
			if(to!=from && dist[from][to]==0.0)
			{
				int xd = pow( abs(cities[from].x - cities[to].x), 2);
				int yd = pow( abs(cities[from].y - cities[to].y), 2);

				dist[from][to] = sqrt(xd + yd);
				dist[to][from] = dist[from][to];

			}
		}
	}

	//initializing the ANTs

	to = 0;
	for( ant = 0; ant < MAX_ANTS; ant++)
	{
		if(to == MAX_CITIES)
			to=0;

		ants[ant].curCity = to++;

		for(from = 0; from < MAX_CITIES; from++)
		{
			ants[ant].tabu[from] = 0;
			ants[ant].path[from] = -1;
		}

		ants[ant].pathIndex = 1;
		ants[ant].path[0] = ants[ant].curCity;
		ants[ant].nextCity = -1;
		ants[ant].tourLength = 0;

		//loading first city into tabu list

		ants[ant].tabu[ants[ant].curCity] =1;

	}
}

The above snippet of code is used to initialize the entire graph and the Ants that we are going to use to traverse the graph. We basically allot random co-ordinates to the cities and then compute the pairwise distances between them. Also, we initialize the ants with an Ant on each node of the graph. We also initialize the rest of the properties of each Ant including the tabu list and the path. The next part of the algorithm is to restart the ants after each traversal by returning them to their original positions.

//reinitialize all ants and redistribute them
void restartAnts()
{
	int ant,i,to=0;

	for(ant = 0; ant<MAX_ANTS; ant++)
	{
		if(ants[ant].tourLength < best)
		{
			best = ants[ant].tourLength;
			bestIndex = ant;
		}

		ants[ant].nextCity = -1;
		ants[ant].tourLength = 0.0;

		for(i=0;i<MAX_CITIES;i++)
		{
			ants[ant].tabu[i] = 0;
			ants[ant].path[i] = -1;
		}

		if(to == MAX_CITIES)
			to=0;

		ants[ant].curCity = to++;

		ants[ant].pathIndex = 1;
		ants[ant].path[0] = ants[ant].curCity;

		ants[ant].tabu[ants[ant].curCity] = 1;
	}
}

We also store the best path and the value of the best path in the global variables that we have defined.

double antProduct(int from, int to)
{
	return(( pow( phero[from][to], ALPHA) * pow( (1.0/ dist[from][to]), BETA)));
}

int selectNextCity( int ant )
{
	int from, to;
	double denom = 0.0;

	from=ants[ant].curCity;

	for(to=0;to<MAX_CITIES;to++) 	{ 		if(ants[ant].tabu[to] == 0) 		{ 			denom += antProduct( from, to ); 		} 	} 	 	assert(denom != 0.0); 	 	do 	{ 		double p; 		to++; 		 		if(to >= MAX_CITIES)
			to=0;
		if(ants[ant].tabu[to] == 0)
		{
			p = antProduct(from,to)/denom;

			//printf("\n%lf %lf", (double)rand()/RAND_MAX,p);
			double x = ((double)rand()/RAND_MAX);
			if(x < p)
			{
				//printf("%lf %lf Yo!",p,x);

				break;
			}
		}
	}while(1);

	return to;
}

This set of functions is used in the process of selecting the next edge to traverse. Each ant in the environment selects the next edge in accordance with the equation discussed in the last post. Here we have a simple reciprocal function of distance as a measure of visibility as well as the actual value of the pheromone itself as the function to represent pheromone. There has been extensive research in this domain such that these functions may be replaced by other suitable heuristic functions. For the simple TSP that we are going to solve, the functions thatwe use prove to be a good enough measure.

Now comes the actual process of simulating an ant colony. The following function is an example of simulating the ant colony to solve the TSP.


int simulateAnts()
{
	int k;
	int moving = 0;

	for(k=0; k<MAX_ANTS; k++)
	{
		//checking if there are any more cities to visit

		if( ants[k].pathIndex < MAX_CITIES ) 		{ 			ants[k].nextCity = selectNextCity(k); 			ants[k].tabu[ants[k].nextCity] = 1; 			ants[k].path[ants[k].pathIndex++] = ants[k].nextCity; 			 			ants[k].tourLength += dist[ants[k].curCity][ants[k].nextCity]; 			 			//handle last case->last city to first

			if(ants[k].pathIndex == MAX_CITIES)
			{
				ants[k].tourLength += dist[ants[k].path[MAX_CITIES -1]][ants[k].path[0]];
			}

			ants[k].curCity = ants[k].nextCity;
			moving++;

		}
	}

	return moving;
}

//Updating trails

void updateTrails()
{
	int from,to,i,ant;

	//Pheromone Evaporation

	for(from=0; from<MAX_CITIES;from++)
	{
		for(to=0;to<MAX_CITIES;to++)
		{
			if(from!=to)
			{
				phero[from][to] *=( 1.0 - RHO);

				if(phero[from][to]<0.0)
				{
					phero[from][to] = INIT_PHER;
				}
			}
		}
	}

	//Add new pheromone to the trails

	for(ant=0;ant<MAX_ANTS;ant++)
	{
		for(i=0;i<MAX_CITIES;i++)
		{
			if( i < MAX_CITIES-1 )
			{
				from = ants[ant].path[i];
				to = ants[ant].path[i+1];
			}
			else
			{
				from = ants[ant].path[i];
				to = ants[ant].path[0];
			}

			phero[from][to] += (QVAL/ ants[ant].tourLength);
			phero[to][from] = phero[from][to];

		}
	}

	for (from=0; from < MAX_CITIES;from++)
	{
		for( to=0; to<MAX_CITIES; to++)
		{
			phero[from][to] *= RHO;
		}
	}

}

The next snippet of code is simply a helper function which emits all the data into atext file so as a python script can present it in a better and easily understandable form.

void emitDataFile(int bestIndex)
{
	ofstream f1;
	f1.open("Data.txt");
	antType antBest;
	antBest = ants[bestIndex];
	//f1<<antBest.curCity<<" "<<antBest.tourLength<<"\n";
	int i;
	for(i=0;i<MAX_CITIES;i++)
	{
		f1<<antBest.path[i]<<" ";
	}

	f1.close();

	f1.open("city_data.txt");
	for(i=0;i<MAX_CITIES;i++)
	{
		f1<<cities[i].x<<" "<<cities[i].y<<"\n";
	}
	f1.close();

}

Now let us move on to the main()

int main()
{
	int curTime = 0;
	cout<<"MaxTime="<<MAX_TIME;

	srand(time(NULL));

	init();

	while( curTime++ < MAX_TIME)
	{
		if( simulateAnts() == 0)
		{
			updateTrails();

			if(curTime != MAX_TIME)
				restartAnts();

			cout<<"\nTime is "<<curTime<<"("<<best<<")";

		}
	}

	cout<<"\nBest tour = "<<best<<endl;

	emitDataFile(bestIndex);

	return 0;
}

Here we simply run the simulations and keep on updating trails and restarting the simulation with the updated trails until the no. of iterations does not exceed MAX_TIME. This ensures thatthe simulation terminates regardless of whether the ANT agents converge onto a solution or not after a set no. of iterations.

This is the primary code for the Ant Algorithm solution for the Traveling Salesman Problem in C++. Now, I have written a small python script which plots the data using matplotlib and creates a beautiful graph for us to view.

from numpy import *
import matplotlib.pyplot as plt

def city_read():
	x,y=loadtxt('city_data.txt', unpack = True)
	order=loadtxt('Data.txt')
	print order
	print x
	print y
	'''order1[]
	x1[]
	y1[]'''
	x1=[]
	y1=[]
	for i in order:
		x1.append(x[i])
		y1.append(y[i])
	x1.append(x[order[0]])
	y1.append(y[order[0]])
	plt.plot(x1,y1, marker='x', linestyle = '--', color = 'r')
	for i in range(len(order)):
		plt.text(x1[i],y1[i],order[i])
	plt.show()

def main():
	city_read()

if __name__=="__main__":
	main()

The script outputs an easy to read graph which pinpoints the cities on a graph and highlights the optimum path as outputted by the ant algorithm. Here are a few output images:

A solved TSP

ANTs plot out a path

Output for a randomly generated city map

 

P.S.: You can find the above code as well as several other implementations at my git:

https://github.com/ameya005/AntColonyAlgorithms

Ant Colony Algorithms

Hey guys, Thanks for the great response on my previous article.

Today, I will be writing about a brilliant algorithm called Ant Colony Algorithm. Ant Colony Algorithm or Ant Algorithm as it is popularly known as, is basically an optimization algorithm based on swarm intelligence. It mimics the behavior of an ant colony foraging for food. This algorithm was first proposed by Marco Dorigo in 1992 as his PhD. thesis. It is a part of the amazing world of swarm intelligence wherein a swarm of low ability agents are used to solve complex problems.

Ants are perhaps one of the most amazing creatures in this world. They are nearly blind but still mange to navigate through difficult terrain and find their way to food and back. Not only that, they can lift up to10 times their body weight and perhaps one of the sturdiest creatures of the insect world. The major reason of their success in the battle for survival is the fact that they co=operate with each other. The survival of the ant colony is foremost rather than survival of a single ant. It is this co-operation and communication that we attempt to model through ant algorithms.

 

Now, ants have an amazing method of navigation. They use pheromones to communicate the optimal path to any point to other ants. This works in the following way.

  1. An ant finds a food source.
  2. It lays down a trail of pheromones which can be detected by other ants upon the path from the food source to the storage area
  3. Other ants also reach the food source through various paths.
  4. Now, based on the time taken to complete each trip to and fro, the shortest path will have the most pheromone as it has been traversed the most.
  5. As the amount of pheromone on each path dictates the probability of that path being taken, more and more ants will be traveling on the optimal path. Thus, using several ants, the optimal path to a food source is obtained.

We attempt to mimic this behavior by using agents that we call ants. To solve a problem using Ant algorithms, we model it as a fully connected bidirectional graph. You can read up more on graphs and graph theory here. On each node, we place an ant and then ask it to traverse the other nodes on the graph thus completing a trip. The constraint on the ant is that it must visit each node only once. As it travels, we calculate the pheromone levels on that path based on its length. Upon the completion of each trip( known as a Tour), we replace the ant on a node and send it on its way again. But each time, it reaches a node, the probability of it taking the next edge is given by the amount of pheromone on that edge. Thus, in the end, through traversing the graph, we find out the most optimal path.

For example, lets take a situation with 2 ants and a food source. Now Ant1,  who will call Han takes a path with 2L distance to the food source whereas Ant2, Luke, takes a path with distance L. Now both lay down small amounts of pheromone as they pass. Now Luke completes the trip to the food and back in time T, while Han takes 2T, assuming both travel at the same speed. He takes the food and returns back. And then proceeds on his second trip. While he starts of his second lap, Han is still just at the food. So, when Han returns back, Luke will have completed two trips. Thus, there is approximately twice the mount of pheromone on the Luke’s path as compared to Han’s path. So, next time, Han will be more probable to taking Luke’s path based on simply the amount of pheromones.
So, how do we calculate the probability of an ant taking the path? Before that, we also add another factor to the situation- that the pheromone evaporates as time goes on. Thus, paths which are poor are discarded more easily. Now, we need to describe the probability equation governing the movement of ants.

While an ant has not completed a tour i.e. it has not visited all the nodes, the probability of the next edge to take is given by the following equation:

Here \tau(r,u) is the intensity of pheromone on the edge between the nodes r and u. \tau(r,u) is actually a heuristic function which represents the inverse of the distance between the nodes, \alpha is the weight for the pheromone, and \beta is the weight for the distance of the edge which can be modeled after visibility. k is the no. of edges not traversed yet. This formula can be said to be the ratio of the weight of an edge to the sum of the weights of all untraveled edges. Thus, this will give us a probability based on the amount of pheromone deposited and the distance of the edge. Both of these values will help us in converging to an optimal value.

Now, we calculate the pheromone left on each edge only at the end of each ant tour. We do this using the following formula:

Here Q is a constant and L^{k}(t) is the total length of the tour. We then use this value to calculate the total pheromone value due to a single ant after a tour.

We have to note that this operation is carried out only after the tour is completed as the pheromone levels are inversely proportional to the length of the tour. Constant \rho is a constant between 0 and 1.

As I have mentioned before, we consider pheromone evaporation as a way to discard poor paths. The evaporation is modeled by a simple equation

Here, we are using the inverse coefficient of the weight of the change in pheromone levels on a n edge to model evaporation. This proves to be an efficient way to discard edges which are not optimal.

As each ant completes it’s tour, we update the edges based on the equations above. Then, we restart the entire process with the ants being placed on nodes once more and then allowed to move according to the updates edges. hence, we can see that more and more ants will converge to a single path. The no. of iterations can be controlled so as the algorithm ends when there is no change in the value of the shortest length for a few iterations.

Thus, we see how a complex natural system is modeled and used to solve problems. A few applications of Ant Algorithms include

  • Network Routing
  • Graph Coloring
  • Vehicle Routing
  • Path Optimization
  • Swarm Robotics

This concludes this post. In the next post, we will use the discussed Ant algorithm to solve a famous computational problem – The Traveling Salesman Problem.

Simulated Annealing

Recently, I have been fascinated by algorithms generally used for AI. Simulated Annealing is one of the basic algorithms for finding solutions a constrained problem. I have been greatly helped along this path by a book : Artificial Intelligence Application Programming(1st edition) by M. Tim Jones.

Annealing is a process in which a metal is first heated to a high temperature and then slowly cooled down resulting in formation of large, ordered crystal structures. The output of this process is dependent on the time taken for the cooling. Faster cooling times result in formation of brittle, fragile structures while slow cooling results in the formation of large stable crystals.

Simulated annealing as the name suggests, is creating an algorithm which mimics this cooling process. Basically, several problems of varied nature can be solved by creating an analogy with the annealing process. Many problems can thus be equated to the annealing process. For example, given a minimization problem, we can equate the ‘Energy’ of the solution to the value of the objective function and then randomly tweak the values within the given constraints to find the minimum solution.

The algorithm for simulated annealing is as follows:

1. Create an initial solution.

2. Assess it’s value

3. Randomly tweak the solution.

4. Assess the new solution

5.Check acceptance criteria

6. Reduce Temperature

7. Go to step 3

When the temperature reaches a certain predetermined value, the current solution which satisfies all criteria is outputted as the best solution.

To model this, let us assume that the lower the ‘energy’ of a solution, the better it is. Now for the algorithm to converge on a solution, we must propagate the solutions having lower values of energy. Thus we use the acceptance criteria for that. So, if the new solution is lower in energy than the current solution, we move on to reducing the temperature. But, if the working solution is worse, then we have an acceptance criteria which we can follow. In simulated annealing, we use an acceptance criteria based on the laws of thermodynamics.

The following example is a basic example of simulated annealing.

N-Queens problem:

A very famous problem, it was first solved by Carl Friedrich Gauss in 1850 by trial and error. Since then, several methods have been used including divide-and-conquer, depth first, genetic algorithms etc. The problem in it’s conception is very simple. We have an N x N chess board and we have N queens. Now, we have to place the queens such that there is no conflict between any two queens as per the familiar rules of chess. As a queen can move diagonally, vertically and horizontally, we have to ensure that for an acceptable solution, there must be only one queen in each column and row and it’s diagonals must not have any other queen.

The formulation of this problem can be done as follows:

Lets take an 1 x N matrix. The index of the matrix is the row index of the position of the queen while the value at that index is the column index of the queen. Thus we have defined the position of each of the N – queens. Now, we randomly allocate each of the queens on the board such that no 2 queens occupy the same row or column. This can be easily ensured by checking the condition that the value at each index is not equal to any before it.

We define the energy of each solution as the no. of conflicts i.e. the no. of queens threatening each other. We also define several macros which which help us in the coding process


 #define MAX_LENGTH 30
 #define INIT_TEMP 30.0
 #define FIN_TEMP 0.5
 #define ALPHA 0.98
 #define STEPS 100

MAX_LENGTH is the no. of queens i.e. N.

The temperature parameter basically defines the region in which the solution is being searched for. As the temperature decreases, the range of the solution space also decreases thus resulting in an optimum solution. The higher the temperature, the more time it will take for convergence but also there is a greater probability of finding a solution. Now, the final temperature cannot be zero as the value of the exponent then cannot be calculated. Thus, we try and take the value as close to zero as possible but at the cost of time. I tried several values and found out that at 0.5, the algorithm converges to a solution for even large N’s in a fairly decent amount of time.

ALPHA is the value of the constant in the linear function

T_{t+1} = \alpha T_{t}

This function can be any decreasing function of T including non linear functions. I used the above one because it is the easiest to analyse. Now, at each temperature, we allow for 100 iterations for a best solution to emerge. This solution is then propagated to the next temperature. This can be imagined as moving up or down a hill until we find a ditch which represents over optimum value. Then we can start following that ditch until we reach the actual global minimum. To ensure, we stay within the constraints of the ditch, we decrease the temperature and keep decreasing it with every 100 steps until at last we find a solution.

Thus we can see that STEPS must be sufficiently large while ALPHA must be as close to 1 as possible so as to generate the optimal solution. But we must also keep in mind the timin constraints for the program to find a solution.

Now let us encode the problem:

We create a struct for each solution in which we create an array representing the solution and the energy associated with it.

typedef struct {
solutionType solution;
float energy;
} memberType;

Now, we create several functions to help us with the program

void tweakSolution( memberType *member)
{
int temp, x, y;

x=rand() % MAX_LENGTH;

do
{
y=rand() % MAX_LENGTH;
}while(x == y );

temp=member->solution[x];
member->solution[x]=member->solution[y];
member->solution[y]=temp;

}

This is a simple function which will help us create a randomly perturbed solution.


void initSolution( memberType *member)
{
int i;

for(i=0;i<MAX_LENGTH;i++)
{
member->solution[i] = i;
}

for(i=0;i<MAX_LENGTH;i++)
{
tweakSolution(member);
}

}

This function initialises the first solution.

void computeEnergy(memberType *member)
{
int i,j,x,y,tempx,tempy;

char board[MAX_LENGTH][MAX_LENGTH];

int conflicts;

const int dx[4] = { -1,1,-1,1 };

const int dy[4] = { -1, 1, 1 , -1};

bzero((void*)board, MAX_LENGTH*MAX_LENGTH);

for(i=0;i<MAX_LENGTH; i++)
{
board[i][member->solution[i]] = 'Q';
}

conflicts = 0;
for(i=0;i<MAX_LENGTH;i++)
{
x=i;
y=member->solution[i];

for(j=0;j<4;j++)
{
tempx=x;
tempy=y;

while(1)
{
tempx+=dx[j];
tempy+=dy[j];

if(tempx<0 || tempx>=MAX_LENGTH || tempy<0 || tempy >= MAX_LENGTH)
{
break;
}

if(board[tempx][tempy] == 'Q' )
conflicts++;
}
}
}

member->energy = (float) conflicts;
}

This function is used to compute the energy of each solution. We define energy as the no. of conflicts. Since, we have ensured that while tweaking, we avoid placing 2 queens on the same row or column, we can easily reduce the complexity of finding the no. of conflicts to checking the diagonals of each queen.

There are a few more functions for just simplifying the coding process and emitting the putput for better readbility and understanding

void copySolution( memberType *dest, memberType *src)
{
int i;

for(i=0;i<MAX_LENGTH;i++)
{
dest->solution[i] = src->solution[i];
}

dest->energy = src->energy;
}

void emitSolution( memberType *member)
{
char board[MAX_LENGTH][MAX_LENGTH];

int x,y;
bzero((void*)board, MAX_LENGTH*MAX_LENGTH);

for(x=0;x<MAX_LENGTH;x++)
{
board[x][member->solution[x]] = 'Q';
}

printf("board: \n");

for(y=0;y<MAX_LENGTH;y++)
{
for(x=0;x<MAX_LENGTH;x++)
{
if(board[x][y] == 'Q')
printf("Q ");
else
printf(". ");
}
printf("\n");

}
printf("\n\n");

}

Now, we implement the actual simulated annealing algorithm in the main()

int main()
{
int timer = 0,step,solution=0, useNew, accepted;

float temp = INIT_TEMP;
memberType current,working,best;

FILE *fp;

fp=fopen("stats.txt", "w");

srand(time(NULL));

initSolution(&current);
computeEnergy(&current);

best.energy=100.0;

copySolution(&working, &current);

while( temp > FIN_TEMP )
{
printf("\n Temperature = %f", temp);

accepted = 0;

/* Monte Carlo step*/

for( step = 0; step < STEPS; step++);
{
useNew=0;

tweakSolution(&working);
computeEnergy(&working);

if(working.energy <= current.energy)
{
useNew = 1;
}
else
{
float test = rand() % 1;
float delta = working.energy - current.energy;
float calc = exp(-delta/temp);

if(calc > test)
{
accepted++;
useNew = 1;
}
}
}

if(useNew)
{
useNew = 0;
copySolution(&current, &working);

if(current.energy < best.energy)
{
copySolution(&best, &current);
solution = 1;
}

else
{
copySolution(&working, &current);
}

}

fprintf(fp, "%d %f %f %d \n", timer++, temp, best.energy, accepted);
printf("Best Energy = %f\n", best.energy);

temp *= ALPHA;
}

fclose(fp);

if(solution)
{
emitSolution(&best);
}

return 0;
}

In the main program, we have created two loops. The outer loop deals with the temperature and reduces the temperature after every STEPS steps. The inner loop on the other hand is an implementation of a Metropolis algorithm. I will surely discuss the Metropolis algorithm in detail in some other post. But, for now, what it basically does is that it picks up a tweaked solution and computes it’s energy. If the energy of the solution is lower than the energy of the current solution, then it replaces the current solution by the tweaked solution. But if the energy is higher,  it calculates the difference between the current and the tweaked energy, calculates it’s probability as according to Formula 1.1 and then compares it to a randomly generated threshold value.

The reason for doing this is that if the algorithm encounters a local optimum, it may get stuck in the same place thus causing the algorithm to fail. Due to the Metropolis Algorithm, we avoid this by allowing the solution to take values which may not be the best for that iteration.

The solution that we have thus far obtained is then compared to the best solution we have so far. If it’s energy is lower, we copy the new solution into the best one at the end of each temperature change.

The program will terminate when we reach our chosen final temperature thus giving us the optimal solution.

Now that we know ‘Simulated Annealing’, we can apply it to solve problems in various domains,

  • Path generation
  • Transportation problems
  • Image Reconstruction
  • Circuit layout

Here’s a screen-shot of the output.

Much of the source code is provided in the above mentioned book. The following post will proceed on with Ant Algorithms. Thanks for reading and please contact me for any questions.