Hill Climbing

For the past year or so I have used what I have called a ‘stochastic hill climbing algorithm’ to handle most non-time sensitive optimization problems.  It has been useful particularly in parameter estimation for the car, finding the ideal power file for a cyclist over a course, and a few other types of problem.  Recently though, I realized that what I had been calling a ‘stochastic hill climbing algorithm’, wasn’t really that at all.  I’m not sure what it is, so I figured I would document it, and maybe someone would tell me.

First, hill climbing in general.  You have some surface:

In this case I used a surface with 2 peaks.  A common problem with hill climbing algorithms is getting stuck on the ‘local maxima’ or that shorter peak.  If it wasn’t already apparent, the point is to try to find the highest point on this surface.

Let’s say we start at a random point on that surface, we can check and see how high we are, but can’t really assume we started at the very top, so we have to do something.  The simplest thing to do, is to look at all the adjacent points, find one that is uphill, and head that way, until you can’t go uphill anymore.  That works well if there is only 1 peak, but in this case it is particularly susceptible to getting stuck on the lower peak.

How I solved that was to instead ‘jump’ a random distance in a random direction, and check there.  In mathematical terms, the surface shown has 3 dimensions, X, Y, and elevation.  Elevation is dependant, so we perturb the X and Y variables by some random value to create this random jump.  I multiplied each by a pseudo-random number which follows a standard normal distribution.  Since both X and Y are scaled by separate random numbers, the probability density function becomes a function of both X and Y, and becomes 2 dimensional itself.

*See note on Kurtosis

So this scaling creates a ‘random jump’.  Now at your new location on the surface, you check how high you are.  Are you higher than you were earlier?  Awesome, stay put for now.  If not, you jump back to where you were before, and assume you are headed the wrong direction.  This process reiterates for some time, and presumably will eventually converge on the true peak.  Because the X and Y can be scaled by very large numbers, no local maxima can be converged on assuming infinite run-time.

Well how does this compare to the actual ‘stochastic hill climbing algorithm’?  A stochastic hill climbing algorithm looks at all adjoining neighbors, ignores all neighbors which lead downhill, and chooses between the uphill options randomly.  This makes is susceptible to converging on local minima, but also means that it is relatively quicker than my implementation.  Here is both algorithms running on the same surface, from the same (random) starting point:

Here the stochastic hill climbing algorithm quickly and directly rises to the top of the wrong peak.  In MATLAB, it only took 7ms to come to this admittedly wrong conclusion.

Here my algorithm is shown.  The mess of green stars are the instances where the algorithm ‘jumped’ to a new spot, and found that it was lower than the previous point, so it turned back.  The correct progression is shown in red, and pretty much impossible to see due to its randomness.  This algorithm correctly identified the true maximum of the function, but it took 256ms in MATLAB, significantly more than the stochastic method.

It turns out, on this surface, the stochastic hill climbing algorithm will only identify the correct peak around 56% of the time, assuming random starting points.  Given enough time/iterations, my implementation will eventually find the true peak every time.   So I don’t know what it’s called, I assume I didn’t make it up, but that would be pretty cool.

***Note on Kurtosis:  I used a standard normal curve for the basis of my random perturbations due to the ease of implementation for now.  I do think that a distribution with a good bit more kurtosis would reduce the size of the jumps on average and result in a ‘cleaner’ and perhaps faster progression.  It is pretty apparent from the plots that the majority of guesses are along the boundaries of the surface, which is far from ideal.   Also, if starting in a corner, it may be a good idea to add some skewness to the distribution as well.

Leave a comment