Tuesday, January 22, 2008

ECJ - tutorial 1

I set up the Evolutionary Computation in Java (ECJ) system to run under Eclipse. Here is a brief report on using it to run the first tutorial provided with the system. This is the same tutorial covered in the YouTube videos mentioned previously on this blog.

The tutorial description assumes that you are typing in every line of the tutorial1.params file, but since it is provided I went straight to tweaking the different parameters.

The optimization problem in this tutorial is trivial to state. The genome is a set of unrelated bits which are counted, so maximum fitness occurs when all bits are 1, hence the name MaxOnes. As trivial as that sounds, a lot of basic research in GAs is conducted on this and similarly simple functions.

Out of the box, the parameters of interest are

  • crossover probability = 1.0 (we will always cross to get a new individual)

  • mutation probability = 0.01 (every bit has a 1% chance of changing)

  • genome size = 40 bits

  • population size = 40

  • generations = 100

With this setup, we get

Generation 32
Found Ideal Individual

It took 32*40=1280 function evaluations to find the ideal individual of all ones.

Upping the ante, if we double the genome size to 80

Generation 87
Found Ideal Individual

We found it, but doubling the size of the problem more than doubled the effort to solve it. It almost tripled it. That is disturbing.

Doubling once again to a genome of 160 bits, and the system cannot find the ideal individual, even in 1000 generations. Even if the population size is quadrupled to 160, the system fails.

What has happened? Have we reached the edge of evolution already? Is this all that GAs are capable of, solving trivial problems?

Happily, the answer is no. We are very far from the edge of evolution (if there is such a thing), but we are at the beginning of understanding how GAs work.

One of the great themes of search and optimization algorithms is the tradeoff between exploration and exploitation. When do we look in new areas for an optimum, and when do we try to find the best value in the local area of a solution we already know is "pretty good"?

In GA work, the operators of crossover and mutation are often characterized as typifying exploration and exploitation, respectively. I think this is an acceptable characterization wrt mutation, though it isn't the greatest way to think about genetic operators. Another way to think about mutation is to look at it as a kind of heat source, as in simulated annealing.

What happens if we scale back the mutation rate to 0.001 and go back to our original population size of 40?

Generation 219
Found Ideal Individual

Very interesting! What was happening in our earlier runs? The mutation rate was 1 out of 100. When the genome size was less than 100 bits, individuals could get crossed and then never suffer mutation on their way into the new population. But when the genome size went to 160, we were almost guaranteeing that every individual would suffer, on average, 1.6 bits changed. So the population was constantly dancing around the optimum, but never hitting it exactly.

If there is a final lesson to be drawn from this small example, it is that GAs are not about parameter tweaking. There is always a coupling between the probem and the parameters that needs to be understood.

1 comment:

iOussa said...

thinks for the tetorial