Thursday, January 31, 2008

ECJ - building a BinInt problem

I've gotten a little farther with my learning of ECJ.

The tutorial 1 provided with ECJ uses the MaxOnes as a sample problem. MaxOnes simply adds up the number of "1" bits in the genome of an individual to create a fitness for that individual.

BinInt changes the problem by reinterpreting the genome as a binary integer. In my version, I've written it so that you can cut the genome up into several integers. For example, a genome of 150 bits can be interpreted as 30 5-bit integers, 5 30-bit integers, or even 16 9-bit integers with one 6-bit integer.

BinInt is a class of problems that imposes two new kinds of difficulty on the GA. First, BinInt shows scaling. Different bits are worth exponentially more than other bits. Secondly, bits are related to each other to form sub-problems. All the bits of a particular sub-problem contribute to the solution of that part of the overall problem.

Scaling in particular challenges a GA. The large fitness contribution from just a few bits can cause individuals lucky enough to have those bits turned on early in the run to quickly dominate the population, even before the population has found the best arrangement for the other bits. This problem is called premature convergence.

I had fun building the BinInt problem in ECJ. It was straightforward to modify the MaxOnes problem fitness evaluation function. The learning came in when I wanted to create a custom parameter to control the sizes of the integers. I invoked the OTSOG (On The Shoulders Of Giants) principle, and asked for help from the ECJ discussion list, with very helpful and immediate answers. Thanks guys!

Thursday, January 24, 2008

Sketch of comics hero team

Oh yeah, I like comics and manga, too. Sorry I forgot to tell you earlier.

I visited an Oscar site yesterday, and was happy to see that Cate Blanchett was nominated for Elizabeth: The Golden Age. I feel really stupid for not realizing that this movie is a sequel.

Anyway, it got me thinking about a hero team set in the Elizabethean Age.

In the backstory, Elizabeth really is the Faerie Queen. The last test of a new Queen of Faerie is to seduce a mortal lover, but Elizabeth instead falls for the mortal and renounces her throne in Faerie. This choice angers a large number of powerful spirits, who were planning on using Elizabeth's control of two thrones (Faerie and England) to transform England into the New Atlantis. (Not the New Atlantis of Bacon, sorry.)

Our heroes:
Gloriana - Queen Elizabeth in disguise. Or is Elizabeth the disguise, and Gloriana her real identity?
Sir Francis Walsingham - the spymaster of Elizabeth in real life. Here, a bit more "hands on".
Dr. John Dee - a link between the scientific and the supernatural. Capable of a bit of wizardry with the help of the right artifacts.
The Golem of Prague - on loan to Dee from the Maharal of Prague, who made him. Think Ben Grimm with a Yiddish accent.
Thomas Phelippes - asistant of Walsingham. Inventor, cryptographer, etc. The Q to Walsingham's M. Deciphered the Book of Soyga for Dr Dee.

Our villains:
Phillip II of Spain - wants to conquer England and create his own version of the New Atlantis.
Duetia - beautiful faery, would be Queen of Faery if she could eliminate Gloriana.
John Kelley/Talbot - worked with Dr Dee but has fallen under dark influences.
Quetztlcoatl - the Aztec god, making a deal to stay alive in the New World's New World Order.
Talus - a mechanical man/animated suit of armor. Sometimes comic relief for being too literal.

I know you're asking already, who would win a Golem vs. Talus smackdown. Is the occult science of Dr Dee any match for the pure pagan fury of a God? Can Walsingham outwit Phillip? Can Gloriana save England from drowning beneath the waves if Duetia has her way? And what about Naomi?

Find out in our next issue!

Tuesday, January 22, 2008

ECJ - tutorial 1

I set up the Evolutionary Computation in Java (ECJ) system to run under Eclipse. Here is a brief report on using it to run the first tutorial provided with the system. This is the same tutorial covered in the YouTube videos mentioned previously on this blog.

The tutorial description assumes that you are typing in every line of the tutorial1.params file, but since it is provided I went straight to tweaking the different parameters.

The optimization problem in this tutorial is trivial to state. The genome is a set of unrelated bits which are counted, so maximum fitness occurs when all bits are 1, hence the name MaxOnes. As trivial as that sounds, a lot of basic research in GAs is conducted on this and similarly simple functions.

Out of the box, the parameters of interest are

  • crossover probability = 1.0 (we will always cross to get a new individual)

  • mutation probability = 0.01 (every bit has a 1% chance of changing)

  • genome size = 40 bits

  • population size = 40

  • generations = 100

With this setup, we get

Generation 32
Found Ideal Individual

It took 32*40=1280 function evaluations to find the ideal individual of all ones.

Upping the ante, if we double the genome size to 80

Generation 87
Found Ideal Individual

We found it, but doubling the size of the problem more than doubled the effort to solve it. It almost tripled it. That is disturbing.

Doubling once again to a genome of 160 bits, and the system cannot find the ideal individual, even in 1000 generations. Even if the population size is quadrupled to 160, the system fails.

What has happened? Have we reached the edge of evolution already? Is this all that GAs are capable of, solving trivial problems?

Happily, the answer is no. We are very far from the edge of evolution (if there is such a thing), but we are at the beginning of understanding how GAs work.

One of the great themes of search and optimization algorithms is the tradeoff between exploration and exploitation. When do we look in new areas for an optimum, and when do we try to find the best value in the local area of a solution we already know is "pretty good"?

In GA work, the operators of crossover and mutation are often characterized as typifying exploration and exploitation, respectively. I think this is an acceptable characterization wrt mutation, though it isn't the greatest way to think about genetic operators. Another way to think about mutation is to look at it as a kind of heat source, as in simulated annealing.

What happens if we scale back the mutation rate to 0.001 and go back to our original population size of 40?

Generation 219
Found Ideal Individual

Very interesting! What was happening in our earlier runs? The mutation rate was 1 out of 100. When the genome size was less than 100 bits, individuals could get crossed and then never suffer mutation on their way into the new population. But when the genome size went to 160, we were almost guaranteeing that every individual would suffer, on average, 1.6 bits changed. So the population was constantly dancing around the optimum, but never hitting it exactly.

If there is a final lesson to be drawn from this small example, it is that GAs are not about parameter tweaking. There is always a coupling between the probem and the parameters that needs to be understood.

Chipping Away at the Wierdness

Over at the Mindful Hack, Denyse O'Leary defends the study of prayer thusly

Anyone familiar with the placebo effect would consider $2.3 million to
study the power of prayer money well spent.

How should we understand this? Does Denyse really think God is nothing more than the placebo effect? If intercessionary prayer "works", there should be some measurable difference between the placebo effect and people who are prayed for.

I think Denyse would like to help people believe in the objective reality of God. She is, after all, not a journalist but an apologist. This kind of post does not advance that agenda.

However, it is interesting when looked at in the context of many of Denyse's posts on Mindful Hack, Uncommon Descent, and her Design of Life blog. In posts where Denyse tries to channel Carl Zimmer to actually write about science, she winds up writing sentences that carry assumptions such as the reality of deep time, an old earth, and common descent. Is she with Michael Behe on this, and against most of her crank readership?

Also more out of the closet recently is DaveScot, Uncommon Descent's resident dominatrix. Dave fessed up to the same trio in a few posts over there recently, but has saved his crank street credibility by backing some form of front loaded evolution.

Are we seeing some of the loudest cheerleaders of ID talking themselves down? One can only hope.

A Model of Error

Over at UD, math (sic) professor Granville Sewell resurrects one of the silliest posts of 2006, Gil Dodgen's misapprehension of what makes a model. No, the errors in our OS are not part of the model, any more than they are part of your PDE solver or Dr Dembski's MESA or Dr Marks' EIL work.

It is however true that if you are running intense numerical calculations, you need to take into account the quality of your (pseudo)random number generator, your floating point hardware's aptitude for error due to heat, cosmic rays, bad programming by Intel, etc. Taking these things into account is very different from intending them to be part of the model system. No one is going to be impressed by results that need to be run on Windows ME with a lump of uranium lying on the desk next to the CPU.

But it is from this support the Dr Sewell makes a sweeping statement, "Unintelligent forces simply can’t do intelligent things..." Sorry, Dr Sewell. In the words of the prophet, pwned! May I be the first to introduce you to the Humies.

If you're not following the link, I'll just summarize that the Humies are cash based awards for achievement equal or better than human at some intellectual task, by an EA based system. That we can give away awards like this is a testament to the power that GA, GP, etc have already achieved.

Sunday, January 20, 2008


I'm not a complete otaku, which is probably why it has taken me until now to notice this movie.

If you appreciated the way The Matrix played with "is this reality?" than you will love++ this anime. The mix of traditional cel animation and cgi was well done and the music was excellent. The playfullness of the OP (opening credits) was amazing and set the tone for the whole movie. Buy it on DVD!

Can't wait for Spring!

My recent look into science standards here in New Jersey popped up the factoid that the Garden Sate has a state dinosaur (as do several other states). Now I've found that the best place to look for fossils here is Poricy Park. Thank you NJGS!

Since you have to wade out into the stream to seive for fossils along the sandbanks, I'll have to wait for the spring before dragging the family out for the fun. This is gonna be great!

Friday, January 18, 2008

Looking back into ANAS

ANAS is John Holland's 1975 monograph, Adaptation in Natural and Artificial Systems. I went back to it recently to review the history of the thinking about linkage learning from the very beginning.

I was struck that Holland's first presentation of a simple genetic algorithm, called a reproductive plan in ANAS, is a steady state GA. In steady state GA, a single individual is replaced at a time in the population. This means that new mixes of genetic material is made available immediately for selection.

I had mis-remembered this. I thought that historically GAs started as generational models in which the entire population was replaced at once, and that steady state GAs were due to Grefenstette.

Similarly, ANAS considers the possibility of evolving algorithms, not just parameter sets. This is the basic distinction between genetic algorithms and genetic programming, which most people attribute to John Koza.

There is a phrase in Pirkei Avot (Ethics of the Fathers) "hufach bo v'hufach bo d'kulla bo." Turn it over, turn it over, because everything is in it. I think it applies just as well to basic texts in the sciences as it does to the Talmud.

Thursday, January 17, 2008


So I was watching (via PT 'natch) the slow motion train wreck of Florida school districts signing up to stupidity, consigning a generation of graduates to burger flipping and telemarketing hell.

(Btw, if they ever figure out how to outsource burger flipping to India, we are toast. That is, until they figure out how to outsource toast to India...)

So there I was, all smug in my insufferable suburban Yuppie way, when it suddenly occurred to me that the fair state of New Jersey has its own share of rural ignorance, people from one end of the state who impose blue laws on the other end, teachers in Kearny that lecture kids on their private religious beliefs, etc. Perhaps I should check out how my own state is doing?

The good news - New Jersey has some (IMHO) appropriate core content standards in place for life science education in evolution. New Jersey got an A while Kansas was getting an F- (pwned!) a couple years ago when KS was devolving into BS. The standards went into effect in 1996 and are reviewed every five years. Testing shows that even economically disadvantaged kids are scoring above 50% "proficient" at all grade levels.

The better news - we have a state dinosaur! (OK, OK even Florida has a state fossil.)

Back to smug complacency until 2011, when our standards come up for review...

Wednesday, January 16, 2008

Why Now?

In the interests of full disclosure, let me admit that I have been banned at Uncommon Descent. It is something of a badge of honor...

So one of the reasons for this blog is to provide a platform for comment and Kremlinology on the denizens of UD and the discussions there.

This time around, let us wonder at the recent post by Dr. Dembski himself. Yes, what is the predictive power of ID?

cue crickets chirping...

Well, that wraps up our program for today!

Now, for the Kremlinology portion of our show - what is the subtext of this message?

First, it is a way to say "I am important". Normally, important people don't have to broadcast that fact, so this is really telling us that Dr. Dembski's importance is falling rather than rising.

Second, it tells us that someone in the media is actually taking Darwin Day seriously. There is no other reason that anyone would take Dr. Dembski seriously enough to put him on TV (other than with Steve Colbert). Dr. Dembski is the "counterpoint" position for a Darwin program. Well, for a Fox program he might be the "point" position...

Third, it says Dr. Dembski's estimation of his chances of actually getting on the program are low. After all, actually appearing on the show is much more impressive than telling your friends that you might be invited on the show. But if this is as good as it is going to get, he's got to go with it.

Fourth, what is it with WaD and premature ejaculation? The guy cannot keep his mouth shut until the real results come in, he has to start blabbing at the very beginning. If you want other examples, look at the Polya Center and EIL fiascos. The guy knows how to kill his own prospects with ill timed PR better than anyone I know.

Tuesday, January 15, 2008

What Part of Evolution Don't You Undertand?

We're coming up on the Darwin Bicentennial, hooray! In honor of the occasion, I'll throw out my own little explanation of evolution for the sake of anyone dazed and confused by the massive amount of misinformation wandering the Internet.

Evolution is the name of a process. It is what happens when the following are all true for some period of time.
  • There is a population of individuals, and these individuals have traits.

  • The traits influence the success of the individuals in reproducing.

  • There are limited resources available for reproduction.

  • Not everyone that wants to reproduce will get the chance.

  • Chances depend on traits.

  • The traits can vary when individuals reproduce.

If we compare the population after some long period with the initial population, the frequency of the traits will have changed. Traits that helped reproduction will have increased their frequency at the expense of alternatives.

There is nothing in this that limits evolution to biology. The whole point of a field like Genetic Algorithms is that I can set up those conditions in software and evolve data. This is a very successful part of computer science, with many commercial applications.

Of course, it does happen that evolution applies to the biology that we see around us. If you don't think it does, you'll have to point out which of the above conditions doesn't hold in the real world.

Getting back into GAs

I don't remember how I discovered genetic algorithms. At some point, I got a copy of David Goldberg's book, Genetic Algorithms in Search, Optimization, and Machine Learning. It was a great book, and I had a lot of fun writing my own GA package in C (the code in the book was written in Pascal).

Goldberg's next book, The Design of Innovation is even better. The discussion of how to make GAs a principled engineering approach to solving problems, rather than a black box with too many knobs and levers, really energized my interest.

So now I'm learning the ECJ system using Eclipse. With the help of some nice YouTube tutorials on setting up ECJ with Eclipse, I'm off to the races!

Hello World

Don't expect much from this blog. I'm not huge on reguritating my life into an open diary, and I'm not the author of bons mots every day.

I'm going to try to focus the blog on the intellectual areas I care most about. Right now, this means genetic algorithms, cryptography, defending evolution from misguided attacks, XBRL, and a very small dab of politics. Arts and letters will get a passing mention. I'm interested in comics (and manga, and anime), backing into the subect from the perspective of UI design, Tufte, Scott McCloud and Takahashi Rumiko.

Thanks for reading! I hope you'll be back.