Monday, May 30, 2011

Spetner, Wilf, & WEASEL

Last year, a short paper was published in the Proceedings of the National Academy of Sciences (PNAS) entitled "There's Plenty of Time For Evolution". (Link goes to arXiv version of the paper.) It wasn't the best of all possible papers on the subject, and some people seriously questioned why it got published by PNAS.

The paper, by Wilf & Ewens, attempts to address the 'bignum' argument frequently used against evolution. Bignum arguments usually run along these lines: a functional protein enzyme is at least 150 amino acids long, and if you simply tested random strings of length 150, with any of the 20 amino acids allowed at each position, you would be searching through 20^150 possible strings. Even if the universe were filled with testing apparatus working since the Big Bang, not enough time would have passed to find a single functional protein. Therefore, evolution needs the assistance of an Intelligent Designer to nudge things in the right direction, or perhaps create everything already working perfectly.
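
To put rough numbers on the bignum argument, here is a small sketch of the arithmetic. The figures for the hypothetical testing apparatus (about 10^80 testers, 10^12 trials per second each, about 4.35 x 10^17 seconds since the Big Bang) are my own deliberately generous assumptions, not numbers taken from any particular bignum argument.

```python
import math

ALPHABET_SIZE = 20     # amino acids available at each position
PROTEIN_LENGTH = 150   # residues, the figure used in the argument above


def search_space_size(alphabet_size: int, length: int) -> int:
    """Number of distinct strings of the given length (exact big integer)."""
    return alphabet_size ** length


def log10_trials(testers: float, rate_per_second: float, seconds: float) -> float:
    """log10 of the total number of blind trials the apparatus could run."""
    return math.log10(testers) + math.log10(rate_per_second) + math.log10(seconds)


space = search_space_size(ALPHABET_SIZE, PROTEIN_LENGTH)
space_log10 = PROTEIN_LENGTH * math.log10(ALPHABET_SIZE)

# Assumed, illustrative apparatus: one tester per atom in the observable
# universe (roughly 1e80), each running 1e12 trials per second for the age
# of the universe (roughly 4.35e17 seconds).
trials_log10 = log10_trials(1e80, 1e12, 4.35e17)

print(f"search space  ~ 10^{space_log10:.1f}")
print(f"blind trials  ~ 10^{trials_log10:.1f}")
print(f"fraction seen ~ 10^{trials_log10 - space_log10:.1f}")
```

Under these assumptions the blind search covers a vanishingly small fraction of the space, which is exactly the point the bignum argument makes about random search, and only about random search.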

Bignum arguments have been around a long time, and they were famously taken down by Richard Dawkins in his popular book, The Blind Watchmaker. In that book, Dawkins shows that cumulative selection is the natural process which allows evolution to proceed faster than the random search model. Dawkins does this with a little computer program that has come to be known as WEASEL.

The WEASEL algorithm was simple. Generate a population of random strings. Measure their fitness. In WEASEL, this was done by comparison to a fixed string. The fittest of the population is the only one allowed to reproduce. It does this by copying itself until the population is the same size again. However, each time a copy is made, letters have a chance to change. Repeat this process for as long as you like.

The WEASEL algorithm could find strings much faster than the random model, of course. The key is remembering your good choices. The second key is using a population to test multiple possibilities at the same time.
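
For readers who want to try it, here is a minimal sketch of a WEASEL-style program following the steps described above. It is not Dawkins' original code. The target is the phrase he used in the book, but the population size, mutation rate, and function names are my own assumptions chosen for illustration.

```python
import random
import string

ALPHABET = string.ascii_uppercase + " "
TARGET = "METHINKS IT IS LIKE A WEASEL"  # the phrase used in The Blind Watchmaker


def fitness(candidate: str, target: str = TARGET) -> int:
    """Count the positions where the candidate matches the fixed target."""
    return sum(a == b for a, b in zip(candidate, target))


def mutate(parent: str, rate: float, rng: random.Random) -> str:
    """Copy the parent; every letter, correct or not, may change."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < rate else ch for ch in parent
    )


def weasel(pop_size=100, rate=0.05, max_generations=10_000, seed=None):
    """Return (generations used, best string found).

    pop_size and rate are assumed values, not taken from the book.
    """
    rng = random.Random(seed)
    population = [
        "".join(rng.choice(ALPHABET) for _ in TARGET) for _ in range(pop_size)
    ]
    for generation in range(max_generations):
        best = max(population, key=fitness)  # only the fittest reproduces
        if best == TARGET:
            return generation, best
        # Refill the population with imperfect copies of the fittest string.
        population = [mutate(best, rate, rng) for _ in range(pop_size)]
    return max_generations, max(population, key=fitness)


if __name__ == "__main__":
    generations, result = weasel(seed=1)
    print(generations, result)
```

Note that nothing in this sketch locks a correct letter in place: every position is exposed to mutation in every copy, and the parent itself is not carried over unchanged.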

To rebut WEASEL, anti-evolutionists have made two critiques. The first is that WEASEL uses a fixed target string. This, they say, is 'sneaking in information'. No, this is just making the example easy to understand. You can write a WEASEL-like program with a different fitness test that does not involve a fixed string. An example would be the Traveling Salesman Problem (TSP), where the fitness is the shortest path through all the cities.
The second critique is that the program protected its good choices from ever changing again, which was not a good model of mutation in a genome. A careful reading of Dawkins' description shows that he did not make this mistake. It is a side effect of using a population-based model that reversions from correct letters rarely happen, but they can happen at high mutation rates in small populations.
Neither of these critiques goes to the heart of the issue: that a population-based model with cumulative selection is a better approximation to real biology than a random search model.

The reason for this lengthy diversion into WEASEL lore is that anti-evolutionists are not the only folks to get the algorithm wrong. Even academics trying to demonstrate evolution occasionally code it wrong by protecting the correct letters. This brings us to Wilf & Ewens.

Wilf and Ewens do use a model which protects good choices. However, what they are modeling is very different from the WEASEL model. In WEASEL, the objects were individuals in a population. In Wilf & Ewens, the object is a population itself. In a population, the process of fixation is the analog of protecting the letter. This isn't well explained in the paper. Essentially, instead of counting generations, the 'rounds of guessing letters' are rounds of selective sweeps and fixations in the population. Each of these could take many generations. Since the paper is directed at the mistakes of non-specialists, all of this should have been made much clearer.
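
As a rough illustration of a 'rounds of guessing letters' model with protected letters, as I have described it here, consider the sketch below. The paper's exact setup may differ in its details, and the function names, trial count, and parameter values are my own assumptions.

```python
import random


def rounds_to_match(word_length: int, alphabet_size: int, rng: random.Random) -> int:
    """Rounds needed until every position has been guessed correctly.

    Each round, every still-unresolved position is guessed independently and
    is correct with probability 1/alphabet_size. A correct position is then
    protected (the analog of fixation) and never guessed again.
    """
    unresolved = word_length
    rounds = 0
    while unresolved > 0:
        rounds += 1
        # randrange(...) == 0 stands in for "this guess hit the right letter".
        unresolved = sum(
            1 for _ in range(unresolved) if rng.randrange(alphabet_size) != 0
        )
    return rounds


def mean_rounds(word_length: int, alphabet_size: int, trials: int = 200, seed: int = 0) -> float:
    """Average number of rounds over repeated simulated runs."""
    rng = random.Random(seed)
    total = sum(rounds_to_match(word_length, alphabet_size, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    print(mean_rounds(word_length=150, alphabet_size=20))
```

In this sketch the number of rounds is the longest of the independent waiting times across positions, so it grows slowly with the length of the word instead of exponentially. Keep in mind that a 'round' here is a sweep and fixation in the population, not a generation.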

Now along comes Dr. Lee Spetner. Spetner is a well-known and well-respected name in the anti-evolution field. He has a Ph.D. from MIT, so he has credentials that command respect and attention. Spetner has written a critique of Wilf & Ewens, but PNAS has refused to publish it, so it has been posted on Uncommon Descent instead.

Spetner's first critique is that Wilf & Ewens are attacking a mislabeled problem. According to Spetner:

They gave no reference for such a model and, to my knowledge, no responsible person has ever proposed such a model for the evolutionary process to “discredit” Darwin. Such a model had indeed been suggested by many, not for the evolutionary process, but for abiogenesis (e.g., [Hoyle & Wickramasinghe 1981]) where it is indeed appropriate. Their first goal was not achieved.

First, let's thank Dr. Spetner for pointing out that evolution and abiogenesis are two different things, two different issues. Many are the anti-evolutionists who cannot make this distinction. However, the bignum argument has often been used against evolution as well. For example, Douglas Axe, Michael Behe, and Stephen Meyer all use it. These people are not 'responsible', apparently. Spetner should inform the Discovery Institute of this!

Since we see that bignum is used against evolution, Spetner's own first critique fails.

Spetner's second critique is more serious. According to Spetner, the selective sweep and fixation of one improvement cannot be achieved until the last sweep is finished. Therefore, fixation of multiple genes must proceed in series, returning us to a bignum argument.

I think a big part of the disagreement here is that Spetner is assuming an asexually reproducing population, while Wilf & Ewens are assuming a sexually reproducing population. Neither the original paper nor the critique makes this point clear, but that is the simplest reading to me.

You can go on to question Spetner's reasoning, even in the asexual case. First, lateral gene transfer means that in reality, even asexually reproducing bacteria have many 'parents', so genes mix much more quickly than a pure, asexual, mutation driven process on paper. But even in the pure case, assume that a mutation is present in 90% of the population - it has almost taken over, but not quite. Certainly, a new mutation of another gene could arise within that 90% and begin a new selective sweep of its own. How would it 'know' to wait?

Spetner's argument is also couched in terms of the individual, not the population, so it seems that he has missed or ignored the issue of what is the object.

(A much more interesting criticism (to me) assumes that in an asexual population two different good mutations of two different genes arise in different individuals. Now their sweeps are competing with each other. It is a random walk as to which will prevail, and then the losing mutation will have to develop again.)

So Spetner's second critique also fails. He also takes a swipe at Wilf & Ewens for ignoring Fisher and the chance that even a good mutation can get lost, but this is unfair, since the original paper says:

In practice further modifications are needed to the calculations since, because of stochastic events, only a proportion of selectively favored new mutations become fixed in a population.

Spetner has also posted a version of his correspondence with the PNAS Board, to support a contention that his critique was rejected unfairly. It seems to me that the reasons given don't align with the real issues with the critique.


Pedant said...

"It seems to me that the reasons given don't align with the real issues with the critique."

Of course not. What Spetner got was the usual boiler plate that busy editors resort to when they're looking at crap.

Been there, done that.

Unknown said...

As I understood, Spetner is saying that no responsible person has proposed a bignum model for the selection of gene loci. The ID proponents mentioned have applied it to the sequence for a single gene locus, where a protein is generally quite fault-intolerant, so that all nucleotides tend to be mutually dependent. For multiple loci, where each gene might well add some independent function, certainly each locus might be selected independently. For abiogenesis, however, the model is more appropriate, because a minimum bare-bones suite of genes seems to be necessary to sustain metabolism, or replication. In truth, there are many higher biological systems that depend on entire suites of genes, but these two would have to be present at the dawn of life, and thus represent a barrier that abiogenesis must cross.

Heiliger Fakewooder said...

To continue, the reason that the second sweep cannot proceed until the first is finished is precisely because of the objection that the Author here raises. If two variant mutations arise simultaneously they will just compete against each other. Spetner explains this quite clearly in Not by Chance, if memory serves. He takes the sum total of all superior genetic variants arising in a population (irrespective of the actual gene or mutation from whence they derive their superiority) as the potential new evolutionary 'step' that will compete against the rest of the population.

Certainly, the second mutation could arise once the first has achieved 80% or 90% saturation, though its chances of arising would have to be reduced to reflect the smaller population size. In any event, the sweep of a mutation is highly dependent on the number of mutants. Once the mutation has swept even >50% of the population, the time required to complete the sweep is relatively short, so not much is gained by allowing for further mutations during that time.

David vun Kannon said...

The problem for bignum arguments in abiogenesis is that the size of a 'functional' enzyme, whether protein or RNA, is much much smaller than today's enzymes and the tolerances are a lot looser. Hairpin RNA can have 10-12 nucleotides, not 150. In contrast to 20^150, it is possible to randomly search 4^12 sequences.