Previously I wrote about the underpinnings of genetic algorithms, and briefly mentioned the distinction between genetic algorithms and genetic programming. Here we'll take a deeper dive into genetic programming. Before we begin our jolly stroll through a genetic programming example, I'll acknowledge bias: I believe genetic programming is a fantastic but fictional solution to a complex problem1.
The Dream of Genetic Programming
After a long night of cleansing the internet of false opinions,
you realize that you cannot singlehandedly maintain its sacred purity.
You build a neural network
that accepts a blog entry as input and outputs the correct action
to fix the problem, but it keeps returning Weep in despair, and no
amount of training has gotten the neural net to correctly emit Rant or
Mock on Reddit when fed blog entries containing scintillating topics like
Bitf*ck vs. Scheme, Vista Sucks or Git Owns Perforce.
Head drooped with resignation, you think to yourself, if only I could write a program that would write this program for me! Wait--for this is truly brilliant--if I just feed a list of operations into a genetic algorithm and then...
Two weeks later the newspaper--the one which hasn't yet succumbed to bankruptcy--is heralding your achievement: you, the program you programmed, and the program the program you programmed programmed have cleaned the internet of wrong, and the UN has ordered those nasty and irrelevant bloggers stripped of citizenship and marched into concentration camps.
Ah yes, all is right with the world.
Starting With a Slightly Simpler Problem
Regretfully, we'll need to start out with a simpler
problem than salvaging the internet. Still, the problem we're
going to attempt solving is a bit more complex
than the one we were working on in the previous article (creating an array
of N numbers that equal X when summed).
In our new problem an individual will consist of N elements randomly selected
from this list (one element may be selected multiple times):
options = ('+','-','*','/','0','1','2','3','4','5','6','7','8','9')
Thus, we can create an individual with the following code.
from random import randint
def part():
options = ('+','-','*','/','0','1','2','3','4','5','6','7','8','9')
return options[randint(0, len(options)-1)]
def individual(length=5):
return [ part() for i in range(length) ]
And a population is generated like this:
def population(size=1000, length=5):
return [ individual(length=length) for x in range(size) ]
Now let's describe the problem: we want an individual
that--when its elements are evaluated by Python's eval function--equals
some number X. Not all combinations of the elements
form valid Python syntax (for example with N=5, then 5 5 * + 1 is such an individual),
and those individuals (i.e. those whose elements trigger an exception when passed to eval)
should receive a very poor fitness score. For individuals that do eval successfully,
then their fitness score is their distance from the target value X.
We can formalize that definition into this fitness function:
def fitness(x, target):
try:
val = eval(" ".join(x))
return target - abs(target-val)
except:
return -100000
Beyond these changes, the code is largely identical to our previous code implementing a genetic algorithm. The minor changes relate to our new fitness function where a high value represents strong fitness, while in the previous fitness function a high value presented poor fitness. (You can see the initial code for this problem here.)
Only requiring minimal changes is a testament to the universality of genetic algorithms; we've gone from solving an extremely simple arithmetic problem with a genetic algorithm to solving a substantially more complex genetic programming problem without making any substantial changes to the algorithm! Maybe we should call them universal problem solvers instead of just genetic algorithms.
Let's just verify our optimism real quickly by running a thousand generations.
>>> from gp2 import *
>>> populations, fitness_scores = evolve()
>>> fitness_scores[10:]
[-96200.27, -95499.98, -96000.02, -96200.006, -96300.002, -96599.83, -96599.904, -96000.06, -96500.001, -96800.001]
>>> fitness_scores[-10:]
[-96300.45, -96500.04, -96200.51, -96000.008, -96300.04, -96000.01, -97099.97, -96200.091, -96400.01, -95099.96]
Fuck. Fuck. Fuck. What the fuck happened there?
The average fitness of the first
ten generations is -96270.54, and the average fitness
of generations 990 through 1000 is -96210.11.
That isn't even statistically signifigant.
I called you a universal problem solver, and then you
did this shit to me? Fuck. Let me take a few minutes to
recover.
After a Brief Interim With a Paper Bag
Okay, so I may have over-reacted a bit there, but you've probably already deduced the first take home message: untuned genetic algorithms can perform very poorly (i.e. straight-out fail) on some problems. In particular genetic algorithms perform exceptionally poorly when relying upon an uninformative fitness function, and our current fitness function is fantastically uninformative.
To begin with, very few of the individuals are even in
valid Python syntax, and all invalid individuals receive the same fitness value
of -10000. This means that our current fitness function gives poor
feedback on fitness for the majority of individuals, and thus fails
right out of the gate in a gruesome spectacle.
That's okay though, because we can just improve
on our current fitness function, right?
Let's take another look at our fitness function.
def fitness(x, target):
try:
val = eval(" ".join(x))
return target - abs(target-val)
except:
return -100000
If the syntax is bad, then we get no feedback on how close the syntax is to being valid. If it is valid, then we do start to get useful feedback, but we're a bit stymied as the vast majority of individuals are invalid (we'll look into the ratio between valid and invalid below).
Without being able to evaluate the code we have two ways we can generate some more specific feedback:
- Calculate a fitness function based on how close the code is to syntactically correct,
- Calculate a fitness function based on how close the parameters are to a desired correct solution.
The first option is fairly simple in this extremely trivial example (the format must alternate between number and operation, and must begin and end with a number), but will explode into a rather formidable problem in more complex scenarios.
The second option is entirely bogus, because it means you are 'solving' a problem by teaching it an answer you already know; in that case you're neither optimizing nor solving anything, you're perpetuating a farce.
Like Dusting with a Jackhammer
Genetic algorithms are a category of optimization techniques, but
looking at fitness we can see our problem: first it is attempting
to solve a search problem (finding for working solutions, of whatever quality),
and then trying to optimize (improving the quality of
discovered solutions). Very few--if any--algorithms are going to perform
well for both the search and optimization aspects of this problem.
Infact, in this example we're never even escaping the search aspect of the
problem. We can make that more clear by modifying our grade_population
function to count the number of syntactically valid individuals
(instead of measuring the average fitness of the population).
We'll change it from this:
def grade_population(pop, target):
total = sum([ fitness(x, target) for x in pop ])
avg = total / (len(pop) * 1.0)
return avg
to this (where -100000 is the fitness for an individual that has invalid syntax):
def grade_population(pop, target):
fitnesses = [ fitness(x, target) for x in pop ]
valids = [ x for x in fitnesses if x > -100000 ]
return len(valids)
Now we can rerun the above experiment and with a more concrete view of how we're stuck in the search portion of this problem.
>>> from gp2 import *
>>> history, fitness = evolve(generations=1000)
>>> fitness[:10]
[37, 37, 34, 37, 33, 27, 21, 21, 33, 36]
>>> sum(b[:10]) / 10.0 # avg of first ten
31.6
>>> b[-10:]
[38, 46, 44, 49, 37, 43, 35, 39, 44, 35]
>>> sum(b[-10:]) / 10.0 # avg of last ten
41.0
Over a thousand generations the number of valid individuals did increase, but rather slowly. Out of curiosity, lets see what happens if we try increasing from one thousand to five thousand generations.
>>> history, fitness = evolve(generations=5000)
>>> fitness[:10]
[37, 42, 37, 46, 41, 40, 34, 33, 35, 34]
>>> sum(fitness[:10]) / 10.0 # avg of first ten
37.9
>>> fitness[-10:]
[30, 43, 42, 53, 44, 45, 42, 35, 37, 36]
>>> sum(fitness[-10:]) / 10.0 # avg of last ten
40.7
Yep, increasing from 1000 to 5000 generations didn't
have any noticeable affect: we're definitely stuck
in the search portion of this problem.
Instead of simply increasing the generations, lets
try tweaking the parameters used in the evolve
function. Right now we are only retaining the top thirty
percent of entries (and are thus creating a substantial quantity of
children from crossing over successful parents), let's try
upping that to retaining the top ninety percent of
individuals.
>>> history, fitness = evolve(generations=1000, retain=0.9)
>>> fitness[:10]
[35, 38, 34, 34, 33, 35, 32, 37, 32, 38]
>>> sum(fitness[:10]) / 10.0
34.8
>>> fitness[-10:]
[42, 34, 36, 26, 37, 34, 31, 44, 36, 41]
>>> sum(fitness[-10:]) / 10.0
36.1
We certainly don't see any definition improvement there. What about reducing the percent of retained individuals to the top ten percent and increasing the number of generations to three thousand?
>>> history, fitness = evolve(retain=0.1, generations=3000)
>>> fitness[:10]
[35, 28, 30, 37, 38, 44, 41, 49, 37, 39]
>>> sum(fitness[:10]) / 10.0
37.8
>>> fitness[-10:]
[36, 38, 47, 43, 46, 50, 45, 47, 51, 43]
>>> sum(fitness[-10:]) / 10.0
44.6
Well, that didn't work well either. Perhaps we are going about this the wrong way.
Right now we are creating individuals where N (the number of elements) is 5.
Let's try reducing N to 3. (We'll keep the target number at 15.)
>>> history, fitness = evolve(individ_size=3)
>>> fitness[:10]
[162, 162, 141, 146, 140, 162, 148, 145, 152, 170]
>>> (sum(fitness[:10])*1.0) / 10
152.8
>>> fitness[-10:]
[149, 156, 157, 158, 165, 169, 174, 157, 159, 164]
>>> (sum(fitness[-10:])*1.0) / 10
160.8
With fewer combinations (3^3 instead of 5^5) we're seeing
more syntactically sound individuals, but the percentage
increase over generations is actually worse. Let's make
one last ditch effort. Let's change the target X to
6 instead of 15, retain only the top ten percent of individuals,
and also increase the number of generations to three thousand.
>>> history, fitness = evolve(retain=0.1, individ_size=3, target=6, generations=3000)
>>> fitness[:10]
[156, 147, 137, 135, 129, 153, 152, 141, 146, 149]
>>> sum(fitness[:10]) / 10.0
144.5
>>> fitness[-10:]
[160, 166, 157, 152, 163, 170, 177, 167, 153, 161]
>>> sum(fitness[-10:]) / 10.0
162.6
For an algorithm whose primary weakness is quick and myopic optimization, this lack of progress is a strong indication of what we've already grown to suspect: the genetic algorithm simply cannot overcome the search aspect of this problem.
This one example is far from sufficient justification, but I'll go a bit further and suggest that genetic programming will only solve problems when you supply a detailed outline of the solution. Genetic algorithms can optimize a solution from supplied components, but genetic programming is inadequate to both discover and optimize a solution. This is a shame, as there is a level of elegance to the concept that is hard to dismiss.
It isn't hard to see how this idea could repeatedly lure scientists to bang their heads on the problem until they contorted their cortexes and algorithms enough to believe it worked. If only the results of genetic programming justified its conceptual beauty.
(You can see revised code here.)
A Brief Response on 2009/1/20
Many commenters felt strongly that this introduction unfairly maligns genetic programming. Based on the deep anger they have expressed, it seems likely that these are individuals who are doing research in the field of genetic programming. It is always those too close to a topic, and often academics, who get irrationally upset about criticism. Certainly I'd be equally upset if someone was suggesting I had spent half a decade chasing a pipe dream.
Beyond the anger, the fundamental complaint is that I am generating the individuals using a trivial approach of randomly combining elements, rather than creating the rather complex infrastructure needed to consistently generate syntactically correct abstract syntax trees.
In the above article, I mentioned two responses to an uninformative fitness function:
- Teach the fitness function how to code.
- Teach the fitness function the code to solve the problem.
Although listed separately, and above the screaming voices, I don't believe these two solutions
are genuinely distinct. Even if we use couch our resorting to #2 behind the premise of
performing #1, we are still teaching the fitness function to create the solution we desire.
If we accept that genetic algorithms are incapable of solving problems in the domain of search,
how could it be otherwise? Even restricted to valid code, the problem space is extraordinarily large; we either
require a true search algorithm for discovering solutions, or we provide hints on the direction
in which to search.
Two final points:
The aim of these articles is introducing novice programmers to the ideas of computer science, while sticking with the Python language, and most importantly supplying full working examples that don't rely on external (i.e. non-standard) libraries. Using abstract syntax trees in the solution wasn't in the cards, and neither was dismissing genetic programming as unattainably complex or restricted to the realm of celestial researchers.
There is mention of some unspecified problem area where genetic programming is superior to genetic algorithms specifically, and perhaps other algorithms in general. Although the definition of superior is unstated, as is the problem domain, I'd be glad to have a contest along these lines:
- A problem is chosen that is believed to be firmly within the problem domain where genetic programming is most competitive.
- A single common programming language is decided upon (perhaps Scheme or Common Lisp?).
- Code is written without resort to algorithm specific support libraries (i.e. a library for generating random numbers would be acceptable, but a library which supports genetic programming/genetic algorithms is not). This seems important as it reveals the true complexity of the algorithm, as opposed to concealing the complexity behind a library.
- Any number of contestants submit solutions, along with the category of algorithm it uses.
- The solutions and tests cases are distributed in a zipped file so we may all verify the results.
- Anyone is free to create their own grading scale, and to write up the group results with their own bias.
Initial Full Code
from random import randint, random
def part():
options = ('+','-','*','/')
options += tuple( "%d" % x for x in range(0,10))
return options[randint(0, len(options)-1)]
def individual(length=5):
return [ part() for i in range(length) ]
def population(size=1000, length=5):
return [ individual(length=length) for x in range(size) ]
def fitness(x, target):
try:
val = eval(" ".join(x))
return target - abs(target-val)
except:
return -100000
def grade_population(pop, target):
total = sum([ fitness(x, target) for x in pop ])
avg = total / (len(pop) * 1.0)
return avg
def generation(pop, target, retain=0.3, random_select=0.05, mutate=0.01):
graded = [ (fitness(x, target), x) for x in pop ]
graded = [ x[1] for x in sorted(graded, reverse=True) ]
retain_length = int(len(graded)*retain)
parents = graded[:retain_length]
# randomly add other individuals to
# promote genetic diversity
for individual in graded[retain_length:]:
if random() > random_select:
parents.append(individual)
# mutate some individuals
for individual in parents:
if random() > mutate:
pos_to_mutate = randint(0, len(individual)-1)
# this mutation is not ideal, because it
# restricts the range of possible values,
# but the function is unaware of the min/max
# values used to create the individuals,
individual[pos_to_mutate] = part()
# crossover parents to create children
parents_length = len(parents)
desired_length = len(pop) - parents_length
children = []
while len(children) < desired_length:
male = randint(0, parents_length-1)
female = randint(0, parents_length-1)
if male != female:
male = parents[male]
female = parents[female]
half = len(male) / 2
child = male[:half] + female[half:]
children.append(child)
parents.extend(children)
return parents
def evolve(pop_size=1000, target=15, individ_size=5, retain=0.3,
generations=1000, random_select=0.05, mutate=0.01):
p = population(size=pop_size, length=individ_size)
history = [ p, ]
fit_history = [ grade_population(p, target=target) ]
for i in range(generations):
p = generation(p, target, retain, random_select, mutate)
history.append(p)
fit_history.append(grade_population(p, target=target))
return history, fit_history
Refined Full Code
from random import randint, random
def part():
options = ('+','-','*','/')
options += tuple( "%d" % x for x in range(0,10))
return options[randint(0, len(options)-1)]
def individual(length=5):
return [ part() for i in range(length) ]
def population(size=1000, length=5):
return [ individual(length=length) for x in range(size) ]
def fitness(x, target):
try:
val = eval(" ".join(x))
return target - abs(target-val)
except:
return -100000
def grade_population(pop, target):
pop_fitness = [ fitness(x, target) for x in pop ]
valid_syntax = [ x for x in pop_fitness if x > -100000 ]
valid_count = len(valid_syntax)
return valid_count
def generation(pop, target, retain=0.3, random_select=0.05, mutate=0.01):
graded = [ (fitness(x, target), x) for x in pop ]
graded = [ x[1] for x in sorted(graded, reverse=True) ]
retain_length = int(len(graded)*retain)
parents = graded[:retain_length]
# randomly add other individuals to
# promote genetic diversity
for individual in graded[retain_length:]:
if random() > random_select:
parents.append(individual)
# mutate some individuals
for individual in parents:
if random() > mutate:
pos_to_mutate = randint(0, len(individual)-1)
# this mutation is not ideal, because it
# restricts the range of possible values,
# but the function is unaware of the min/max
# values used to create the individuals,
individual[pos_to_mutate] = part()
# crossover parents to create children
parents_length = len(parents)
desired_length = len(pop) - parents_length
children = []
while len(children) < desired_length:
male = randint(0, parents_length-1)
female = randint(0, parents_length-1)
if male != female:
male = parents[male]
female = parents[female]
half = len(male) / 2
child = male[:half] + female[half:]
children.append(child)
parents.extend(children)
return parents
def evolve(pop_size=1000, target=15, individ_size=5, retain=0.3,
generations=1000, random_select=0.05, mutate=0.01):
p = population(size=pop_size, length=individ_size)
history = [ p, ]
fit_history = [ grade_population(p, target=target) ]
for i in range(generations):
p = generation(p, target, retain, random_select, mutate)
history.append(p)
fit_history.append(grade_population(p, target=target))
return history, fit_history
If these computer science entries are ever collected into a book, the title will contain the word biased in bold Marker Felt.↩
You listed options 1 and 2, then dismissed them both. Option 2 is clearly bogus, but option 1 is the only solution. Genetic algorithms simply don't do blind random search, which is what you're asking it to do here. You must have a fitness function that can grade "more fit" from "less fit" throughout the genetic space; in other words, a fitness curve that has slope everywhere, not a fitness function (like your current one) that is mostly flat except at one end.
You note that grading "closeness to syntactic correctness" gets complex quickly for complex problems of this type. Sure it does. That's a more illuminating and precise conclusion to the article; not that genetic programming is inadequate to "discover a solution" (whatever that means), rather that the precondition for a genetic solution is always your ability to write a fitness function that doesn't have large flat areas.
Carl,
Sorry for dismissing #1 without enough discussion, and your defense of #1 is certainly more elegant than my rejection. My opinion is that as we continue to devote additional effort and code to eliminating those "large flat areas", we are in-fact inching further and further from genetic programming and towards a standard usage of genetic algorithms.
At some point we have given the fitness function enough hints that we are giving the algorithm the components and instructions on how they can be assembled, and just asking it to optimize that combination of those components while following our guidelines. My problem with this is this: what, then, is the distinction between GA and GP?
Will
The distinction between GA and GP is that with GP, you are working with a syntax tree. Not a flat series of tokens, which is obviously stupid and not even worth talking about, but a syntax tree that is (syntactically) correct by construction.
What you have described isn't GP at all. It's not even related to GP.
You generate a random syntactically correct tree for your initial, then you can do crossover and mutation on that tree based on tree algorithms. You run into some big problems right away, it's no panacea, and ultimately the title of the post may even still be correct. But this isn't even close to why. This is a total non-problem.
Obviously, I am not doing a rich field justice in two paragraphs.
There is another way to think of this. You defined the search space (in individual, part, and fitness). You note later that much of the search space is invalid. Well that's not a problem with genetic programming, it's a problem with your search space.
You should try to construct the search space in a way that most individuals that can be constructed are also valid.
For instance we could change
to
It's true that "5 5 * + 1" is invalid but "55*+1" stands a much better chance.
Interesting article. I would like to point out though that the problem of generating syntactically correct programs can be solved with relative ease in Lisp languages. Yes, all the parentheses are daunting at first, but the structure of a lisp program as a list of lists basically lays bare the abstract syntax tree.
In this situation, all you need to generate syntactically valid programs is a more precise description of each option. Say, for example, '4' is a terminal symbol; '+' is a symbol that requires 2+ operand symbols, which can be any symbol, terminal or not. Then, to generate a program, you pick an option at random, and recursively fill all empty slots of that option with other random choices.
This of course introduces the theoretical problem of generating infinite programs, but with a little extra limiting you can bound the depth of programs.
You then need adapted crossover/mutation operators, but they basically amount to transplanting subtrees around and mutating tree elements (with possible regeneration of subtrees).
This approach would drastically reduce the search space, since the generator suddenly can generate only valid code, which is much harder in Python or other languages where the AST is not as apparent.
Now, whether that would suffice to generate only interesting syntactically valid programs (ie. programs that evaluate to a number), and what magic factors need to be applied to fitness evaluation, crossover and mutation to create selective pressure in the right direction, is still a problem.
(PS: I'm not specifically a lisp fan, but I've found through experimentation on similar problems that lisps are quite stellar on problems that involve code generation)
Dave,
I agree that working with a Lisp-like makes forming syntactically correct programs much simpler, and also provides a much more powerful evaluation function than that in Python. The constraint here is that I am attempting to create an introduction while sticking with Python.
There is presumably some way to grab hold of the Python AST for a given set of code, which would have been a much more appropriate vector of approach, but perhaps beyond the scope of what I felt I could accomplish in what I'm attempting to accomplish, which is introductions to computer science topics for a broader audience.
What you are looking for is the compiler module, which lets you manipulate python ASTs from within python. If you are actually trying to accomplish the goal you set out (i.e. introduce computer science topics to a broader audience) in this case you might want to start with explaining why GP techniques are almost always applied to languages in which the boundary between code and data is thin and why this is necessary for GP but not for GAs (e.g. the GA "genome" is interpreted by something else while the GP syntax trees are usually handled by the same interpreter that evolves the code/data.)
You might like taking a look at grammatical evolution. It is an attempt to bridge the gap between genetic algorithms and genetic programming. Using it, you define a BNF that is used to generate syntactically correct statements that are driven by a genetic algorithm.
This is really spectacularly ignorant... reminds me of some of the spectacularly ignorant articles I read years ago bashing the idea of garbage collection. Epic fail.
I've seen a lot of spectacularly ignorant anti-evolutionary-computation articles in recent years. Based on the history of managed languages and garbage collection, I'll make a contrarian prediction that GP is about to become ubiquitous. :)
Garbage collection is an innovative and fundamentally distinct approach to memory management. Genetic programming is applying a layer of paint to genetic algorithms and justifying another year of trade journal subscriptions and PhD thesis.
Yup, that sounds like the response predicted by Adam's hypothesis.
Genetic programming is about discovering an algorithm or formula that satisfies a fitness function. Will's not doing that. As he notes, he's never really getting down to the layer of actually evaluating an evolved algorithm, due to the noise he's throwing at the fitness function. By not using a syntax tree to generate valid syntax, he's not really evaluating genetic programmming at all.
This looks to me like a real good example of why you need to use a syntax tree when generating GP DNA.
This example isn't genetic anything, really, 'cause there's not a lot of evolution going on, as Will notes. The closest equivalent I can think of to what he's doing is actually 'crashme'.
Reply to this entry