Q&A about the GEP Book
These are Q&A from my personal correspondence with readers of my book, both of the first and second editions. It’s been great getting your feedback and your emails! Thank you all so much! I know the second edition of my book is a little bit pricey (but if it’s any consolation, it’s a huge, not to mention substantial :), book with 478 pages), but for those of you who cannot afford it, I made the first edition freely available online in HTML. So please take a look and join in on the discussions! As usual, for privacy’s sake, I removed any references that could identify the people involved.

I am impressed by how much comes out of the treatment of constants. It seems you have taken them much farther than in GP. I can't remember, but I don't think GP has any analogue of the neat optimization techniques. After reading that and the decision tree chapter, I can really see how important the "Dc" part of the chromosome can be!
 
 I’m really glad you understood the importance of the Dc in GEP (most 
		people choose to very conveniently ignore it). To my knowledge GP has no 
		analogue of the optimization techniques I propose but there are people 
		that nevertheless claim that GP can be used to do parameter optimization 
		by running a typical symbolic regression run – a perfect example of a 
		Rube Goldberg machine, in my opinion.
 
 
		In the chapter on numerical constants, I wondered why you 
		have both the R and C vectors. In other words, why not just put the C 
		values (or a subset of them) in the Dc part? Why the extra step of 
		referencing C using R? I can see that some extra variability may result, 
		but since you mutate C as well as R, the search space is larger. I'm 
		sure there is something I missed here.
I don’t remember having weighed the pros and cons of this representation. I first tried to implement a system similar to the one used in GP, but I wasn’t satisfied with the results and had a feeling that something better could be created. I then came up with this system and, now that you ask, I think the use of both R and C indeed has some advantages. First, it gives more control over constant variability (the Dc length depends both on the head size and the maximum arity, and therefore in a direct representation the variability would be dictated by the Dc size). Second, it allows the appearance of duplicated constants (it’s much harder to evolve exactly the same floating-point constants with the direct representation), which seems to be very important, as it is common to see the same constant being used in different places. And finally, I think it is more elegant.
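The duplication point can be seen in a toy sketch of this indirect scheme (the names, sizes, and code here are my illustration of the idea, not the book’s implementation): the Dc domain holds small index symbols, and decoding just looks each one up in the constant pool C.

```python
import random

# Toy sketch of the indirect constant representation (illustrative only):
# the Dc domain holds index symbols that reference the constants in C.
random.seed(7)

C = [round(random.uniform(-2.0, 2.0), 3) for _ in range(10)]  # pool of random constants
dc = [random.randrange(len(C)) for _ in range(7)]             # Dc domain: indices into C

# Decoding: each Dc position simply yields the constant it points to.
decoded = [C[i] for i in dc]

# Because two Dc positions can hold the same index, the very same
# floating-point constant can easily appear in several places of the
# expression -- something hard to evolve if the floats sat directly in Dc.
```

Note also that mutating a single index re-points a position to an entirely different constant, independently of mutations to C itself.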
 
 
		I wasn't sure to what degree your criticism of 
		homogenizing was directed properly at GP, though. GP does not use 
		mutation usually, but Koza points out that the exchange of subtrees 
		results in constant innovation in the population. For instance, even if 
		all individuals are identical, a GP cross can still produce an 
		individual that is different from the rest. What do you think?
That’s true, but there is nevertheless some homogenization going on there, otherwise they wouldn’t say that in GP it is useless to continue evolving beyond 51 generations. I’ve never implemented a Koza-style GP, so I cannot tell you exactly what goes on within the huge numbers of individuals that he usually uses, but one can get an idea of what takes place there by running a simple experiment in GEP: using just gene recombination and gene transposition, both at 100%, and a chromosome with, say, 10 genes, one is just recombining mathematically concise blocks, as is done in GP, and since genes are moving around in the chromosome, you’ll also avoid chromosomes becoming equal. And indeed such populations also stop evolving after a few generations. So perhaps what happens in GP is that innovation stops because it is rediscovering again and again old combinations of building blocks.
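The experiment described above can be mimicked in a toy simulation (the operator details below follow the standard GEP definitions as I understand them; this is not code from the book): with only gene recombination and gene transposition, no new gene is ever created, so however long you run, the population is still shuffling the generation-0 gene pool.

```python
import random

# Gene recombination: two chromosomes exchange whole genes at one position.
def gene_recombination(c1, c2):
    i = random.randrange(len(c1))
    c1[i], c2[i] = c2[i], c1[i]

# Gene transposition: a whole gene moves to the front of the chromosome.
def gene_transposition(c):
    i = random.randrange(1, len(c))
    c.insert(0, c.pop(i))

# 20 chromosomes of 10 genes; each gene is tagged uniquely so we can track it.
pop = [[f"g{r}_{k}" for k in range(10)] for r in range(20)]
pool0 = sorted(g for c in pop for g in c)        # the generation-0 gene pool

for _ in range(100):                             # both operators at 100%
    random.shuffle(pop)
    for a, b in zip(pop[::2], pop[1::2]):
        gene_recombination(a, b)
    for c in pop:
        gene_transposition(c)

# The gene pool is exactly what it was at generation 0: the operators only
# rearrange existing building blocks, they never invent a new one.
assert sorted(g for c in pop for g in c) == pool0
```

Chromosomes keep changing (genes move around), yet evolution can only ever rediscover combinations of the original blocks.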
 
 
		I keep coming back to how in GEP the average fitness is 
		way below the best individual fitness, and your use of relatively high 
		mutation rates. In GA, one says that you can't mutate too fast or as you 
		approach the optimum, you'll "mutate back" (disadvantageously) more 
		often than forward, and won't reach the optimum. Is it because you use 
		elitism that you can mutate so fast, and also so disruptively?
In GEP the behavior of average fitness will depend, among other things, on the genetic operators you are using and, yes, if you use all the genetic operators at the rates I usually use, you will get populations with best fitness considerably higher than the average. But if you use just recombination (even with a little bit of mutation added), you’ll get what one usually sees with a GA. Obviously elitism plays a small part in all this, and of course the larger the percentage of individuals preserved from generation to generation, the narrower the gap between average and best fitness. And since in GEP I only clone one individual (the very best), this might also contribute to accentuating the gap even more. (I know Koza usually keeps a fair number of individuals unchanged from generation to generation, as I guess he cannot afford to lose genetic diversity, but I remember doing some early experiments and concluding that in GEP it is a waste of resources to keep so many unchanged individuals, as the system works equally well keeping just the single best.)
 
About “mutating back” in GAs, I have my doubts, as people seem to be so fixated on crossover. From my experience with GAs, I don’t see them behaving much differently from GEP. In all these systems there is a point where you start applying too much mutation and you are just delaying things or making them almost random, which obviously is not the goal. And the GA is no exception. One can call it “mutating back”, but I guess this is just common sense.
 
 
		I guess what surprises me is that the average fitness, 
		after an initial rise, stays pretty much constant, whereas the best 
		fitness keeps rising. Why doesn't average fitness follow best fitness 
		up? I guess the answer is that there is so much mutation (of various 
		kinds). But then, why does the best keep rising? I guess that's because 
		it can't fall! (due to elitism) and is occasionally bettered. Well, I'm 
		not very expert on this.
You got that right! These systems are usually optimized to solve problems quickly and efficiently and, thanks to elitism, one can use mutation rates considerably higher than those found in nature, since what counts in nature (and in some artificial evolutionary systems) is the performance of the overall population. Here, one is interested only in the performance of the best, and the rest of the population is always being sacrificed for the benefit of the best. Obviously, without elitism you would have to be more careful, and if you were to design good solutions you would no doubt have to be more conservative and use much smaller mutation rates (as is done in nature). In that case, the best and average fitnesses would be very close and both would slowly and painstakingly increase with time (as happens in nature).
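This dynamic is easy to reproduce in a toy run (a generic elitist GA on the classic OneMax problem, purely illustrative, not GEP itself): cloning only the single best guarantees the best fitness can never fall, no matter how disruptive the mutation applied to everyone else.

```python
import random

# Toy elitist GA on OneMax: maximize the number of 1-bits in a bitstring.
def fitness(ind):
    return sum(ind)

random.seed(1)
N, L, MUT = 50, 40, 0.08                 # population size, genome length, per-bit rate

pop = [[random.randint(0, 1) for _ in range(L)] for _ in range(N)]
best_history = []

for gen in range(60):
    pop.sort(key=fitness, reverse=True)
    elite = pop[0][:]                    # clone only the very best individual
    children = []
    while len(children) < N - 1:
        child = random.choice(pop[:N // 2])[:]   # truncation selection
        for i in range(L):
            if random.random() < MUT:    # disruptive mutation, far above natural rates
                child[i] ^= 1
        children.append(child)
    pop = [elite] + children
    best_history.append(fitness(elite))

# Thanks to elitism, the best fitness is monotonically non-decreasing,
# even though most offspring are heavily perturbed copies of their parents.
```

Plotting `best_history` against the per-generation average shows exactly the pattern discussed: the best climbs steadily while heavy mutation pins the average a few points below it.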
 
 
		I found the last chapter “Evolutionary Studies” 
		stimulating, and I'm not sure I've drawn all the implications yet. I 
guess the analysis in Goldberg's recent book (The Design of Innovation) is based mainly on what he rather pompously calls
		"selectorecombinative" GAs – i.e., ones with crossover only. And he 
		derives a lot of stuff having to do with the sizing of populations in 
		order to assure a solution is found. From your analysis, this would seem 
		to be simply that he wants the populations big enough so that there will 
		be a significant founder effect, in order that the solution is found 
		before the homogenizing process of crossover-only takes over. Is this 
		how you see that? 
 I think you are correct. If he only uses recombination he will need big 
		populations both to find good solutions and avoid premature convergence.
 
 
		The combinatorial optimization results are startling 
		indeed. Your point about the abandonment of inversion in GAs is 
		interesting.
Why on earth do they remain fixated on recombination? Is it the allure of the schema theorem and the building block hypothesis? Or is this a case of trying to make the facts fit the theory?
 
 
		I found the section on neutrality very nice, as it 
		counters any possible objection to "wasted" elements in the chromosome.
I hope you are right, because I’ve received the most astounding comments from supposed specialists in the EC field who think they know everything there is to know about non-coding sequences in the genome. This jewel is from a reviewer of a paper I submitted to a conference:
 
			
“Also, it's difficult to consider the sequences at the end of the GEP genes to be ‘junk’. In the biological sense (unless I'm mistaken) the large amount of junk in gene sequences is really that – either they have shown it doesn't code for any proteins, or they have not managed to assign function to it yet. Whereas, you say yourself that your junk sequences in GEP genes are ‘extremely important’.”

The beautiful thing about the non-coding regions of
		GEP genes is that we know exactly why they exist, that is, they exist to 
		ensure syntactic closure. That they may provide other benefits – such as 
		provide places for safely exchanging genes (more precisely, ORFs or 
		K-expressions) or places for the accumulation of neutral mutations – is 
		just an extra bonus. Indeed, like so many times in evolutionary systems, 
		natural or otherwise, a structure first appears to solve a particular 
		problem but because it is also handy at solving others it brings 
		unexpected benefits, making the system much more robust as a whole.
 
		The only chapter I found a bit confusing was Chapter 7. I 
		didn't know anything about the Kolmogorov-Gabor polynomials or 
		STROGANOFF. I couldn't understand where you were going – I felt left 
		out. Then I saw at the end how you solved those problems much more 
		simply with GEP, and I realized that was the point of the chapter.
This is also the chapter I like least, and I intend to rewrite it completely the next time around, because this algorithm is very interesting, especially if one uses simpler polynomials requiring just 2-3 different coefficients (the complete Kolmogorov-Gabor series requires 6, making things excessively complex). And since there is no limit to the variety of functions one can implement, it is possible to create a sufficiently varied set and just play with them as one does with any other mathematical function (you can even mix them all together and it works even better). I don’t think I stressed this enough in my book, but anyone could reach this conclusion just by analyzing the kinds of functions that were being selected to build the best models (I actually found this out while rewriting this chapter for the 2nd edition, but at the time I was unfortunately short on time to rewrite it yet again).
 
 Why do you weight all the 
		functions in the function set twice in almost all classification 
		problems? And what is the criterion for choosing the function set?
Well, what I found to work better for a classification problem such as this (diagnosis of breast cancer with 9 variables) is to have a function set with at least twice as many functions as there are variables. So, in this case, weighting the 10 functions two times does more or less that. Usually, if overall we have fewer functions than terminals, we make the evolutionary process less efficient, as the evolved models will tend to become very simplistic: the terminals will have a greater probability of being chosen, not only during the creation of the initial population but also during genetic modification. So it is important to find a balance between the number of functions and variables. As for the functions chosen, my personal experience is that most of the time a simple function set with only the basic arithmetic operators (weighted a certain number of times) is a good starting point for classification problems. But you know how people are (myself included): we like to use fancier functions, and since GeneXproTools deals with more complex functions equally well, I just like to spice up the evolutionary process a bit and see what happens.
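Concretely, weighting a function n times just means repeating it n times in the pool of symbols that random choices draw from. A minimal sketch (my illustration, not GeneXproTools internals; the function names are arbitrary examples):

```python
import random

# Weighting each function twice = repeating it twice in the symbol pool,
# so functions and terminals are drawn with comparable probability.
functions = ['+', '-', '*', '/', 'sqrt', 'exp', 'ln', 'min', 'max', 'avg']
terminals = [f"d{i}" for i in range(9)]        # the 9 breast-cancer variables

weight = 2
pool = functions * weight + terminals          # 20 function slots vs 9 terminal slots

random.seed(0)
head = [random.choice(pool) for _ in range(8)]  # head symbols drawn from the pool

# With weight 2, a random head symbol is a function about 69% of the time
# (20/29), instead of about 53% (10/19) with unweighted functions.
```

With an unweighted set of 10 functions against 9 terminals the draw would be nearly even; the weighting tilts it toward functions, which keeps the evolved models from collapsing into trivially simple expressions.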
 
 Which values should be 
		used for the 0/1 rounding threshold? Will the current model not adjust 
		itself to any value you have chosen?
You are right: the plasticity of GEP is extraordinary and it can, indeed, find a way around this (you see the same plasticity operating with different function sets, linking functions, random numerical constants, program structures, and so on). But for most problems there is a certain 0/1 rounding threshold that works best. It will depend mainly on three factors: (1) the kind of data you have (only positive values, only negative values, a mixture of the two, normalized or non-normalized data, etc.); (2) the kind of functions you are using (for instance, if you are using functions that by themselves already round the values they return to 0 or 1, any rounding threshold between 0 and 1 will do); and (3) the presence of random numerical constants (when a diverse range of random numerical constants is available, the algorithm can always work its way around even the most adverse situations, such as a negative rounding threshold combined with only positive inputs and a function set with only addition and multiplication).
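Mechanically, the rounding threshold is just a cut-off applied to the model’s raw numeric output. A minimal sketch (illustrative, not GeneXproTools code):

```python
# A 0/1 rounding threshold turns a model's raw numeric output into a class label.
def classify(raw_output, threshold=0.5):
    """Return 1 if the raw model output reaches the threshold, else 0."""
    return 1 if raw_output >= threshold else 0

outputs = [-0.3, 0.1, 0.49, 0.5, 2.7]          # raw outputs of some evolved model
labels = [classify(y) for y in outputs]        # -> [0, 0, 0, 1, 1]

# If the model's functions already round their return values to 0 or 1,
# any threshold strictly between 0 and 1 yields the same labels.
```

Shifting the threshold simply moves the cut-off point, which is why an expressive model (especially one with random numerical constants) can rescale its outputs to fit almost any threshold you pick.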
 
Concerning the diagnosis of breast cancer example, you write that you got a fitness value (number of hits) of 340 against the training set of 350 records and a fitness value of 173 against a test set of 174 records. Did you wait until the model provided the best training fitness to apply it to the test set and get this optimum result, or did you evolve many times, through a "stop and go again" process, until the test set fitness reached its best?
As you know, GeneXproTools allows you to stop the evolutionary process, after which you can take a look at the performance on the test set. And if you are not satisfied with the generalization results, you might choose to continue by using the evolved model as seed. Well, I do this a lot whenever I’m trying to find the best possible model to solve a real-world problem but, in the breast cancer example you mentioned, I did several runs from scratch, using different settings (different functions, different rounding thresholds, different architectures, and so on) as I went along, and then I chose the best model.
 