In this section:
We already know that the automatic evolution of computer programs can only be done smoothly and efficiently if a genuine genotype/phenotype mapping is available. The creation of such a mapping requires some creative thinking because proteins and computer programs are very different things. Thankfully, computer programs are much easier to understand than proteins and it is not necessary to know, for instance, the rules that determine the three-dimensional structure of proteins in order to create a simple genotype/phenotype system capable of evolving computer programs. What are, then, the fundamental properties common to the DNA/protein system and an artificial system especially designed to evolve computer programs? Obviously, the first is the creation of the genome/program dyad; and second, no matter what, the genome must always produce valid programs. And how can that be accomplished?
Turning to nature for inspiration can help. How does the DNA/protein system cope with complexity? Is the information somehow fragmented in the genome? Then perhaps the fragmentation of the genome in genes could also be useful in a simple artificial evolutionary system. And what about expression in nature? Is all the information encoded in the genome always expressed? How is it possible to differentiate the information that gets to be expressed from the silenced one? Why is differentiation important? Might this also be of any use in artificial evolutionary systems? Although the answers to all these questions are still being sought, what is known is that, in nature, genomes are vastly redundant, with lots and lots of so called junk DNA which is never expressed: highly repetitive sequences, introns, pseudogenes, and so forth. So, most probably, the introduction of junk sequences in an artificial genome can also be useful.
The genetic representation used in gene expression programming explores both the fragmentation of the genome in genes and the existence of junk sequences or non-coding regions in the genome. As Kimura hypothesized
(Kimura 1983), the accumulation of neutral mutations plays an important role in evolution. And the non-coding regions of GEP are ideal places for the accumulation of neutral mutations. In this section, we will analyze the importance of neutral regions in the genome and, consequently, the importance of neutral mutations in evolution using the fully functional genotype/phenotype system of gene expression programming.
For this analysis, two simple, exactly solved test problems were chosen. These problems can be solved using both unigenic and multigenic systems. On the one hand, the extent of non-coding regions in unigenic systems can be easily increased by increasing the gene length. On the other, in multigenic systems the number of non-coding regions can be increased by increasing the number of genes.
The first problem chosen for this analysis is a function finding problem where the test function
(4.1) of section 4.1.1 was used. And the second is a more difficult sequence induction problem where the test sequence
(4.9) was used (this sequence was also used in sections
7.2 and 7.3).
For the function finding problem, a set of 10 random fitness cases chosen from the interval [-10, 10] was used; the fitness function was evaluated by equation
(3.1b) and a selection range of 25% and a precision error of 0.01% were chosen, giving maximum fitness
fmax = 250; and population sizes P of 30 individuals and evolutionary times
G of 50 generations were chosen.
For the sequence induction problem, as usual, the first 10 positive integers
n and their corresponding an term were used as fitness cases; the fitness function was also evaluated by equation
(3.1b) and a selection range of 25% and maximum precision (0% error) were chosen, giving
fmax = 250; the population size and the evolutionary time were increased, respectively, to 50 and 100 as this problem is slightly more difficult than the function finding one.
In all the experiments, the function set F = {+, -, *, /} and the terminal set
T consisted only of the independent variable which was represented by
a, giving T = {a}; genetic modification was introduced using a mutation rate of 0.03, an IS and RIS transposition rates of 0.1 with a set of three transposons of lengths 1, 2, and 3, and two-point and one-point recombination rates of 0.3; in multigenic systems gene recombination and gene transposition were also used as sources of genetic modification, both at rates of 0.1 and the linking was made by addition; as usual, the selection was made by roulette-wheel sampling coupled with simple elitism and the success rate was evaluated over 100 independent runs.
|