GEP Book

C. FERREIRA, Advances in Complex Systems, Vol. 5, No. 4, 389-408, 2002

Genetic Representation and Genetic Neutrality in Gene Expression Programming

Conclusions
 
The neutral theory of evolution was tested using the artificial genotype/phenotype system of gene expression programming (GEP). GEP provides an ideal framework for such an analysis for three main reasons. First, GEP is a simple artificial life system with a truly functional genotype/phenotype mapping and can therefore provide valuable insights into the workings of genotype/phenotype systems in general. Second, the number and extent of neutral regions in the genome can be tightly controlled by varying either the number of genes or the gene length. And third, the high efficiency of the algorithm allows not only the execution of thousands of runs in minutes but also the use of non-trivial tasks for the analysis. This matters because previous discussions on the importance of neutrality in genetic programming (GP) are inconclusive, with some researchers claiming an important role for introns (as neutral motifs are called in GP) and others claiming that introns are a hindrance and must be avoided [1, 2, 15, 17, 20, 21, 22]. How can these contradictory results be explained? Either the conclusions were drawn from a non-representative number of runs on trivial tasks, or the simple replicator system of GP is governed by other, still unknown rules. At least for RNA molecules, it has been shown that neutral mutations play an important role in evolution, allowing populations to diffuse along neutral genotypic networks [10, 16, 19].
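The second point, that gene length and number of genes set the room available for neutral regions, can be made concrete with a minimal Python sketch. It assumes the standard GEP gene layout, which this section does not restate: a head of length h that may hold functions or terminals, followed by a tail of terminals only, with tail length t = h(n - 1) + 1, where n is the largest function arity. The function set {+, -, *}, the terminals a and b, the example gene and all helper names are illustrative assumptions.

```python
# Illustrative symbol set (assumption): binary functions; any other symbol is a terminal.
ARITY = {"+": 2, "-": 2, "*": 2}
MAX_ARITY = max(ARITY.values())


def tail_length(h: int, n: int = MAX_ARITY) -> int:
    """Tail length t = h(n - 1) + 1 from the standard GEP gene layout."""
    return h * (n - 1) + 1


def chromosome_length(h: int, genes: int) -> int:
    """Total length of a chromosome made of `genes` genes of head length h."""
    return genes * (h + tail_length(h))


def orf_length(gene: str) -> int:
    """Length of the open reading frame (ORF) of a gene in Karva notation."""
    needed, i = 1, 0  # `needed` counts argument slots still waiting to be filled
    while needed > 0:
        needed += ARITY.get(gene[i], 0) - 1
        i += 1
    return i


def noncoding_length(gene: str) -> int:
    """Number of positions after the end of the ORF (the non-coding region)."""
    return len(gene) - orf_length(gene)


# One gene with h = 6, so t = 7 and the gene is 13 symbols long.
gene = "+a*bab" + "aabbaba"
print(orf_length(gene), noncoding_length(gene))  # 5 8

# The space available for neutral regions grows with head length and with gene count.
for h, genes in [(6, 1), (6, 3), (12, 3)]:
    print(f"h={h}, genes={genes}: chromosome length {chromosome_length(h, genes)}")
```

Only the ORF is translated into a program; everything after it is free to vary without changing the phenotype, so enlarging h or adding genes enlarges that free space.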

In this work, four experiments, each involving thousands of runs, were performed. These experiments show that extremely compact systems, with little or no room for neutral regions, are significantly less efficient than moderately redundant systems in which a considerable part of the genome is engaged in doing “nothing”, either because it lies in the non-coding regions at the end of the ORFs or because it encodes neutral motifs that contribute nothing to the individual program. Furthermore, even highly redundant systems were shown to be more efficient than extremely compact ones, suggesting that evolutionary systems can cope very well with excessive redundancy. Finally, multigenic systems were shown to be more efficient than unigenic ones, suggesting that fragmenting the genetic information into smaller units such as genes allows the evolution of more complex programs composed of smaller sub-programs.
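How a multigenic chromosome yields a program built from smaller sub-programs can be sketched in Python. In this minimal illustration, each gene is decoded separately and the resulting sub-programs are combined by a linking function; addition is assumed here, as are the function set, terminals, gene length and helper names. Decoding follows Karva notation: the ORF is read left to right, and each function takes the next unassigned symbols as its arguments, which builds the expression tree level by level.

```python
import operator

# Illustrative function set (assumption); any other symbol is a terminal.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
ARITY = {sym: 2 for sym in OPS}


def orf_length(gene: str) -> int:
    """Length of the ORF: read until every argument slot has been filled."""
    needed, i = 1, 0
    while needed > 0:
        needed += ARITY.get(gene[i], 0) - 1
        i += 1
    return i


def evaluate_gene(gene: str, env: dict[str, float]) -> float:
    """Decode the ORF of a Karva-notation gene and evaluate it."""
    orf = gene[:orf_length(gene)]
    # Assign children level by level: each function takes the next free positions.
    children, nxt = [], 1
    for sym in orf:
        k = ARITY.get(sym, 0)
        children.append(range(nxt, nxt + k))
        nxt += k

    def ev(i: int) -> float:
        sym = orf[i]
        if sym in OPS:
            left, right = (ev(c) for c in children[i])
            return OPS[sym](left, right)
        return env[sym]

    return ev(0)


def evaluate_chromosome(chromosome: str, gene_length: int,
                        env: dict[str, float], link=operator.add) -> float:
    """Evaluate each gene as a sub-program and combine the results with `link`."""
    genes = [chromosome[i:i + gene_length]
             for i in range(0, len(chromosome), gene_length)]
    result = evaluate_gene(genes[0], env)
    for gene in genes[1:]:
        result = link(result, evaluate_gene(gene, env))
    return result


# Two genes with h = 3 and t = 4: "+a*baab" encodes a + b*a, "*ababab" encodes a*b.
chromosome = "+a*baab" + "*ababab"
print(evaluate_chromosome(chromosome, 7, {"a": 2, "b": 3}))  # 14
```

Each gene can evolve its own small sub-expression, and the linking function assembles them; a unigenic chromosome would have to encode the same program as a single, larger tree.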

Because of their clarity, the results presented in this work are extremely useful for understanding the role of genetic neutrality in both artificial and natural evolution. As shown here, there are two different kinds of neutral regions in GEP: the neutral motifs within the ORFs and the non-coding regions at the end of the ORFs. In simple replicator systems such as GP or GAs, only the former exist, whereas in genotype/phenotype systems such as the DNA/protein system or GEP, both kinds exist. And the presence of non-coding regions in genotype/phenotype systems is certainly closely tied to the higher efficiency of these systems. For instance, introns in DNA are believed to be excellent targets for crossover, allowing different building blocks to be recombined without being disrupted (e.g., [13]). The non-coding regions of GEP can serve the same purpose: whenever the crossover points fall within these regions, entire ORFs are exchanged. Furthermore, the non-coding regions of GEP genes are ideal places for the accumulation of neutral mutations that can later be activated and integrated into coding regions. This is an excellent source of genetic variation and certainly contributes to the increase in performance observed in redundant systems.
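The claim that mutations can accumulate silently in the non-coding region and later be activated can be illustrated with a minimal Python sketch. It assumes the same illustrative setting as a standard GEP gene (function set {+, -, *}, terminals a and b, Karva decoding); the gene, the mutation positions and the helper names are hand-picked, hypothetical examples, not data from the experiments.

```python
import operator

# Illustrative function set (assumption); any other symbol is a terminal.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
ARITY = {sym: 2 for sym in OPS}


def orf_length(gene: str) -> int:
    needed, i = 1, 0
    while needed > 0:
        needed += ARITY.get(gene[i], 0) - 1
        i += 1
    return i


def evaluate_gene(gene: str, env: dict[str, float]) -> float:
    """Decode the ORF (Karva notation, level by level) and evaluate it."""
    orf = gene[:orf_length(gene)]
    children, nxt = [], 1
    for sym in orf:
        k = ARITY.get(sym, 0)
        children.append(range(nxt, nxt + k))
        nxt += k

    def ev(i: int) -> float:
        if orf[i] in OPS:
            left, right = (ev(c) for c in children[i])
            return OPS[orf[i]](left, right)
        return env[orf[i]]

    return ev(0)


def point_mutation(gene: str, pos: int, symbol: str) -> str:
    """Replace the symbol at `pos` (hypothetical helper)."""
    return gene[:pos] + symbol + gene[pos + 1:]


env = {"a": 2, "b": 3}
parent = "+a*babaabbaba"  # h = 6, t = 7; ORF "+a*ba" encodes a + b*a

# Position 5 is a head position (functions allowed) but lies past the ORF: neutral.
silent = point_mutation(parent, 5, "-")
print(evaluate_gene(parent, env), evaluate_gene(silent, env))  # 8 8

# A later mutation at position 4 extends the ORF over position 5, activating it.
print(evaluate_gene(point_mutation(silent, 4, "+"), env))  # 5, i.e. a + b*((a - b) + a)
# The same later mutation without the earlier silent change gives another program.
print(evaluate_gene(point_mutation(parent, 4, "+"), env))  # 17, i.e. a + b*(b + a)
```

The silent change had no effect on fitness when it arose, yet it changed the outcome of the later mutation, which is the sense in which non-coding regions store hidden variation.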

But, at least in GEP, the non-coding regions play another, much more fundamental role: they allow the genome to be modified by numerous high-performing genetic operators. Here, by “high-performing” I mean genetic operators that always produce valid structures. This problem of valid structures applies only to artificial evolutionary systems, since in nature there is no such thing as an invalid protein. How and why the DNA/protein system got this way is not known, but there were certainly selection pressures to get rid of imperfect genotype/phenotype mappings. The fact that the non-coding regions of GEP allow the creation of a perfect genotype/phenotype mapping further reinforces the importance of neutrality in evolution, as a good mapping is essential for crossing the phenotype threshold in evolution.
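Why non-coding regions make such operators possible can be checked empirically. The minimal Python sketch below assumes the standard GEP head/tail layout (tail length t = h(n - 1) + 1, tail restricted to terminals) and a point mutation that respects it; the symbol set, seed, head length and helper names are illustrative assumptions. Because the tail always supplies enough terminals to close every open argument slot, the ORF can end anywhere within the gene, and whatever is left over simply becomes non-coding.

```python
import random

FUNCTIONS = ["+", "-", "*"]  # illustrative function set, all of arity 2
TERMINALS = ["a", "b"]       # illustrative terminals
ARITY = {f: 2 for f in FUNCTIONS}
MAX_ARITY = 2


def random_gene(h: int, rng: random.Random) -> str:
    """Head: any symbol; tail: terminals only, with t = h(n - 1) + 1."""
    t = h * (MAX_ARITY - 1) + 1
    head = [rng.choice(FUNCTIONS + TERMINALS) for _ in range(h)]
    tail = [rng.choice(TERMINALS) for _ in range(t)]
    return "".join(head + tail)


def mutate(gene: str, h: int, rng: random.Random) -> str:
    """Point mutation that respects the head/tail rule."""
    pos = rng.randrange(len(gene))
    pool = FUNCTIONS + TERMINALS if pos < h else TERMINALS
    return gene[:pos] + rng.choice(pool) + gene[pos + 1:]


def is_valid(gene: str) -> bool:
    """True if the ORF closes before the end of the gene."""
    needed = 1
    for sym in gene:
        needed += ARITY.get(sym, 0) - 1
        if needed == 0:
            return True
    return False


rng = random.Random(42)
h = 10
gene = random_gene(h, rng)
valid = 0
for _ in range(10_000):
    gene = mutate(gene, h, rng)
    if is_valid(gene):
        valid += 1
print(valid)  # 10000: every mutant encodes a valid expression

# Without the rule, a function placed in the tail can leave the ORF unfinished.
print(is_valid("+a+"))  # False
```

The same reasoning extends to other operators that keep functions out of the tail, which is why no repair step or rejection of offspring is needed.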

On the other hand, the reason why neutral motifs within structures, be they parse trees or proteins, can boost evolution is not so easy to understand, although I think this is another manifestation of the same phenomenon of recombining and testing smaller building blocks. In this case, the building blocks are not entire genes with clear boundaries but smaller domains within genes. Indeed, in nature, most proteins have numerous variants carrying different amino acid substitutions. These substitutions occur mostly outside the crucial domains of proteins, such as the active sites of enzymes, and, therefore, the protein variants work equally well or show only slight differences in functionality. At the molecular level, these variants constitute the real genetic diversity, that is, the raw material of evolution. The neutral motifs of GEP or GP play exactly the same role, allowing the recombination and testing of different building blocks while also allowing the creation of neutral variants that can ultimately diverge and give rise to better adapted structures.
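A neutral motif within an ORF can be illustrated in the same setting. In this minimal Python sketch (assumed function set {+, -, *}, terminals a and b, Karva decoding, hypothetical helper names and hand-picked genes with head length h = 5), two genes have different ORFs, one of which carries the motif -aa (that is, a - a = 0), yet both compute a + b and are therefore neutral variants of each other.

```python
import operator

# Illustrative function set (assumption); any other symbol is a terminal.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
ARITY = {sym: 2 for sym in OPS}


def orf_length(gene: str) -> int:
    needed, i = 1, 0
    while needed > 0:
        needed += ARITY.get(gene[i], 0) - 1
        i += 1
    return i


def evaluate_gene(gene: str, env: dict[str, float]) -> float:
    """Decode the ORF (Karva notation, level by level) and evaluate it."""
    orf = gene[:orf_length(gene)]
    children, nxt = [], 1
    for sym in orf:
        k = ARITY.get(sym, 0)
        children.append(range(nxt, nxt + k))
        nxt += k

    def ev(i: int) -> float:
        if orf[i] in OPS:
            left, right = (ev(c) for c in children[i])
            return OPS[orf[i]](left, right)
        return env[orf[i]]

    return ev(0)


plain = "+abab" + "aabbab"    # ORF "+ab":     a + b
neutral = "+a+b-" + "aababa"  # ORF "+a+b-aa": a + (b + (a - a))

print(plain[:orf_length(plain)], neutral[:orf_length(neutral)])  # +ab +a+b-aa

# Different genotypes and ORFs, identical behaviour on every tested input.
same = all(
    evaluate_gene(plain, {"a": a, "b": b}) == evaluate_gene(neutral, {"a": a, "b": b})
    for a in range(-5, 6)
    for b in range(-5, 6)
)
print(same)  # True
```

Because the -aa motif is invisible to selection, later mutations can turn it into a useful sub-expression without the lineage ever paying a fitness cost in the meantime.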
