GEP Book

  Home
  News
  Author
  Q&A
  Tutorials
  Downloads
  GEP Biblio
  Contacts

  Visit Gepsoft

 

C. FERREIRA In R. Roy, M. Köppen, S. Ovaska, T. Furuhashi, and F. Hoffmann, eds., Soft Computing and Industry: Recent Applications, pages 635-654, Springer-Verlag, 2002.

Gene Expression Programming in Problem Solving

Open Reading Frames and Genes
 

The structural organization of GEP genes is better understood in terms of open reading frames (ORFs). In biology, an ORF or coding sequence of a gene begins with the ‘start’ codon, continues with the amino acid codons, and ends at a termination codon. However, a gene is more than the respective ORF, with sequences upstream of the start codon and sequences downstream of the stop codon. Although in GEP the start site is always the first position of a gene, the termination point does not always coincide with the last position of a gene. It is common for GEP genes to have non-coding regions downstream of the termination point. For now we won’t consider these non-coding regions, because they don’t interfere with the product of expression.

Consider, for example, the algebraic expression:

(2.1)

It can also be represented as a diagram:

where ‘Q’ represents the square root function.

This kind of diagram representation is in fact the phenotype of GEP chromosomes, being the genotype easily inferred from the phenotype as follows:

0123456789

+/Q*c-abde

(2.2)

which is the straightforward reading of the ET from left to right and from top to bottom (exactly as we read a page of text). The expression (2.2) is an ORF, starting at ‘+’ (position 0) and terminating at ‘e’ (position 9). I named these ORFs K-expressions (from Karva notation).

Consider another ORF, the following K-expression:

012345678901

*-/Qb+b+aaab

(2.3)

Its expression as an ET is also very simple and straightforward. To correctly express the ORF, we must follow the rules governing the spatial distribution of functions and terminals. The start position (position 0) in the ORF corresponds to the root of the ET. Then, below each function are attached as many branches as there are arguments to that function. The assemblage is complete when a baseline composed only of terminals (the variables or constants used in a problem) is formed. So, for the K-expression (2.3) above, the following ET is formed:

Looking at the structure of GEP ORFs only, it is difficult or even impossible to see the advantages of such a representation, except perhaps for its simplicity and elegance. However, when ORFs are analyzed in the context of a gene, the advantages of this representation become obvious. As I said, GEP chromosomes have fixed length, and they are composed of one or more genes of equal length. Therefore the length of a gene is also fixed. Thus, in GEP, what varies is not the length of genes which is constant, but the length of the ORFs. Indeed, the length of an ORF may be equal to or less than the length of the gene. In the first case, the termination point coincides with the end of the gene, and in the last case, the termination point is somewhere upstream of the end of the gene.

So, what is the role of these non-coding regions in GEP genes? They are in fact the essence of GEP and evolvability, for they allow the modification of the genome using any genetic operator without restrictions, producing always syntactically correct programs without the need for a complicated editing process or highly constrained ways of implementing genetic operators. Indeed, this is the paramount difference between GEP and previous GP implementations, with or without linear genomes.

Let’s analyze then the structural organization of GEP genes in order to understand how they invariably code for syntactically correct programs and why they allow the unconstrained application of any genetic operator.

Home | Contents | Previous | Next