Structural Organization of Genes

  GEP Book

  Home
  News
  Author
  Q&A
  Tutorials
  Downloads
  GEP Biblio
  Contacts

  Visit Gepsoft

C. FERREIRA

In Leandro N. de Castro and Fernando J. Von Zuben, eds., Recent Developments in Biologically Inspired Computing, pages 82-103, Idea Group Publishing, 2004.

Gene Expression Programming and the Evolution of Computer Programs

The genes of gene expression programming are composed of a head and a tail. The head contains symbols that represent both functions and terminals, whereas the tail contains only terminals. For each problem, the length of the head h is chosen, whereas the length of the tail t is a function of h and the number of arguments n of the function with more arguments (also called maximum arity) and is evaluated by the equation:

t = h (n-1) + 1

(4)

Consider a gene for which the set of functions F = {Q, *, /, -, +} and the set of terminals T = {a, b}. In this case n = 2; if we chose an h = 11, then t = 11 (2 - 1) + 1 = 12; thus, the length of the gene g is 11 + 12 = 23. One such gene is shown below (the tail is shown in blue):

01234567890123456789012
+-ba+Qbba/abaaaabbaabb*	(5)

It codes for the following expression tree:

or the equivalent mathematical expression . In this case, the open reading frame ends at position 9, whereas the gene ends at position 22.

Suppose now a mutation occurred at position 3, changing the “a” into “*”. Then the following gene is obtained:

01234567890123456789012
+-b+Qbba/abaaaabbaabb	(6)

And its expression gives:

which mathematically corresponds to . In this case, the termination point shifts four positions to the right (position 13), enlarging and changing significantly the daughter tree.

Obviously the opposite also might happen, and the daughter tree might shrink. For example, consider again gene (5) above, and suppose a mutation occurred at position 1, changing the “-” into “a”:

01234567890123456789012
+aba+Qb*ba/abaaaabbaabb	(7)

Its expression results in the following ET:

In this case, the ORF ends at position 2, shortening the original ET in seven nodes.

So, despite their fixed length, each gene has the potential to code for expression trees of different sizes and shapes, where the simplest is composed of only one node (when the first element of a gene is a terminal) and the largest is composed of as many nodes as the length of the gene (when all the elements of the head are functions with maximum arity).

It is evident from the examples above, that any modification made in the genome, no matter how profound, always results in a structurally correct program. Obviously, the structural organization of genes must be preserved, always maintaining the boundaries between head and tail. We will be able to fully appreciate the plasticity of GEP chromosomes in the section Genetic Operators and Evolution where the mechanisms and effects of different genetic operators will be thoroughly analyzed.

Home | Contents | Previous | Next