The Multigenic System with Static Linking

C. FERREIRA

In N. Nedjah, L. de M. Mourelle, A. Abraham, eds., Genetic Systems Programming: Theory and Experiences, Studies in Computational Intelligence, Vol. 13, pp. 21-56, Springer-Verlag, 2006.

Automatically Defined Functions in Gene Expression Programming

For this analysis we again use both the basic gene expression algorithm without random constants and GEP with random numerical constants. The parameters and the performance of both experiments are shown in Table 2.

Table 2
Settings and performance for the sextic polynomial problem using a multigenic system without (mgGEP) and with (mgGEP-RNC) random numerical constants.

Parameter	mgGEP	mgGEP-RNC
Number of runs	100	100
Number of generations	200	200
Population size	50	50
Chromosome length	52	80
Number of genes	4	4
Head length	6	6
Gene length	13	20
Linking function	*	*
Terminal set	a	a ?
Function set	+ - * /	+ - * /
Mutation rate	0.044	0.044
Inversion rate	0.1	0.1
RIS transposition rate	0.1	0.1
IS transposition rate	0.1	0.1
Two-point recombination rate	0.3	0.3
One-point recombination rate	0.3	0.3
Gene recombination rate	0.3	0.3
Gene transposition rate	0.1	0.1
Random constants per gene	--	5
Random constants data type	--	Integer
Random constants range	--	0-3
Dc-specific mutation rate	--	0.044
Dc-specific inversion rate	--	0.1
Dc-specific IS transposition rate	--	0.1
Random constants mutation rate	--	0.01
Number of fitness cases	50	50
Selection range	100	100
Precision	0.01	0.01
Success rate	93%	49%

It’s worth pointing out that the maximum program length in these experiments is similar to the one used in the unigenic systems of the previous section. Here, a head length h = 6 and four genes per chromosome were used, giving a maximum program length of 52 points (note again that the chromosome length in the systems with random numerical constants is larger on account of the Dc domain, but the maximum program length remains the same).
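The lengths quoted above can be checked with a few lines of Python. This is only a sketch: it assumes the usual GEP relation t = h(n - 1) + 1 between head length h, maximum function arity n and tail length t, and it assumes the Dc domain has the same length as the tail. Both assumptions are consistent with the gene lengths of 13 and 20 in Table 2. The function name `gene_layout` is illustrative.

```python
def gene_layout(head, max_arity, with_dc=False):
    """Return tail length, program points and total length for one gene."""
    tail = head * (max_arity - 1) + 1      # assumed GEP relation t = h(n - 1) + 1
    dc = tail if with_dc else 0            # assumption: Dc is as long as the tail
    return {
        "tail": tail,
        "program_points": head + tail,     # head + tail can be expressed as nodes
        "gene_length": head + tail + dc,   # Dc adds length but no program points
    }

N_GENES = 4                                # from Table 2
plain = gene_layout(head=6, max_arity=2)   # all functions (+ - * /) are binary
rnc = gene_layout(head=6, max_arity=2, with_dc=True)

print(N_GENES * plain["gene_length"], N_GENES * plain["program_points"])  # 52 52
print(N_GENES * rnc["gene_length"], N_GENES * rnc["program_points"])      # 80 52
```

Under these assumptions the chromosome grows from 52 to 80 symbols when the Dc domain is added, while the maximum program length stays at 52 points.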

As you can see by comparing Tables 1 and 2, the use of multiple genes resulted in a considerable increase in performance for both systems. In the systems without random constants, by partitioning the genome into four autonomous genes, the performance increased from 26% to 93%, whereas in the systems with random numerical constants, the performance increased from 4% to 49%. Note also that, in this analysis, the already familiar pattern is observed when random numerical constants are introduced: the success rate decreases considerably from 93% to 49% (in the unigenic systems it decreased from 26% to 4%).

Let’s also take a look at the structure of the first perfect solution found using the multigenic system without the facility for the manipulation of random numerical constants (the sub-ETs are linked by multiplication):

0123456789012012345678901201234567890120123456789012
+/aaa/aaaaaaa+//a/aaaaaaaa-a/aaaaaaaa-a/aaaaaaaa	(18)

As its expression shows, it contains three small neutral regions involving a total of nine nodes, all encoding the numerical constant 1. Note also that, on two occasions (in sub-ETs 0 and 1), the numerical constant 1 plays an important role in the overall making of the perfect solution. Also interesting about this perfect solution is that genes 2 and 3 are exactly the same, suggesting a major event of gene duplication (it’s worth pointing out that the duplication of genes can only be achieved by the concerted action of gene recombination and gene transposition, as a gene duplication operator is not part of the genetic modification arsenal of Gene Expression Programming).
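To see where the constant 1 comes from, it can help to decode a gene by hand or with a short script. The following Python sketch assumes the usual breadth-first (Karva) reading of a gene, in which each function takes its arguments from the next unread symbols. The function name `karva_to_infix` is illustrative, and only genes 0 and 1 of chromosome (18) are decoded.

```python
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2}   # function set from Table 2

def karva_to_infix(gene):
    """Decode a gene breadth-first; return (infix string, number of nodes used)."""
    nodes = [[sym, []] for sym in gene]
    used, parent = 1, 0                     # the root is always expressed
    while parent < used:
        sym, children = nodes[parent]
        for _ in range(ARITY.get(sym, 0)):  # terminals have arity 0
            children.append(nodes[used])
            used += 1
        parent += 1

    def render(node):
        sym, children = node
        if not children:
            return sym
        return "(" + render(children[0]) + " " + sym + " " + render(children[1]) + ")"

    return render(nodes[0]), used

gene0 = "+/aaa/aaaaaaa"   # positions 0-12 of chromosome (18)
gene1 = "+//a/aaaaaaaa"   # positions 13-25 of chromosome (18)

print(karva_to_infix(gene0))   # ('((a / a) + a)', 5)
print(karva_to_infix(gene1))   # ('((a / (a / a)) + (a / a))', 9)
```

Each `(a / a)` is a three-node motif equal to 1. In sub-ET 0 it contributes the 1 in 1 + a. In sub-ET 1 one copy contributes the 1 in a + 1, and the other only divides a by 1.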

It is also interesting to take a look at the structure of the first perfect solution found using the multigenic system with the facility for the manipulation of random numerical constants (the sub-ETs are linked by multiplication):

01234567890123456789
+--+*aa??aa?a0444212
+--+*aa??aa?a0244422
a?a??a?aaa?a?2212021
aa-a*/?aa????3202123

A₀ = {0, 3, 1, 2, 1}
A₁ = {0, 3, 1, 2, 1}
A₂ = {0, 3, 1, 2, 1}
A₃ = {3, 3, 2, 0, 2}	(19)

As its expression reveals, it is a fairly compact solution with two small neutral motifs plus a couple of neutral nodes, all representing the numerical constant zero. Note that genes 0 and 1 are almost exact copies of one another (there is only variation at positions 17 and 18, but they are of no consequence as they are part of a noncoding region of the gene), suggesting a recent event of gene duplication. Note also that although genes 2 and 3 encode exactly the same sub-ET (a simple sub-ET with just one node), they most certainly followed different evolutionary paths as the homology between their sequences suggests.
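The structure of solution (19) can be checked with a short Python sketch. It rests on three assumptions that are not stated in this section: each `?` in a sub-ET is replaced, reading top to bottom and left to right, by the next symbol of the Dc domain; each Dc symbol is an index into that gene's array of random constants; and the target function of the sextic polynomial problem is a^6 - 2a^4 + a^2. The helper names (`eval_gene`, `model`) are illustrative.

```python
from fractions import Fraction

ARITY = {"+": 2, "-": 2, "*": 2, "/": 2}
HEAD, TAIL = 6, 7                  # gene length 20 = head 6 + tail 7 + Dc 7

GENES = [
    "+--+*aa??aa?a0444212",
    "+--+*aa??aa?a0244422",
    "a?a??a?aaa?a?2212021",
    "aa-a*/?aa????3202123",
]
CONSTANTS = [
    [0, 3, 1, 2, 1],
    [0, 3, 1, 2, 1],
    [0, 3, 1, 2, 1],
    [3, 3, 2, 0, 2],
]

def eval_gene(gene, consts, a):
    """Evaluate one gene of solution (19) at the point a."""
    orf, dc = gene[:HEAD + TAIL], gene[HEAD + TAIL:]
    nodes = [[sym, []] for sym in orf]
    used, parent = 1, 0
    while parent < used:           # breadth-first (Karva) decoding
        for _ in range(ARITY.get(nodes[parent][0], 0)):
            nodes[parent][1].append(nodes[used])
            used += 1
        parent += 1
    dc_symbols = iter(dc)          # assumed: '?' nodes consume Dc in reading order
    for node in nodes[:used]:
        if node[0] == "?":
            node[0] = consts[int(next(dc_symbols))]

    def value(node):
        sym, kids = node
        if sym == "a":
            return a
        if not kids:
            return Fraction(sym)   # a numerical constant
        x, y = value(kids[0]), value(kids[1])
        if sym == "+":
            return x + y
        if sym == "-":
            return x - y
        if sym == "*":
            return x * y
        return x / y

    return value(nodes[0])

def model(a):
    """Link the four sub-ETs by multiplication."""
    result = Fraction(1)
    for gene, consts in zip(GENES, CONSTANTS):
        result *= eval_gene(gene, consts, a)
    return result

for a in (Fraction(k, 4) for k in range(-8, 9)):
    assert model(a) == a**6 - 2 * a**4 + a**2
```

Under these assumptions, genes 0 and 1 both evaluate to 1 - a^2 and genes 2 and 3 both evaluate to a, so the product is a^2(1 - a^2)^2 = a^6 - 2a^4 + a^2. The zero-valued motifs mentioned above appear as a - a and as the constant 0 in 0 + 1.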
