
C. FERREIRA 7th Online World Conference on Soft Computing in Industrial Applications, 2002

Function Finding and the Creation of Numerical Constants in Gene Expression Programming

Second Approach: Creation of Numerical Constants from Scratch
 

To solve the sequence induction problem without the facility to manipulate numerical constants, the same function set was used as in the experiment with random constants. The terminal set consisted of the independent variable alone.

As shown in the second column of Table 2, the probability of success using this approach is 81%, considerably higher than the 16% obtained using the facility to manipulate random constants. In this experiment, the first perfect solution was found in generation 44 of run 0 (the sub-ETs are linked by addition):

0123456789012012345678901201234567890120123456789012012345678901201234567890120123456789012
+aa+a-aaaaaaa*+/*/*aaaaaaa*+++**aaaaaaa*+***+aaaaaaa+//++-aaaaaaa+*---*aaaaaaa*a-aa-aaaaaaa

which corresponds to the target sequence (3.2). Note that the algorithm creates all necessary constants from scratch by performing simple mathematical operations.
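To make the claim about constants concrete, the chromosome above can be decoded mechanically. The following is a minimal sketch rather than the author's implementation: it assumes the standard breadth-first reading of each gene, treats `+`, `-`, `*` and `/` as binary functions, takes the gene length of 13 from the position ruler, and links the sub-ETs by addition as stated in the text. The names `eval_gene` and `eval_chromosome` are hypothetical. Exact rational arithmetic is used so that terms such as `a/a` evaluate to exactly 1.

```python
from fractions import Fraction

# Chromosome copied from the text: 7 genes, each 13 symbols long.
CHROMOSOME = (
    "+aa+a-aaaaaaa" "*+/*/*aaaaaaa" "*+++**aaaaaaa" "*+***+aaaaaaa"
    "+//++-aaaaaaa" "+*---*aaaaaaa" "*a-aa-aaaaaaa"
)
GENE_LENGTH = 13  # assumption: read off the position ruler above the chromosome

# Assumption: all four functions take two arguments.
OPS = {
    "+": lambda x, y: x + y,
    "-": lambda x, y: x - y,
    "*": lambda x, y: x * y,
    "/": lambda x, y: x / y,
}


def eval_gene(gene, a):
    """Evaluate one gene at the value a, reading it breadth-first."""
    # Walk the gene left to right, handing each function node the next
    # two unused positions as its children. Symbols that are never
    # reached form the non-coding region and are ignored.
    children = []
    next_free = 1
    i = 0
    while i < next_free:
        arity = 2 if gene[i] in OPS else 0
        children.append(range(next_free, next_free + arity))
        next_free += arity
        i += 1

    def ev(k):
        if gene[k] in OPS:
            left, right = children[k]
            return OPS[gene[k]](ev(left), ev(right))
        return a  # the only terminal is the independent variable

    return ev(0)


def eval_chromosome(chromosome, a):
    """Sum the values of all sub-ETs (linking by addition)."""
    genes = [chromosome[i:i + GENE_LENGTH]
             for i in range(0, len(chromosome), GENE_LENGTH)]
    return sum(eval_gene(g, a) for g in genes)


for n in range(1, 6):
    print(n, eval_chromosome(CHROMOSOME, Fraction(n)))
```

Under these assumptions the seven sub-ETs simplify to 2a, a^3 + a, a^4 + 3a^3 + 2a^2, 4a^4, 1, a^2 - a and 0, which sum to 5a^4 + 4a^3 + 3a^2 + 2a + 1. The integer coefficients arise only from expressions such as a + a and a/a, never from explicit constants.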

To find the “V” shaped function without using random constants, the function set is exactly the same as in the first approach. With this collection of functions, most of which are extraneous, the algorithm is equipped with different tools for evolving highly accurate models without using numerical constants. The parameters used per run are shown in the fourth column of Table 2. In this experiment of 100 identical runs, the best solution was found in generation 4679 of run 10:

01234567890120123456789012012345678901201234567890120123456789012
+L~*S+aaaaaaa++a+*Saaaaaaa+CEC*+aaaaaaaESaaSaaaaaaaa++EE/*aaaaaaa

(3.6)

It has a fitness of 1990.023, an R-square of 0.9999313 evaluated over the set of 20 fitness cases, and an R-square of 0.9998606 evaluated against the same test set used in the first approach. It is thus better than the model (3.4) evolved with the facility for the manipulation of random constants. More formally, the model (3.6) is expressed by the equation (the contribution of each gene is shown in square brackets):

To predict sunspots without using random numerical constants, the function set is exactly the same as in the first approach. The parameters used per run are shown in the sixth column of Table 2. In this experiment of 100 identical runs, the best solution was found in generation 2273 of run 57:

012345678901234560123456789012345601234567890123456
j+a/+a+*gaafchdci/++-+be+ijdjjaiid/++*ci+-jiabiddhf

(3.7)
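The structure of the three sub-ETs encoded in (3.7) can be recovered with a short sketch. It assumes the usual breadth-first reading of each gene, a gene length of 17 taken from the position ruler, and that `+`, `-`, `*` and `/` are binary functions while the letters `a` to `j` are terminals. The helper names `gene_to_infix` and `sub_ets` are hypothetical. Because this section does not restate which input each terminal denotes or how the sub-ETs are linked for this problem, the sketch prints each sub-ET separately and does not evaluate it.

```python
from collections import deque

# Chromosome (3.7) copied from the text: 3 genes, each 17 symbols long.
CHROMOSOME = "j+a/+a+*gaafchdci/++-+be+ijdjjaiid/++*ci+-jiabiddhf"
GENE_LENGTH = 17  # assumption: read off the position ruler
FUNCTIONS = {"+", "-", "*", "/"}  # assumption: all take two arguments


def gene_to_infix(gene):
    """Return the sub-ET encoded by one gene as a parenthesized string."""
    # Each node is [symbol, children]; children are filled level by level.
    root = [gene[0], []]
    queue = deque([root])
    pos = 1
    while queue:
        node = queue.popleft()
        if node[0] in FUNCTIONS:
            for _ in range(2):
                child = [gene[pos], []]
                pos += 1
                node[1].append(child)
                queue.append(child)

    def render(node):
        symbol, kids = node
        if not kids:
            return symbol  # terminal
        return "(" + render(kids[0]) + symbol + render(kids[1]) + ")"

    return render(root)


def sub_ets(chromosome, gene_length):
    """Split the chromosome into genes and decode each one."""
    return [gene_to_infix(chromosome[i:i + gene_length])
            for i in range(0, len(chromosome), gene_length)]


for expr in sub_ets(CHROMOSOME, GENE_LENGTH):
    print(expr)
```

Under these assumptions the first gene codes for the single terminal `j`, because its first symbol is a terminal and the rest of the gene is non-coding. The second sub-ET simplifies to (3j - i + d)/(b + e) and the third to ((b - i)j + c)/(2i + a), so the small integer coefficients again come from repeated terminals rather than explicit constants.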

It has a fitness of 89176.61 and an R-square of 0.882831 evaluated over the set of 90 fitness cases, and thus is better than the model (3.5) evolved with the facility for the manipulation of random constants. More formally, the model (3.7) is expressed by the equation:

It is instructive to compare the results obtained with the two approaches. In all the experiments, the explicit use of random constants resulted in worse performance. In the sequence induction problem, success rates of 81% versus 16% were obtained; in the “V” function problem, average best-of-run fitnesses of 1931.84 versus 1914.80 and average best-of-run R-squares of 0.995340 versus 0.957255 were obtained; and in the sunspots prediction problem, average best-of-run fitnesses of 89033.29 versus 86215.27 and average best-of-run R-squares of 0.811863 versus 0.713365 were obtained (see Table 2; in each pair, the first figure is for the approach without random constants). Thus, in real-world applications where complex realities are modeled, where nothing is known about either the type or the range of the numerical constants, and where most of the time it is impossible to guess the exact function set, it is more appropriate to let the system model the reality on its own without explicitly using random constants. Not only will the results be better, but the complexity of the system will also be much smaller.
