
C. FERREIRA In J. M. Benitez, O. Cordon, F. Hoffmann, and R. Roy, eds., Advances in Soft Computing: Engineering Design and Manufacturing, pages 257-266, Springer-Verlag, 2003.

Function Finding and the Creation of Numerical Constants in Gene Expression Programming

Setting the System
 
The comparison between the two approaches (with and without the facility to manipulate random constants) was made on three different problems. The first is a problem of sequence induction requiring integer constants. In this case the following test sequence was chosen:

$$a_n = 5n^4 + 4n^3 + 3n^2 + 2n + 1 \tag{3.1}$$

where n ranges over the nonnegative integers. This sequence was chosen because it can be solved exactly and can therefore provide an accurate measure of performance in terms of success rate.

The second is a problem of function finding requiring floating-point constants. In this case, the following “V” shaped function was chosen:

$$y = 4.251a^2 + \ln(a^2) + 7.243e^a \tag{3.2}$$

where a is the independent variable and e is the base of the natural logarithm (approximately 2.71828183). Problems of this kind cannot be solved exactly by evolutionary algorithms and, therefore, the performance of both approaches is compared in terms of average best-of-run fitness and average best-of-run R-square.

The third is the well-studied benchmark problem of predicting sunspots (Weigend et al. 1992). In this case, 100 observations of the Wolfer sunspots series were used (Table 1) with an embedding dimension of 10 and a delay time of one. Again, the performance of both approaches is compared in terms of average best-of-run fitness and R-square.


Table 1
Wolfer sunspots series (read by rows).

101 82 66 35 31 7 20 92
154 125 85 68 38 23 10 24
83 132 131 118 90 67 60 47
41 21 16 6 4 7 14 34
45 43 48 42 28 10 8 2
0 1 5 12 14 35 46 41
30 24 16 7 4 2 8 17
36 50 62 67 71 48 28 8
13 57 122 138 103 86 63 37
24 11 15 40 62 98 124 96
66 64 54 39 21 7 4 23
55 94 96 77 59 44 47 30
16 7 37 74        


For the sequence induction problem, the first 10 positive integers n and the corresponding terms a_n were used as fitness cases. The fitness function was based on the relative error with a selection range of 20% and maximum precision (0% error), giving a maximum fitness f_max = 200, that is, 10 fitness cases times 20 (Ferreira 2001).
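The relation between selection range, number of fitness cases, and f_max can be illustrated with a minimal sketch. It assumes the relative-error fitness has the form "sum over fitness cases of (selection range minus percent relative error)"; the function name `relative_error_fitness`, the per-case clamp at zero, and the handling of `precision` are illustrative assumptions, not details confirmed by this section.

```python
def relative_error_fitness(predictions, targets, selection_range=20.0, precision=0.0):
    """Sum over fitness cases of (selection_range - percent relative error).

    Assumed form of a relative-error fitness; the clamp at zero per case
    is an illustrative choice, not taken from the text.
    """
    total = 0.0
    for predicted, target in zip(predictions, targets):
        error = abs((predicted - target) / target) * 100.0
        if error <= precision:
            error = 0.0  # within the precision, the case counts as exact
        total += max(selection_range - error, 0.0)
    return total


def a(n):
    """Test sequence of equation (3.1)."""
    return 5 * n**4 + 4 * n**3 + 3 * n**2 + 2 * n + 1


# Fitness cases: the first 10 positive integers and their terms.
ns = list(range(1, 11))
targets = [a(n) for n in ns]

# A perfect model scores the full selection range on every case.
f_max = relative_error_fitness(targets, targets)

# A model that is 10% too high on every case loses 10 points per case.
f_off = relative_error_fitness([1.1 * t for t in targets], targets)
```

Under these assumptions a perfect solution reaches 10 × 20 = 200, which matches the f_max quoted above.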

For the “V” shaped function problem, a set of 20 random fitness cases chosen from the interval [-1, 1] was used. The fitness function used was also based on the relative error but in this case a selection range of 100% was used, giving f_max = 2,000.
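As a rough illustration of how such a training set might be built, the sketch below samples 20 values of a uniformly from [-1, 1] and evaluates equation (3.2) on them. The seed, the uniform distribution, and the rejection of a = 0 (where ln(a^2) is undefined) are assumptions made for the example; the text does not say how the original cases were drawn.

```python
import math
import random


def v_function(a):
    """The "V" shaped target of equation (3.2)."""
    return 4.251 * a**2 + math.log(a**2) + 7.243 * math.exp(a)


rng = random.Random(0)  # fixed seed, assumed, for reproducibility only

cases = []
while len(cases) < 20:
    a = rng.uniform(-1.0, 1.0)
    if a != 0.0:  # ln(a^2) is undefined at a = 0
        cases.append((a, v_function(a)))

selection_range = 100  # percent, as stated in the text
f_max = len(cases) * selection_range
```

With 20 fitness cases and a selection range of 100%, the maximum attainable fitness is 20 × 100 = 2,000.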

For the time series prediction problem, using an embedding dimension of 10 and a delay time of one, the sunspots series presented in Table 1 results in 90 fitness cases. In this case, a wider selection range of 1,000% was chosen, giving f_max = 90,000.
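The count of 90 fitness cases follows from sliding a window over the 100 observations. The sketch below shows one way to do this, assuming each case uses 10 consecutive observations as inputs and the next observation as the target; the function name `embed` and the ordering of the inputs (oldest first) are illustrative assumptions.

```python
# Wolfer sunspots series from Table 1, read by rows (100 observations).
SUNSPOTS = [
    101, 82, 66, 35, 31, 7, 20, 92,
    154, 125, 85, 68, 38, 23, 10, 24,
    83, 132, 131, 118, 90, 67, 60, 47,
    41, 21, 16, 6, 4, 7, 14, 34,
    45, 43, 48, 42, 28, 10, 8, 2,
    0, 1, 5, 12, 14, 35, 46, 41,
    30, 24, 16, 7, 4, 2, 8, 17,
    36, 50, 62, 67, 71, 48, 28, 8,
    13, 57, 122, 138, 103, 86, 63, 37,
    24, 11, 15, 40, 62, 98, 124, 96,
    66, 64, 54, 39, 21, 7, 4, 23,
    55, 94, 96, 77, 59, 44, 47, 30,
    16, 7, 37, 74,
]


def embed(series, dimension=10, delay=1):
    """Turn a series into (inputs, target) pairs by time-delay embedding."""
    span = dimension * delay
    cases = []
    for start in range(len(series) - span):
        inputs = [series[start + k * delay] for k in range(dimension)]
        target = series[start + span]
        cases.append((inputs, target))
    return cases


cases = embed(SUNSPOTS, dimension=10, delay=1)
f_max = len(cases) * 1000  # selection range of 1,000% per fitness case
```

Each case supplies ten inputs, which is consistent with a terminal set of ten variables, and 90 cases at a selection range of 1,000% give f_max = 90,000.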

In all the experiments, the selection was made by roulette-wheel sampling coupled with simple elitism and the performance was evaluated over 100 independent runs. The six experiments are summarized in Table 2.
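For readers unfamiliar with the selection scheme, here is a minimal sketch of roulette-wheel sampling with simple elitism. It assumes that "simple elitism" means the best individual of the generation is copied unchanged and that the remaining slots are filled with probability proportional to fitness; the function name `select` and the uniform fallback for an all-zero population are assumptions for the example.

```python
import random


def select(population, fitnesses, rng):
    """Return a new list of individuals, the same size as the population."""
    # Simple elitism (assumed meaning): keep one copy of the best individual.
    best_index = max(range(len(population)), key=lambda i: fitnesses[i])
    selected = [population[best_index]]

    # Roulette wheel: each remaining slot is filled with probability
    # proportional to fitness. Fall back to uniform sampling if every
    # fitness is zero, since the wheel would have no area.
    remaining = len(population) - 1
    if sum(fitnesses) > 0:
        selected += rng.choices(population, weights=fitnesses, k=remaining)
    else:
        selected += rng.choices(population, k=remaining)
    return selected


rng = random.Random(1)  # fixed seed, assumed, for reproducibility only
population = ["a", "b", "c", "d"]
fitnesses = [0.0, 5.0, 1.0, 0.0]
chosen = select(population, fitnesses, rng)
```

In this toy example the individuals with zero fitness can never be drawn, and the fittest individual always occupies the first slot.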


Table 2
General settings used in the sequence induction (SI), the “V” function, and sunspots (SS) problems. The “*” indicates the explicit use of random constants.

| Setting | SI* | SI | V* | V | SS* | SS |
|---|---|---|---|---|---|---|
| Number of runs | 100 | 100 | 100 | 100 | 100 | 100 |
| Number of generations | 100 | 100 | 5000 | 5000 | 5000 | 5000 |
| Population size | 100 | 100 | 100 | 100 | 100 | 100 |
| Number of fitness cases | 10 | 10 | 20 | 20 | 90 | 90 |
| Function set | + - * / | + - * / | + - * / L E K ~ S C | + - * / L E K ~ S C | 4 (+ - * /) | 4 (+ - * /) |
| Terminal set | a, ? | a | a, ? | a | a - j, ? | a - j |
| Random constants array length | 10 | -- | 10 | -- | 10 | -- |
| Random constants range | {0, 1, 2, 3} | -- | [-1,1] | -- | [-1,1] | -- |
| Head length | 6 | 6 | 6 | 6 | 8 | 8 |
| Number of genes | 7 | 7 | 5 | 5 | 3 | 3 |
| Linking function | + | + | + | + | + | + |
| Chromosome length | 140 | 91 | 100 | 65 | 78 | 51 |
| Mutation rate | 0.044 | 0.044 | 0.044 | 0.044 | 0.044 | 0.044 |
| One-point recombination rate | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 |
| Two-point recombination rate | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 |
| Gene recombination rate | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| IS transposition rate | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| IS elements length | 1,2,3 | 1,2,3 | 1,2,3 | 1,2,3 | 1,2,3 | 1,2,3 |
| RIS transposition rate | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| RIS elements length | 1,2,3 | 1,2,3 | 1,2,3 | 1,2,3 | 1,2,3 | 1,2,3 |
| Gene transposition rate | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| Random constants mutation rate | 0.01 | -- | 0.01 | -- | 0.01 | -- |
| Dc specific transposition rate | 0.1 | -- | 0.1 | -- | 0.1 | -- |
| Dc specific IS elements length | 1,2,3 | -- | 1,2,3 | -- | 1,2,3 | -- |
| Selection range | 20% | 20% | 100% | 100% | 1000% | 1000% |
| Precision | 0% | 0% | 0% | 0% | 0% | 0% |
| Average best-of-run fitness | 179.827 | 197.232 | 1914.8 | 1931.84 | 86215.27 | 89033.29 |
| Average best-of-run R-square | 0.977612 | 0.999345 | 0.957255 | 0.995340 | 0.713365 | 0.811863 |
| Success rate | 16% | 81% | -- | -- | -- | -- |
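The chromosome lengths in Table 2 can be cross-checked from the head length and the number of genes. The sketch below assumes the usual gene expression programming tail length t = h(n - 1) + 1, where h is the head length and n the maximum arity of the function set (2 here), and further assumes that genes with random constants carry an extra Dc domain as long as the tail. Neither rule is stated in this section; both are assumptions that happen to reproduce the tabulated values.

```python
def chromosome_length(head, n_genes, max_arity=2, with_constants=False):
    """Chromosome length under the assumed head/tail/Dc layout."""
    tail = head * (max_arity - 1) + 1
    dc = tail if with_constants else 0  # assumed: Dc domain as long as the tail
    return n_genes * (head + tail + dc)


# (head length, number of genes, uses random constants) per experiment.
experiments = {
    "SI*": (6, 7, True),
    "SI": (6, 7, False),
    "V*": (6, 5, True),
    "V": (6, 5, False),
    "SS*": (8, 3, True),
    "SS": (8, 3, False),
}

lengths = {
    name: chromosome_length(head, genes, with_constants=constants)
    for name, (head, genes, constants) in experiments.items()
}
```

Under these assumptions the computed lengths are 140, 91, 100, 65, 78, and 51, in agreement with the "Chromosome length" row of Table 2.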

