First Approach: Direct Manipulation of Numerical Constants


C. FERREIRA

7th Online World Conference on Soft Computing in Industrial Applications, 2002

Function Finding and the Creation of Numerical Constants in Gene Expression Programming

To solve the sequence induction problem using random constants, we used F = {+, -, *, /}, T = {a, ?}, and the set of integer random constants R = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, with “?” ranging over the integers 0, 1, 2, and 3. The parameters used per run are shown in the first column of Table 2. In this experiment, the first perfect solution was found in generation 45 of run 9 (the sub-ETs are linked by addition):

    Gene 0: *-aa+-a?aaa??1742174
        A₀ = {0, 0, 2, 3, 0, 2, 1, 1, 1, 3}

    Gene 1: ++*/+-?aaa???4460170
        A₁ = {3, 0, 2, 2, 1, 3, 1, 0, 0, 1}

    Gene 2: *a**++aa?aa??4101213
        A₂ = {1, 2, 3, 3, 2, 2, 0, 1, 1, 2}

    Gene 3: **+--+?aaa???2637797
        A₃ = {0, 0, 2, 3, 3, 3, 0, 0, 1, 0}

    Gene 4: +?*++?aaaa?a?2890192
        A₄ = {1, 1, 0, 1, 1, 3, 1, 0, 0, 2}

    Gene 5: -+-/*-?aa?a?a8147432
        A₅ = {0, 0, 0, 2, 0, 2, 2, 0, 0, 0}

    Gene 6: **aa**?aa?a??2314518
        A₆ = {0, 2, 3, 2, 3, 1, 3, 2, 3, 0}

which corresponds to the target sequence (3.2).

As shown in the first column of Table 2, the probability of success for this problem is 16%, considerably lower than the 81% of the second approach (see Table 2, column 2). It is worth emphasizing that only prior knowledge of the solution enabled us, in this case, to choose the right type and range for the random constants.
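The chromosome listed above can be checked by hand or with a short script. What follows is a minimal sketch, not the author's implementation, and it rests on several assumptions that the text does not state: each 20-symbol gene is split into a head of 6, a tail of 7, and a constant domain (Dc) of 7 symbols; the head and tail are read breadth-first into a tree; and the k-th “?” met in breadth-first order takes the constant from the gene's array A at the index given by the k-th Dc symbol. Division is included because “/” appears in genes 1 and 5. The names `express`, `evaluate`, and `chromosome_value` are hypothetical.

```python
from collections import deque

# Binary functions appearing in the genes listed above.
OPS = {
    "+": lambda x, y: x + y,
    "-": lambda x, y: x - y,
    "*": lambda x, y: x * y,
    "/": lambda x, y: x / y,
}

HEAD_LEN = 6                    # assumption: 6 (head) + 7 (tail) + 7 (Dc) = 20
CODING_LEN = 2 * HEAD_LEN + 1   # head + tail when the maximum arity is 2

GENES = [
    "*-aa+-a?aaa??1742174",
    "++*/+-?aaa???4460170",
    "*a**++aa?aa??4101213",
    "**+--+?aaa???2637797",
    "+?*++?aaaa?a?2890192",
    "-+-/*-?aa?a?a8147432",
    "**aa**?aa?a??2314518",
]
ARRAYS = [
    [0, 0, 2, 3, 0, 2, 1, 1, 1, 3],
    [3, 0, 2, 2, 1, 3, 1, 0, 0, 1],
    [1, 2, 3, 3, 2, 2, 0, 1, 1, 2],
    [0, 0, 2, 3, 3, 3, 0, 0, 1, 0],
    [1, 1, 0, 1, 1, 3, 1, 0, 0, 2],
    [0, 0, 0, 2, 0, 2, 2, 0, 0, 0],
    [0, 2, 3, 2, 3, 1, 3, 2, 3, 0],
]


def express(gene, constants):
    """Build one sub-ET as nested lists of the form [symbol, child, ...]."""
    coding, dc = gene[:CODING_LEN], gene[CODING_LEN:]
    root = [coding[0]]
    queue = deque([root])
    next_symbol = 1   # next unread position in the head/tail
    next_dc = 0       # next unread position in Dc
    while queue:
        node = queue.popleft()
        symbol = node[0]
        if symbol == "?":
            # Assumed rule: Dc symbols are consumed in breadth-first order
            # and each one indexes the gene's array of random constants.
            node[0] = constants[int(dc[next_dc])]
            next_dc += 1
        elif symbol in OPS:
            for _ in range(2):
                child = [coding[next_symbol]]
                next_symbol += 1
                node.append(child)
                queue.append(child)
    return root


def evaluate(node, a):
    """Evaluate a sub-ET for a given value of the terminal a."""
    symbol = node[0]
    if symbol == "a":
        return a
    if symbol in OPS:
        left, right = (evaluate(child, a) for child in node[1:])
        return OPS[symbol](left, right)
    return symbol  # a numerical constant substituted for "?"


def chromosome_value(genes, arrays, a):
    """Link the sub-ETs by addition, as stated in the text."""
    return sum(evaluate(express(g, c), a) for g, c in zip(genes, arrays))


print([chromosome_value(GENES, ARRAYS, a) for a in (1, 2, 3)])
```

Under these assumptions the seven sub-ETs simplify to a², a, 2a⁴ + 4a³, 0, 2a², a + 1, and 3a⁴, so their sum is 5a⁴ + 4a³ + 3a² + 2a + 1, and the script prints the values of that polynomial at a = 1, 2, 3. The sub-ETs of genes 1 and 5 contain a/a, so the sketch is undefined at a = 0.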

To find the “V” shaped function using random constants, we used F = {+, -, *, /, L, E, K, ~, S, C} (“L” represents the natural logarithm, “E” represents e^x, “K” represents the logarithm of base 10, “~” represents 10^x, “S” represents the sine function, and “C” represents the cosine) and T = {a, ?}. The set of rational random constants was R = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, and “?” ranged over the interval [-1, 1]. The parameters used per run are shown in the third column of Table 2. The best solution, found in run 50 after 4584 generations, is shown below (the sub-ETs are linked by addition):

    Gene 0: L*L*ECaa??a??8534167
        A₀ = {0.189, 0.13, 0.753, 0.548, 0.277, 0.257, 0.743, 0.46, 0.066, 0.801}

    Gene 1: ~S/aC-??aa?aa5477773
        A₁ = {0.337, 0.99, 0.536, 0.406, 0.283, 0.95, 0.968, 0.108, 0.672, 0.644}

    Gene 2: ~*/a*aa???a?a1437777
        A₂ = {0.247, 0.929, 0.779, 0.89, 0.926, 0.24, 0.667, 0.254, 0.518, 0.927}

    Gene 3: -C*?/*a?aaa??4725239
        A₃ = {0.792, 0.019, 0.472, 0.005, 0.682, 0.605, 0.094, 0.357, 0.074, 0.713}

    Gene 4: +E+*EE?a?a???4233680
        A₄ = {0.883, 0.768, 0.899, 0.311, 0.981, 0.845, 0.428, 0.308, 0.519, 0.381}	(3.4)

It has a fitness of 1989.566, an R-square of 0.9997001 evaluated over the set of 20 fitness cases, and an R-square of 0.9997185 evaluated against a test set of 100 random points also chosen from the interval [-1, 1]. Mathematically, it corresponds to the following function (the contribution of each sub-ET is indicated in square brackets):

which is a very good approximation to the target function (3.3) as the high value of R-square indicates.
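The text does not define R-square. A minimal sketch, assuming it denotes the squared Pearson correlation between the target values and the model outputs over a set of fitness cases, is given below. The function name `r_square` and the toy data are hypothetical and are not taken from the experiments.

```python
def r_square(targets, predictions):
    """Squared Pearson correlation between targets and predictions."""
    n = len(targets)
    mean_t = sum(targets) / n
    mean_p = sum(predictions) / n
    # Sums of cross-products and squared deviations about the means.
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(targets, predictions))
    var_t = sum((t - mean_t) ** 2 for t in targets)
    var_p = sum((p - mean_p) ** 2 for p in predictions)
    return cov * cov / (var_t * var_p)


# Toy data only: a model that swaps two of four outputs.
print(r_square([1, 2, 3, 4], [1, 3, 2, 4]))
```

With this definition a value close to 1 means the outputs vary almost linearly with the targets. It says nothing about offset or scale, so it complements the fitness value rather than replacing it.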

It is worth noticing that the algorithm does in fact integrate constants into the evolved solutions, but the constants are very different from the expected ones. Indeed, GEP (and, I believe, all genetic algorithms with tree representations) can find the expected constants to a precision of three or four decimal places when the target functions are simple polynomial functions with rational coefficients and/or when the function set can be guessed fairly accurately; otherwise, a very creative solution is found.

To predict sunspots using random numerical constants, we used F = {+, -, *, /}₄ and T = {a, b, c, d, e, f, g, h, i, j, ?}. The set of rational random constants was R = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, and “?” ranged over the interval [-1, 1]. The parameters used per run are shown in the fifth column of Table 2. The best solution, found in run 92 after 4759 generations, is shown below (the sub-ETs are linked by addition):

    Gene 0: /++j+hjjijg?cfda894833994
        A₀ = {0.977, 0.421, 0.226, 0.325, 0.933, 0.204, 0.594, 0.8, 0.212, 0.395}

    Gene 1: /++b++ag?c?eiejb795620470
        A₁ = {0.72, 0.447, 0.266, 0.511, 0.304, 0.247, 0.159, 0.847, 0.204, 0.995}

    Gene 2: /++jj+jii??f?ig454696802
        A₂ = {0.52, 0.595, 0.714, 0.982, 0.987, 0.916, 0.153, 0.779, 0.987, 0.672}	(3.5)

It has a fitness of 86603.2 and an R-square of 0.833714 evaluated over the set of 90 fitness cases. Mathematically, it corresponds to the following function:
