Two approaches to the problem of constant creation

ISBN: 9729589054

Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence

In this section, we analyze two different approaches to the problem of constant creation in symbolic regression by comparing the performance of two algorithms. The first has the facility to manipulate random constants directly, whereas the second lacks this facility and must build any constants it needs from the variables and functions available. The two approaches are compared on three problems: an artificial sequence induction problem requiring integer constants; a function finding problem requiring floating-point constants; and a real-world time series prediction problem that also requires floating-point constants.
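
The facility that distinguishes the first approach appears in the SI*, V* and SS* settings of Tables 4.7, 4.9 and 4.10: each gene carries an extra Dc domain and an array of 10 random constants, and the terminal "?" stands for a constant. The Python sketch below illustrates how such a gene could be expressed, assuming standard Karva notation (the gene is read breadth-first) and the GEP-RNC convention that the k-th "?" in the expressed part of the gene takes the constant indexed by the k-th Dc symbol. The helper names express and evaluate, the gene, the Dc string and the constants array are hypothetical examples, and only the four arithmetic functions of Table 4.7 are handled.

```python
import operator

# Two-argument functions of the SI/SI* function set (Table 4.7).
OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul, '/': operator.truediv}


def express(gene, dc, constants):
    """Return the ORF of a Karva-notation gene, with every '?' replaced by a constant.

    The ORF is the prefix of the gene read until every function has its
    arguments; the k-th '?' in it takes constants[int(dc[k])].
    """
    needed, end = 1, 0
    while needed > 0:
        needed += (2 if gene[end] in OPS else 0) - 1
        end += 1
    dc_symbols = iter(dc)
    return [constants[int(next(dc_symbols))] if s == '?' else s for s in gene[:end]]


def evaluate(orf, a):
    """Evaluate an expressed gene laid out breadth-first, for variable value a.

    No protection against division by zero is included in this sketch.
    """
    children, next_free = [], 1
    for s in orf:
        k = 2 if isinstance(s, str) and s in OPS else 0
        children.append(range(next_free, next_free + k))
        next_free += k

    def node(i):
        s = orf[i]
        if isinstance(s, str) and s in OPS:
            left, right = (node(j) for j in children[i])
            return OPS[s](left, right)
        return a if s == 'a' else s   # the variable a or a numeric constant

    return node(0)


# Hypothetical gene: head "*a+?a?" (h = 6), tail "aa?a??a" (t = 7), Dc "3017254".
constants = [0, 1, 2, 3, 1, 2, 3, 0, 2, 1]   # random constants drawn from {0, 1, 2, 3}
orf = express("*a+?a?aa?a??a", "3017254", constants)
print(orf)               # ['*', 'a', '+', 3, 'a'], i.e. a * (3 + a)
print(evaluate(orf, 2))  # 10
```

Only the Dc symbols consumed by the expressed part matter; the rest of the Dc domain, like the unused tail, is carried along silently and can become active after mutation.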

For the sequence induction problem, the following test sequence was chosen:

a_n = 4n⁴ + 3n³ + 2n² + n        (4.9)

where n ranges over the nonnegative integers. This sequence was chosen because both algorithms can solve it exactly, so their success rates provide an accurate measure of their performance.
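
As a quick check on equation (4.9), the short Python sketch below generates the first ten terms of the sequence; they reproduce the fitness cases listed in Table 4.6. The function name a_n is illustrative.

```python
def a_n(n: int) -> int:
    """Term n of the test sequence (4.9): a_n = 4n^4 + 3n^3 + 2n^2 + n."""
    return 4 * n**4 + 3 * n**3 + 2 * n**2 + n


# Fitness cases of Table 4.6: the first 10 positive integers and their terms.
fitness_cases = [(n, a_n(n)) for n in range(1, 11)]

for n, term in fitness_cases:
    print(n, term)
```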

For the function finding problem, the following “V” shaped function was chosen:

y = 4.251a² + ln(a²) + 7.243e^a        (4.10)

where a is the independent variable and e ≈ 2.71828183 is the base of the natural logarithm. Problems of this kind, with arbitrary real-valued constants, cannot in practice be solved exactly by evolutionary algorithms; therefore, the two approaches will be compared in terms of average best-of-run fitness and average best-of-run R-square.
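
For reference, equation (4.10) can be written directly in Python, as sketched below; note that ln(a²) makes the function undefined at a = 0. The spot-check uses two rows of Table 4.8, and the function name v_function is illustrative.

```python
import math


def v_function(a: float) -> float:
    """Target function (4.10): y = 4.251 a^2 + ln(a^2) + 7.243 e^a (undefined at a = 0)."""
    return 4.251 * a**2 + math.log(a**2) + 7.243 * math.exp(a)


# Spot-check against two rows of Table 4.8: (a, expected f(a)).
for a, expected in [(-0.2639725157548, 3.19498066265276),
                    (0.950682181822091, 22.4819639844149)]:
    print(f"{a:.6f}\t{v_function(a):.6f}\t{expected:.6f}")
```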

For the time series prediction task, 100 observations of the Wolfer sunspots series (Table 4.5) were used with an embedding dimension of 10 and a delay time of one (see section 4.4 for more details). Once again, the two approaches will be compared in terms of average best-of-run fitness and average best-of-run R-square.

Table 4.5
Wolfer sunspots series (read by rows).

101	82	66	35	31	7	20	92
154	125	85	68	38	23	10	24
83	132	131	118	90	67	60	47
41	21	16	6	4	7	14	34
45	43	48	42	28	10	8	2
0	1	5	12	14	35	46	41
30	24	16	7	4	2	8	17
36	50	62	67	71	48	28	8
13	57	122	138	103	86	63	37
24	11	15	40	62	98	124	96
66	64	54	39	21	7	4	23
55	94	96	77	59	44	47	30
16	7	37	74

For the sequence induction problem, the first 10 positive integers n and their corresponding terms were used as fitness cases (Table 4.6). Fitness was evaluated by equation (3.1b), which is based on the relative error. A selection range of 25% and maximum precision (0% error) were chosen, giving f_max = 10 × 25 = 250, since each of the 10 fitness cases can contribute at most the selection range. This experiment, with its two different approaches, is summarized in Table 4.7.
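
Equation (3.1b) is not reproduced in this section. As a sketch, assuming it has the relative-error form in which each fitness case contributes the selection range R minus the absolute percentage error (errors within the chosen precision counting as zero), the maximum fitness is the number of fitness cases times R, which gives the f_max values quoted for all three problems. Clamping negative contributions at zero and the function names are assumptions of this sketch.

```python
def relative_error_fitness(predictions, targets, selection_range, precision=0.0):
    """Sketch of a relative-error fitness in the spirit of equation (3.1b).

    Assumed form: sum over fitness cases of (selection_range - E), where E is
    the absolute percentage error; E <= precision counts as a perfect hit, and
    negative contributions are clamped to zero (an assumption of this sketch).
    """
    total = 0.0
    for c, t in zip(predictions, targets):
        error = abs((c - t) / t) * 100.0      # percentage error; t must be nonzero
        if error <= precision:
            error = 0.0
        total += max(selection_range - error, 0.0)
    return total


def max_fitness(n_cases, selection_range):
    """A perfect program scores the full selection range on every fitness case."""
    return n_cases * selection_range


targets = [4 * n**4 + 3 * n**3 + 2 * n**2 + n for n in range(1, 11)]   # Table 4.6
print(relative_error_fitness(targets, targets, selection_range=25))      # 250.0
print(max_fitness(10, 25), max_fitness(20, 100), max_fitness(90, 1000))  # 250 2000 90000
```

A wider selection range, such as the 1000% used later for the sunspots, lets programs with large relative errors still earn some fitness, which keeps selection informative early in a run.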

Table 4.6
Set of fitness cases for the sequence induction task.

n	a_n
1	10
2	98
3	426
4	1252
5	2930
6	5910
7	10738
8	18056
9	28602
10	43210

Table 4.7
General settings used in the sequence induction problem with (SI*) and without (SI) random constants.

	SI*	SI
Number of runs	100	100
Number of generations	100	100
Population size	100	100
Number of fitness cases	10 (Table 4.6)	10 (Table 4.6)
Function set	+ - * /	+ - * /
Terminal set	a, ?	a
Random constants array length	10	--
Random constants range	{0, 1, 2, 3}	--
Head length	6	6
Number of genes	5	5
Linking function	+	+
Chromosome length	100	65
Mutation rate	0.044	0.044
One-point recombination rate	0.3	0.3
Two-point recombination rate	0.3	0.3
Gene recombination rate	0.1	0.1
IS transposition rate	0.1	0.1
IS elements length	1,2,3	1,2,3
RIS transposition rate	0.1	0.1
RIS elements length	1,2,3	1,2,3
Gene transposition rate	0.1	0.1
Random constants mutation rate	0.01	--
Dc specific transposition rate	0.1	--
Dc specific IS elements length	1,2,3	--
Selection range	25%	25%
Precision	0%	0%
Average best-of-run fitness	195.308	249.982
Average best-of-run R-square	0.798698299	0.9999999996
Success rate	24%	98%
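
The chromosome lengths in Tables 4.7, 4.9 and 4.10 can be checked against the standard GEP gene structure: a tail of length t = h(n_max − 1) + 1, where h is the head length and n_max is the largest arity in the function set (2 in all three experiments), plus, when random constants are used, a Dc domain as long as the tail. The small Python helpers below (hypothetical names) reproduce the listed values.

```python
def gene_length(head: int, max_arity: int = 2, with_dc: bool = False) -> int:
    """Length of one gene: head + tail (+ Dc domain when random constants are used)."""
    tail = head * (max_arity - 1) + 1   # enough terminals to close any head
    dc = tail if with_dc else 0         # the Dc domain is as long as the tail
    return head + tail + dc


def chromosome_length(head: int, n_genes: int, max_arity: int = 2, with_dc: bool = False) -> int:
    return n_genes * gene_length(head, max_arity, with_dc)


# Tables 4.7 and 4.9 (h = 6, 5 genes) and Table 4.10 (h = 7, 3 genes).
print(chromosome_length(6, 5, with_dc=True), chromosome_length(6, 5))  # 100 65
print(chromosome_length(7, 3, with_dc=True), chromosome_length(7, 3))  # 69 45
```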

For the “V” shaped function problem, a set of 20 fitness cases chosen at random from the interval [-1, 1] was used (Table 4.8). Fitness was again evaluated by equation (3.1b), but in this case with a selection range of 100%, giving f_max = 20 × 100 = 2000. This experiment, with its two different approaches, is summarized in Table 4.9.

Table 4.8
Set of fitness cases used in the “V” function problem.

a	f(a)
-0.2639725157548	3.19498066265276
0.0578905532656938	1.99052001725998
0.334025290109634	8.39663703997286
-0.236334577564462	3.07088976972825
-0.855744382566804	5.87946763695703
-0.0194437136332785	-0.775326322328458
-0.192134388183304	2.83470225774408
0.529307910124627	12.2154726642137
-0.00788974118728459	-2.49803983418635
0.438969804950631	10.4071734858808
-0.107559292698039	2.09413635645908
-0.274556994377163	3.23927278010839
-0.0595333219604528	1.19701284767347
0.384492993958352	9.35580769189855
-0.874923020736333	6.00642453001302
-0.236546636250546	3.07189729043837
-0.167875941704557	2.67440053130986
0.950682181822091	22.4819639844149
0.946979159577362	22.3750161187355
0.639339910059591	14.5701285332337
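
Table 4.8 lists 20 fitness cases drawn at random from [-1, 1]. A Python sketch of how such a set could be generated is shown below, assuming uniform sampling (the section only says "random"); because the draws are random, it will not reproduce the exact values of Table 4.8. The function names and the seed are illustrative.

```python
import math
import random


def v_function(a: float) -> float:
    """Target function (4.10); undefined at a = 0 because of ln(a^2)."""
    return 4.251 * a**2 + math.log(a**2) + 7.243 * math.exp(a)


def sample_fitness_cases(n_cases: int = 20, low: float = -1.0, high: float = 1.0, seed=None):
    """Draw n_cases points uniformly from [low, high] and pair them with their targets."""
    rng = random.Random(seed)
    cases = []
    while len(cases) < n_cases:
        a = rng.uniform(low, high)
        if a == 0.0:          # skip the single point where ln(a^2) is undefined
            continue
        cases.append((a, v_function(a)))
    return cases


cases = sample_fitness_cases(seed=1)
for a, y in cases[:3]:
    print(f"{a:.6f}\t{y:.6f}")
```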

Table 4.9
General settings used in the “V” function problem with (V*) and without (V) random constants.

	V*	V
Number of runs	100	100
Number of generations	5000	5000
Population size	100	100
Number of fitness cases	20 (Table 4.8)	20 (Table 4.8)
Function set	+ - * / L E K ~ S C	+ - * / L E K ~ S C
Terminal set	a, ?	a
Random constants array length	10	--
Random constants range	[-1,1]	--
Head length	6	6
Number of genes	5	5
Linking function	+	+
Chromosome length	100	65
Mutation rate	0.044	0.044
One-point recombination rate	0.3	0.3
Two-point recombination rate	0.3	0.3
Gene recombination rate	0.1	0.1
IS transposition rate	0.1	0.1
IS elements length	1,2,3	1,2,3
RIS transposition rate	0.1	0.1
RIS elements length	1,2,3	1,2,3
Gene transposition rate	0.1	0.1
Random constants mutation rate	0.01	--
Dc specific transposition rate	0.1	--
Dc specific IS elements length	1,2,3	--
Selection range	100%	100%
Precision	0%	0%
Average best-of-run fitness	1896.25	1953.057
Average best-of-run R-square	0.95129456	0.99647004
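
Tables 4.7, 4.9 and 4.10 report the average best-of-run R-square, which this section does not define. A common convention, assumed in the Python sketch below, is the square of Pearson's correlation coefficient between model outputs and target values; the function name is illustrative.

```python
def r_square(predicted, target):
    """Squared Pearson correlation between model outputs and target values.

    Assumption: one common definition of R-square for symbolic regression;
    it is undefined when either sequence is constant.
    """
    n = len(target)
    mean_p = sum(predicted) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predicted, target))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_t = sum((t - mean_t) ** 2 for t in target)
    return cov * cov / (var_p * var_t)


print(r_square([1, 2, 3], [1, 3, 2]))  # 0.25
```

Under this definition R-square is insensitive to linear rescaling of the model output, so it need not rank models exactly as the relative-error fitness does.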

For the time series prediction problem, an embedding dimension of 10 and a delay time of one turn the 100 observations of Table 4.5 into 90 fitness cases, because the first 10 observations serve only as inputs (see section 4.4 for more details). In this case, a wider selection range of 1000% was chosen, giving f_max = 90 × 1000 = 90,000. This experiment, with its two different approaches, is summarized in Table 4.10.
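
With an embedding dimension of 10 and a delay time of one, each fitness case uses 10 consecutive observations as inputs (the terminals a to j) and the following observation as the target, so the 100 values of Table 4.5 yield 100 − 10 = 90 fitness cases. The Python sketch below builds them; the helper name embed is hypothetical, and the order in which the ten inputs map onto a to j is an assumption, since section 4.4 is not reproduced here.

```python
# Table 4.5: 100 observations of the Wolfer sunspots series (read by rows).
SUNSPOTS = [
    101, 82, 66, 35, 31, 7, 20, 92,
    154, 125, 85, 68, 38, 23, 10, 24,
    83, 132, 131, 118, 90, 67, 60, 47,
    41, 21, 16, 6, 4, 7, 14, 34,
    45, 43, 48, 42, 28, 10, 8, 2,
    0, 1, 5, 12, 14, 35, 46, 41,
    30, 24, 16, 7, 4, 2, 8, 17,
    36, 50, 62, 67, 71, 48, 28, 8,
    13, 57, 122, 138, 103, 86, 63, 37,
    24, 11, 15, 40, 62, 98, 124, 96,
    66, 64, 54, 39, 21, 7, 4, 23,
    55, 94, 96, 77, 59, 44, 47, 30,
    16, 7, 37, 74,
]


def embed(series, dimension=10, delay=1):
    """Turn a series into (inputs, target) fitness cases.

    Each case takes `dimension` past values, spaced `delay` steps apart,
    and uses them to predict the next observation.
    """
    span = (dimension - 1) * delay + 1      # how far back the input window reaches
    cases = []
    for end in range(span, len(series)):
        window = series[end - span:end:delay]
        cases.append((window, series[end]))
    return cases


cases = embed(SUNSPOTS)
print(len(cases))  # 90
```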

Table 4.10
General settings used in the sunspots prediction task with (SS*) and without (SS) random constants.

	SS*	SS
Number of runs	100	100
Number of generations	5000	5000
Population size	100	100
Number of fitness cases	90 (Table 4.5)	90 (Table 4.5)
Function set	4 (+ - * /)	4 (+ - * /)
Terminal set	a - j, ?	a - j
Random constants array length	10	--
Random constants range	[-1,1]	--
Head length	7	7
Number of genes	3	3
Linking function	+	+
Chromosome length	69	45
Mutation rate	0.044	0.044
One-point recombination rate	0.3	0.3
Two-point recombination rate	0.3	0.3
Gene recombination rate	0.1	0.1
IS transposition rate	0.1	0.1
IS elements length	1,2,3	1,2,3
RIS transposition rate	0.1	0.1
RIS elements length	1,2,3	1,2,3
Gene transposition rate	0.1	0.1
Random constants mutation rate	0.01	--
Dc specific transposition rate	0.1	--
Dc specific IS elements length	1,2,3	--
Selection range	1000%	1000%
Precision	0%	0%
Average best-of-run fitness	86182.05	89009.66
Average best-of-run R-square	0.706437	0.801144
