C. FERREIRA

In J. M. Benitez, O. Cordon, F. Hoffmann, and R. Roy, eds., Advances in Soft Computing: Engineering Design and Manufacturing, pages 257-266, Springer-Verlag, 2003.

Function Finding and the Creation of Numerical Constants in Gene Expression Programming

Setting the System

The comparison between the two approaches (with and without the facility to manipulate random constants) was made on three different problems. The first is a problem of sequence induction requiring integer constants. In this case the following test sequence was chosen:

a_n = 5n⁴ + 4n³ + 3n² + 2n + 1     (3.1)

where n ranges over the nonnegative integers. This sequence was chosen because it can be solved exactly and therefore provides an accurate measure of performance in terms of success rate.

The second is a problem of function finding requiring floating-point constants. In this case, the following “V” shaped function was chosen:

y = 4.251a² + ln(a²) + 7.243e^a     (3.2)

where a is the independent variable and e is the base of the natural logarithm (approximately 2.71828183). Problems of this kind cannot be solved exactly by evolutionary algorithms, so the performance of both approaches is compared in terms of average best-of-run fitness and average best-of-run R-square.

The third is the well-studied benchmark problem of predicting sunspots (Weigend et al. 1992). In this case, 100 observations of the Wolfer sunspots series were used (Table 1) with an embedding dimension of 10 and a delay time of one. Again, the performance of both approaches is compared in terms of average best-of-run fitness and R-square.

Table 1
Wolfer sunspots series (read by rows).

101	82	66	35	31	7	20	92
154	125	85	68	38	23	10	24
83	132	131	118	90	67	60	47
41	21	16	6	4	7	14	34
45	43	48	42	28	10	8	2
0	1	5	12	14	35	46	41
30	24	16	7	4	2	8	17
36	50	62	67	71	48	28	8
13	57	122	138	103	86	63	37
24	11	15	40	62	98	124	96
66	64	54	39	21	7	4	23
55	94	96	77	59	44	47	30
16	7	37	74

For the sequence induction problem, the first 10 positive integers n and their corresponding terms a_n were used as fitness cases. The fitness function was based on the relative error with a selection range of 20% and maximum precision (0% error), giving a maximum fitness f_max = 200 (Ferreira 2001).
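The arithmetic behind f_max = 200 can be sketched as follows. This is a minimal illustration, assuming that each fitness case contributes the selection range minus the relative error in percent. The function names, and the choice to clip a case's contribution at zero when its error exceeds the selection range, are assumptions of this sketch rather than details given in the text.

```python
def target(n):
    """Test sequence (3.1)."""
    return 5 * n**4 + 4 * n**3 + 3 * n**2 + 2 * n + 1


def relative_error_fitness(model, cases, selection_range=20.0, precision=0.0):
    """Sum over all fitness cases of (selection range - relative error in %).

    Errors at or below `precision` count as zero. Clipping each contribution
    at zero is an assumption of this sketch.
    """
    total = 0.0
    for n, expected in cases:
        error = abs((model(n) - expected) / expected) * 100.0
        if error <= precision:
            error = 0.0
        total += max(selection_range - error, 0.0)
    return total


# The first 10 positive integers and their terms a_n.
cases = [(n, target(n)) for n in range(1, 11)]

# A perfect model earns the full selection range on every case: 10 * 20 = 200.
f_max = relative_error_fitness(target, cases)
```

With 10 fitness cases and a selection range of 20%, a model that reproduces every term exactly reaches the stated maximum of 200.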

For the “V” shaped function problem, a set of 20 random fitness cases chosen from the interval [-1, 1] was used. The fitness function was again based on the relative error, but with a selection range of 100%, giving f_max = 2,000.
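A possible way to build such a data set is sketched below. The seed, the uniform sampling, and the helper names are assumptions made for illustration; the text does not say how the 20 points were drawn. The sketch skips a = 0, where ln(a²) is undefined.

```python
import math
import random


def v_function(a):
    """Target function (3.2); undefined at a = 0 because of ln(a^2)."""
    return 4.251 * a**2 + math.log(a**2) + 7.243 * math.exp(a)


def make_cases(count=20, low=-1.0, high=1.0, seed=0):
    """Draw `count` random points in [low, high], skipping a = 0."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    cases = []
    while len(cases) < count:
        a = rng.uniform(low, high)
        if a != 0.0:
            cases.append((a, v_function(a)))
    return cases


cases = make_cases()

# 20 fitness cases with a selection range of 100% each.
f_max = len(cases) * 100.0
```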

For the time series prediction problem, using an embedding dimension of 10 and a delay time of one, the sunspots series presented in Table 1 results in 90 fitness cases. In this case, a wider selection range of 1,000% was chosen, giving f_max = 90,000.
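The step from 100 observations to 90 fitness cases can be illustrated with a short sketch. The `embed` helper is hypothetical: it simply pairs each window of 10 consecutive values (the inputs a to j of the terminal set) with the value that follows it. The data are the values of Table 1, read by rows.

```python
# Wolfer sunspots series from Table 1, read by rows (100 observations).
SUNSPOTS = [
    101, 82, 66, 35, 31, 7, 20, 92,
    154, 125, 85, 68, 38, 23, 10, 24,
    83, 132, 131, 118, 90, 67, 60, 47,
    41, 21, 16, 6, 4, 7, 14, 34,
    45, 43, 48, 42, 28, 10, 8, 2,
    0, 1, 5, 12, 14, 35, 46, 41,
    30, 24, 16, 7, 4, 2, 8, 17,
    36, 50, 62, 67, 71, 48, 28, 8,
    13, 57, 122, 138, 103, 86, 63, 37,
    24, 11, 15, 40, 62, 98, 124, 96,
    66, 64, 54, 39, 21, 7, 4, 23,
    55, 94, 96, 77, 59, 44, 47, 30,
    16, 7, 37, 74,
]


def embed(series, dimension=10, delay=1):
    """Pair each window of `dimension` past values with the next value."""
    span = dimension * delay
    cases = []
    for t in range(span, len(series)):
        # Oldest input first; with delay 1 this is series[t-10] .. series[t-1].
        inputs = [series[t - (dimension - k) * delay] for k in range(dimension)]
        cases.append((inputs, series[t]))
    return cases


cases = embed(SUNSPOTS)

# 100 - 10 = 90 fitness cases, each worth a selection range of 1,000%.
f_max = len(cases) * 1000
```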

In all the experiments, the selection was made by roulette-wheel sampling coupled with simple elitism and the performance was evaluated over 100 independent runs. The six experiments are summarized in Table 2.
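As a rough illustration of this selection scheme, the sketch below copies the best individual unchanged into the next generation and fills the remaining slots by fitness-proportionate sampling. The function name and the uniform fallback for an all-zero population are assumptions of the sketch, not details taken from the text.

```python
import random


def select(population, fitnesses, rng):
    """Roulette-wheel sampling with simple elitism (illustrative sketch)."""
    # Elitism: the best individual of the generation always survives.
    best = max(range(len(population)), key=lambda i: fitnesses[i])
    slots = len(population) - 1
    if sum(fitnesses) > 0:
        # Each individual is drawn with probability proportional to its fitness.
        rest = rng.choices(population, weights=fitnesses, k=slots)
    else:
        # Assumed fallback: sample uniformly when every fitness is zero.
        rest = rng.choices(population, k=slots)
    return [population[best]] + rest


population = ["a", "b", "c", "d"]
fitnesses = [0.0, 5.0, 1.0, 0.0]
next_generation = select(population, fitnesses, random.Random(1))
```

Individuals with zero fitness never receive a slice of the wheel, so only "b" and "c" can appear in `next_generation` here.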

Table 2
General settings used in the sequence induction (SI), the “V” function, and sunspots (SS) problems. The “*” indicates the explicit use of random constants.

	SI*	SI	V*	V	SS*	SS
Number of runs	100	100	100	100	100	100
Number of generations	100	100	5000	5000	5000	5000
Population size	100	100	100	100	100	100
Number of fitness cases	10	10	20	20	90	90
Function set	+ - * /	+ - * /	+ - * / L E K ~ S C	+ - * / L E K ~ S C	4 (+ - * /)	4 (+ - * /)
Terminal set	a, ?	a	a, ?	a	a - j, ?	a - j
Random constants array length	10	--	10	--	10	--
Random constants range	{0, 1, 2, 3}	--	[-1,1]	--	[-1,1]	--
Head length	6	6	6	6	8	8
Number of genes	7	7	5	5	3	3
Linking function	+	+	+	+	+	+
Chromosome length	140	91	100	65	78	51
Mutation rate	0.044	0.044	0.044	0.044	0.044	0.044
One-point recombination rate	0.3	0.3	0.3	0.3	0.3	0.3
Two-point recombination rate	0.3	0.3	0.3	0.3	0.3	0.3
Gene recombination rate	0.1	0.1	0.1	0.1	0.1	0.1
IS transposition rate	0.1	0.1	0.1	0.1	0.1	0.1
IS elements length	1,2,3	1,2,3	1,2,3	1,2,3	1,2,3	1,2,3
RIS transposition rate	0.1	0.1	0.1	0.1	0.1	0.1
RIS elements length	1,2,3	1,2,3	1,2,3	1,2,3	1,2,3	1,2,3
Gene transposition rate	0.1	0.1	0.1	0.1	0.1	0.1
Random constants mutation rate	0.01	--	0.01	--	0.01	--
Dc specific transposition rate	0.1	--	0.1	--	0.1	--
Dc specific IS elements length	1,2,3	--	1,2,3	--	1,2,3	--
Selection range	20%	20%	100%	100%	1000%	1000%
Precision	0%	0%	0%	0%	0%	0%
Average best-of-run fitness	179.827	197.232	1914.8	1931.84	86215.27	89033.29
Average best-of-run R-square	0.977612	0.999345	0.957255	0.995340	0.713365	0.811863
Success rate	16%	81%	--	--	--	--
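The chromosome lengths in Table 2 are consistent with the usual gene layout in gene expression programming: a head of length h, a tail of length t = h(n - 1) + 1, where n is the largest arity in the function set, and, when random constants are used, an extra domain of the same length as the tail. The sketch below checks that arithmetic. The function name is hypothetical, and the layout is an assumption stated here, not a quotation from the table.

```python
def chromosome_length(head, max_arity, genes, random_constants):
    """Chromosome length under the assumed head/tail/constant-domain layout."""
    tail = head * (max_arity - 1) + 1
    gene = head + tail
    if random_constants:
        gene += tail  # extra domain for random constants, as long as the tail
    return genes * gene


# All function sets in Table 2 have a maximum arity of 2.
lengths = {
    "SI*": chromosome_length(6, 2, 7, True),
    "SI": chromosome_length(6, 2, 7, False),
    "V*": chromosome_length(6, 2, 5, True),
    "V": chromosome_length(6, 2, 5, False),
    "SS*": chromosome_length(8, 2, 3, True),
    "SS": chromosome_length(8, 2, 3, False),
}
```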
