Tools for mining knowledge from data are crucial in a world where the amount of data is constantly increasing. The volume of data is so large that finding the meaningful factors in this sea of data has become a Herculean task, and new technologies have been developed to extract the relevant knowledge. Gene expression programming is one of these emerging technologies and is ideal for separating the wheat from the chaff. In this section we illustrate this with a function-finding problem in which nine of the 10 variables are meaningless.
The test function is the one already familiar from section 4.1.1. The difference is that the meaningful variable must now be discovered among a total of 10 variables. The parameters used per run in this experiment are summarized in Table 4.4. The success rate of 77% shows that GEP was not overwhelmed by the quantity of irrelevant data and found its way very efficiently. The first perfect solution was found in generation 61 of run 0. Its chromosome is shown below (the sub-ETs are linked by addition):
01234567890120123456789012012345678901201234567890120123456789012
*a*aa-hgadadc-ah*d-gcfjcbd/--gcgciijeegh+eeehbeddbfd*aadaabcecfgb   (4.7)
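where a represents the meaningful variable and b through j represent the nine meaningless variables. The five sub-ETs decode to a*(a*a) + (a - h) + (g - c)/(g - c) + h + a*a. In this expression the irrelevant variable h cancels out, and g and c combine into the constant 1, leaving a^3 + a^2 + a + 1. This chromosome therefore encodes a function equivalent to the target function (4.1).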
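The decoding can also be checked mechanically. The Python sketch below is a minimal illustration and not the book's implementation. It splits chromosome (4.7) into five 13-symbol genes and reads each gene in Karva (level-order) notation to build its sub-ET. It then links the sub-ETs by addition and compares the result with a^3 + a^2 + a + 1 at random points. The helper names (decode, to_infix, evaluate, program) are hypothetical. The sample values are arbitrary rather than the actual fitness cases. Division is left unprotected, so the third sub-ET is undefined when g = c.

```python
import math
import random
from collections import deque

# Chromosome (4.7): five genes of 13 symbols each (head 6 + tail 7).
CHROMOSOME = "*a*aa-hgadadc-ah*d-gcfjcbd/--gcgciijeegh+eeehbeddbfd*aadaabcecfgb"
GENE_LENGTH = 13

# Function set of Table 4.4; all four functions take two arguments.
# Division is deliberately unprotected in this sketch.
OPS = {
    "+": lambda x, y: x + y,
    "-": lambda x, y: x - y,
    "*": lambda x, y: x * y,
    "/": lambda x, y: x / y,
}


def decode(gene):
    """Build the sub-ET of one gene read in Karva (level-order) notation."""
    root = {"sym": gene[0], "kids": []}
    queue = deque([root])
    pos = 1  # next unread symbol of the gene
    while queue:
        node = queue.popleft()
        if node["sym"] in OPS:
            for _ in range(2):  # each function takes its two arguments in order
                child = {"sym": gene[pos], "kids": []}
                pos += 1
                node["kids"].append(child)
                queue.append(child)
    return root  # symbols from pos onward form the non-coding region


def to_infix(node):
    """Render a sub-ET as a fully parenthesized infix string."""
    if not node["kids"]:
        return node["sym"]
    left, right = (to_infix(k) for k in node["kids"])
    return f"({left}{node['sym']}{right})"


def evaluate(node, env):
    """Evaluate a sub-ET given a mapping from variable names to values."""
    if not node["kids"]:
        return env[node["sym"]]
    left, right = (evaluate(k, env) for k in node["kids"])
    return OPS[node["sym"]](left, right)


genes = [decode(CHROMOSOME[i:i + GENE_LENGTH])
         for i in range(0, len(CHROMOSOME), GENE_LENGTH)]


def program(env):
    """Link the five sub-ETs by addition."""
    return sum(evaluate(g, env) for g in genes)


if __name__ == "__main__":
    print(" + ".join(to_infix(g) for g in genes))
    rng = random.Random(0)
    for _ in range(100):  # arbitrary sample points, not the book's fitness cases
        env = {v: rng.uniform(-10, 10) for v in "abcdefghij"}
        a = env["a"]
        assert math.isclose(program(env), a**3 + a**2 + a + 1,
                            rel_tol=1e-9, abs_tol=1e-9)
    print("matches a^3 + a^2 + a + 1 at all sampled points")
```

Run as a script, it prints `(a*(a*a)) + (a-h) + ((g-c)/(g-c)) + h + (a*a)`. This output shows that h appears once with each sign and that g and c only ever form the constant 1.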
Table 4.4
Settings used in the 10-dimensional data mining problem.

Number of runs | 100
Number of generations | 1000
Population size | 50
Number of fitness cases | 100
Function set | + - * /
Terminal set | a b c d e f g h i j
Head length | 6
Number of genes | 5
Linking function | +
Chromosome length | 65
Mutation rate | 0.044
One-point recombination rate | 0.3
Two-point recombination rate | 0.3
Gene recombination rate | 0.1
IS transposition rate | 0.1
IS element lengths | 1, 2, 3
RIS transposition rate | 0.1
RIS element lengths | 1, 2, 3
Gene transposition rate | 0.1
Selection range | 100%
Precision | 0.01%
Success rate | 77%
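The structural entries of Table 4.4 are not independent of one another. The Python sketch below uses illustrative variable names to derive the chromosome length from three inputs: the head length, the number of genes and the largest arity in the function set. It applies the standard GEP tail-length rule t = h(n - 1) + 1 and cross-checks the result against chromosome (4.7). It also estimates the expected number of point mutations per chromosome. That estimate assumes the mutation rate applies independently to each position, which is an interpretation rather than something stated in the table.

```python
# Structural settings from Table 4.4.
HEAD_LENGTH = 6
NUM_GENES = 5
MAX_ARITY = 2          # +, -, * and / each take two arguments
MUTATION_RATE = 0.044  # assumed here to be a per-position probability

# Chromosome (4.7), used only to cross-check the computed length.
CHROMOSOME = "*a*aa-hgadadc-ah*d-gcfjcbd/--gcgciijeegh+eeehbeddbfd*aadaabcecfgb"


def tail_length(head_length, max_arity):
    """GEP rule t = h(n - 1) + 1: the tail always holds enough terminals
    to complete any head, so every gene decodes to a valid sub-ET."""
    return head_length * (max_arity - 1) + 1


tail = tail_length(HEAD_LENGTH, MAX_ARITY)
gene_length = HEAD_LENGTH + tail
chromosome_length = NUM_GENES * gene_length
expected_mutations = MUTATION_RATE * chromosome_length

if __name__ == "__main__":
    assert len(CHROMOSOME) == chromosome_length
    print(f"tail = {tail}, gene = {gene_length}, chromosome = {chromosome_length}")
    print(f"expected point mutations per chromosome ~ {expected_mutations:.2f}")
```

With a head of 6 and binary functions only, the tail has 7 positions and each gene has 13. Five genes give the 65 positions listed in the table. Under the per-position reading, a mutation rate of 0.044 amounts to roughly 2.86 point mutations per chromosome per generation.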