In credit screening the goal is to decide whether or not to approve a customer’s request for a credit card. Each sample in the dataset represents a real credit card application and the output describes whether the bank granted the credit card or not. This problem has 51 input attributes, none of which is explained in the original dataset for confidentiality reasons.
The model presented here was obtained using the card1 dataset of PROBEN1, where the binary 1-of-m encoding was again replaced by a 1-bit encoding (“1” for approval and “0” for non-approval). The first 345 examples were used for training and the last 172 for testing.
For this problem, the function set F = {+, -, *} was used, with each function weighted 15 times, and the set of terminals included all 51 attributes, represented by d0 - d50. The 0/1 rounding threshold was equal to 1.0 and the fitness was evaluated by equation (4.28).
For this problem, chromosomes composed of five genes with a head length h = 5 and sub-ETs linked by addition were used. The program below was discovered after 311 generations of a small population of 50 individuals:
**+d47-d11d15d27d44d13d4
**+d47-d11d18d29d48d12d38
*d11d40+d32d17d46d0d44d13d7
-d41+*+d18d48d29d4d10d45
*-d11d40*d0d28d28d48d0d43          (4.30a)
It has a fitness of 298 evaluated against the training set of 345 fitness cases and a fitness of 151 evaluated against the testing set of 172 examples. This corresponds to a testing set classification error of 12.209% and a classification accuracy of 87.791%. Below is shown the fully expressed individual translated into a C++ function automatically generated with APS:
double APSCfunction(double d[])
{
    double dblTemp = 0;

    dblTemp += ((d[47]*(d[27]-d[44]))*(d[11]+d[15]));
    dblTemp += ((d[47]*(d[29]-d[48]))*(d[11]+d[18]));
    dblTemp += (d[11]*d[40]);
    dblTemp += (d[41]-((d[18]*d[48])+(d[29]+d[4])));
    dblTemp += ((d[40]-(d[0]*d[28]))*d[11]);

    return (dblTemp >= 1 ? 1 : 0);
}          (4.30b)
Note that not all the attributes appear in the model evolved by GEP: those left out are apparently irrelevant to the decision at hand. In fact, of the 51 attributes only 13 are used in this extremely accurate model.