 Karva Notation and K-Expressions

Karva Notation and K-Expressions

Karva notation was developed specifically for Gene Expression Programming (GEP) and consists of a universal way of compactly representing any mathematical or logical expression that can be represented as a tree . Besides its compactness, this universal representation is also linear, and this is a fundamental characteristic for any system that has to breed mathematical expressions to create new, more precise ones.

The linear structures of GEP are called chromosomes and each chromosome contains one or more genes. And each gene is associated with its own K-expression ("K" or "Kappa" is for Karva notation). Genes and K-expressions are very easy to decode. For example, the gene:

 0123456 +/*abcd (1a)

can be represented as a diagram or expression tree (ET): (1b)

The translation of the gene into the corresponding ET is straightforward: The first element in the gene (position 0) corresponds to the root of the ET; then, below that node, are attached as many nodes as there are arguments to that function (two, in this case); then these nodes are filled consecutively with the elements in the gene (in this case, positions 1 and 2), and so forth. The process is repeated until a line composed of only terminals is formed (in this case, the third line).

More formally, both the gene and ET above can be represented by the mathematical expression: (1c)

Usually the genes evolved by GEP are more interesting than the gene presented above, not only with noncoding regions at their ends but also with more diverse branching patterns.

Consider another gene:

 01234567890123456 Q/a*+b-cbabaccbac (2)

where "Q" represents the square root function. This gene has a head (from position 0 through 7 and shown in blue) of length 8 and a tail (from position 8 through 16) of length 9. Note that the head contains both functions (which may take 1, 2,..., n arguments) and terminals (the variables and constants in a problem), whereas the tail contains exclusively terminals (this structural organization is indeed the cornerstone of GEP as it ensures that all the programs encoded in the genes are syntactically correct). The translation of such genes is done exactly as shown for gene (1), giving: Note that, in this case, not all the elements in the gene were used to construct the ET, as the translation ends whenever a line containing only terminals is formed. In this particular case, the gene ends at position 16 whereas the K-expression ends at position 9.

The beauty and elegance of the gene/ET system of Gene Expression Programming by itself is rather amazing, but most importantly it's that this is only the beginning and, indeed, this elegant genotype/phenotype system is at the heart of systems much more complex and also more efficient. In one such system the chromosomes contain more than one gene, and they are obviously called multigenic systems. In these systems, the chromosomes consist of multiple genes, and each gene codes for a sub-ET or sub-program. After translation, the sub-ETs are linked by special functions, the so called linking functions. These functions link the sub-ETs together one after the other in an orderly fashion. For example, the following chromosome composed of three genes (position 0 indicates the beginning of each gene):

 012345678012345678012345678 *aQ+abbaa/Q*/aababa*+Qaabba (3)

codes for the three following sub-ETs: Then the sub-ETs are afterwards linked by one of the available linking functions. For instance, if the linking function were addition, then the following program would be obtained (the linking function is shown in gray): Note that the final program represented in the Figure above could be linearly encoded as a single K-expression:

 01234567890123456 ++a*/aQQ*+/aaabba (4)

However, the use of multigenic chromosomes is more appropriate to evolve solutions to complex problems, for they permit the modular construction of complex, hierarchical structures, where each gene codes for a smaller and simpler building block. These smaller building blocks are separated from one another and are therefore free to evolve independently, allowing for the creation of concrete new units that might prove handy in a new situation.

So, what is the use of Karva notation and why is it so important? The answer is that it allows adaptation or, in other words, learning: you replicate the linear strings of GEP, change a few bits, and you end up with new strings that encode slightly better programs. This is the basis for learning and evolution, and the plasticity of GEP genes with their noncoding regions allows the totally unconstrained exploration of the solution space, therefore increasing the odds of finding very good solutions to the problem at hand. No other programming system is as unconstrained as the genotype/phenotype GEP system in terms of the mechanisms used to create genetic variation (the Genetic Programming system is restricted to an inefficient tree-crossover and it's so complicated structurally that it never even developed beyond a single tree system ).

Furthermore, no other system has as much growth potential as GEP in the sense that the basic GEP system can give rise to other, more complex systems. We've already encountered here two GEP systems: the unigenic system and the multigenic system. But both these systems can be used as the starting point to create yet more complex systems: from simple GEP systems that elegantly handle random numerical constants to more sophisticated systems that can evolve highly complex structures such as neural networks, decision trees, automatically defined functions and polynomial networks. You can learn all about these and other GEP systems in Ferreira's book Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence .

References

1. Ferreira, C., 2001. Gene Expression Programming: A New Adaptive Algorithm for Solving Problems. Complex Systems, Vol. 13, issue 2: 87-129.

2. Koza, J. R., 1992. Genetic Programming: On the Programming of Computers by Means of Natural Selection. Cambridge, MA: MIT Press.

3. Ferreira, C., 2006. Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence. 2nd Edition, Springer-Verlag, Germany.