C. FERREIRA, 2002. ISBN: 9729589054

Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence

DNA molecules are long, linear strings of four nucleotides (represented by A, T, C, and G). Each DNA molecule is, in fact, a double helix in which one strand is the complement of the other and thus adds nothing to the information contained in a single strand. In the structure of the double helix, A pairs with T, and C with G (Figure 1.1). The double-stranded, complementary nature of DNA is fundamental to the replication of genetic information in the cell, but it is of little importance in a computer system like GEP or GAs. Indeed, the chromosomes of both GEP and GAs are single-stranded, and their replication is done by simple program instructions.

Figure 1.1. Base pairing in the double-stranded DNA molecule. Note that the bulkier G and A pair, respectively, with the smaller C and T, keeping the two strands exactly the same distance apart. Note also that the information contained in one strand is essentially the same as that contained in the other. For this reason, DNA strands are said to be complementary.
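The claim that the second strand adds no information can be illustrated with a minimal Python sketch. The names `PAIRING`, `complement`, and `replicate` are hypothetical and chosen only for illustration; they are not taken from GEP or any GA library. The sketch also ignores the fact that the two strands of real DNA run in opposite directions, since only the base pairing matters for the argument.

```python
# Base pairing as described in the text: A pairs with T, and C with G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a strand (illustrative helper)."""
    return "".join(PAIRING[base] for base in strand)


def replicate(chromosome: str) -> str:
    """Copy a single-stranded chromosome with a plain program instruction.

    This stands in for the 'simple program instructions' the text mentions;
    no complementary strand is needed to make the copy.
    """
    return "".join(chromosome)


strand = "GATTACA"
partner = complement(strand)

# The partner strand is fully determined by the first one, and complementing
# it again recovers the original: it carries no additional information.
recovered = complement(partner)
copy = replicate(strand)
```

Because `complement` is its own inverse, storing both strands would be redundant in a program, which is consistent with GEP and GA chromosomes being single-stranded.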

The information stored in DNA consists of the sequence of the four nucleotides, which is called the primary structure of DNA. The secondary structure of DNA consists of the different kinds of double helices it can form. Most importantly for us, DNA lacks a tertiary structure, that is, a unique, three-dimensional arrangement of the molecule. Instead, DNA molecules fold into random coils. And because complex functionality such as catalytic activity is closely related to tertiary structure, DNA molecules are useless for doing much of the work that needs to be done in a cell.

However, the simple DNA molecule is excellent for storing information. In the structure of the double helix, the complementary nucleotides face each other and are locked in the interior of the helix. This makes DNA chemically inert and stable, both of which are desirable qualities in an information keeper. In fact, in the cell, DNA is further protected by the sheltered environment of the nucleus in eukaryotes or the nucleoid in prokaryotes. But most importantly for us, DNA is incapable of both catalytic activity and structural diversity: first, the potential functional groups (the bases A, T, C, and G) are locked in the interior of the helix and, second, the molecule lacks tertiary structure, another prerequisite for catalytic activity and structural diversity.

Simplifying, DNA may be seen as a long string composed of four different letters (A, T, C, and G) in which the sequence of the letters (or primary structure) constitutes the genetic information. The genetic information, or the blueprints, of all organisms on Earth is written in the four-letter language of DNA. For instance, the average mammalian genome contains about 5 × 10^9 base pairs (or letters, if only one strand is considered) of DNA and codes for approximately 300,000 protein and RNA genes, which are the immediate products of expression of the genome. Below we will see how the genetic information is expressed as proteins.
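To give a feel for what a string of that length means in computing terms, here is a rough back-of-the-envelope sketch in Python. It assumes, purely for illustration, that every letter is equally likely and independent of its neighbours, so the result is an upper bound on storage capacity rather than a measure of actual biological information. The function name `genome_capacity_bits` is hypothetical.

```python
import math


def genome_capacity_bits(n_letters: int, alphabet_size: int = 4) -> float:
    """Upper bound on the bits needed to store a string of n_letters.

    Assumes equiprobable, independent letters (an illustrative assumption,
    not a claim from the text): each letter then needs log2(alphabet_size)
    bits, which is 2 bits for the four-letter alphabet A, T, C, G.
    """
    return n_letters * math.log2(alphabet_size)


# Genome size quoted in the text: about 5 x 10^9 letters on a single strand.
genome_letters = 5 * 10**9

total_bits = genome_capacity_bits(genome_letters)
total_gigabytes = total_bits / 8 / 10**9  # 8 bits per byte, 10^9 bytes per GB

print(f"{total_bits:.3g} bits, about {total_gigabytes:.2f} GB")
```

Under these assumptions the single-strand sequence fits in a little over one gigabyte, which underlines how compact a four-letter linear code is as an information store.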
