Since Charles Darwin published The Origin of Species in 1859, his theory of evolution has been a central axiom of the biological sciences. Scientists have devoted considerable effort to unravelling the evolutionary mechanisms of biological, and other, systems. Their work, in particular the discovery of DNA as the carrier of hereditary information, has led to the now nearly universally accepted idea that evolution drives the increasing complexity of most biological systems. However, although the basic mechanisms of evolution (mutation and selection) are clear, accumulating data suggest that the means of increasing complexity have themselves become more complex. This raises the question of how such mechanisms have evolved over time.
Thanks to rapid progress in the biological sciences, in particular genomics, and in the information sciences, including linguistics, we have amassed an enormous amount of data on genomes and on the mechanisms of biological evolution. These data provide an unprecedented opportunity to investigate how the evolutionary strategies of evolving systems, such as genomes and languages, may themselves have changed. Indeed, the information contained in genomes has long been compared with language, and many linguistic methods are now used to analyse genomes (Searls, 2002). A comparison of the evolution of genomes and language reveals, perhaps unexpectedly, that both systems have undergone similar strategic shifts to attain increasing complexity. This suggests that such shifts are an intrinsic property of many evolving systems, not just biological ones.
During the first stage in the evolution of both genomes and languages, complexity increased mainly through a growing number of basic information-carrying elements: nouns and verbs in language, and genes in biology. The Chinese language is a good example in this context because of its history of more than 6,000 years and its continuing evolution. The early Chinese written language, called Oracle (the oracle bone script), consisted of a few thousand characters. This number gradually grew to more than 47,000 individual characters as the ancient Chinese continually invented new characters to refine their ability to describe their environment (Table 1; Ji, 1989). For example, more than 70 characters described horses according to their colour, age and gender, and more than 80 characters described their behaviour (Chen, 1936). According to a statistical analysis of 800 million characters of ancient text, one would need to know about 22,000 characters to attain 99.99% coverage of ancient Chinese (Zhang, 2004). The same phenomenon can be observed in biological evolution. Simple, early organisms such as prokaryotes survive and proliferate with a few thousand genes, whereas higher organisms have about an order of magnitude more; some plants, such as maize, have more than 50,000 genes (Messing et al, 2004).
In the second stage of their evolution, both systems began to rearrange existing elements into new combinations to increase complexity further. Interestingly, this led to a decrease in the number of elements. For example, modern Chinese uses considerably fewer characters than the ancient language did: only 4,600 characters are now needed to attain 99.99% coverage (Zhang, 1997). Nevertheless, modern Chinese is more powerful than its ancient predecessor, because it can describe a much more complex world. The 3,500 most frequently used characters, which cover about 99.87% of modern Chinese text, can be combined to form more than 70,000 words (Zhang, 1997) that include the meanings of most ancient characters. Genomes have undergone a similar evolutionary shift. Mammals such as Homo sapiens and the mouse have about 25,000 genes (International Human Genome Sequencing Consortium, 2004; Guénet, 2005), far fewer than some plants: rice, for example, has 43,000 genes (Paterson et al, 2005) and maize has 59,000 (Messing et al, 2004). The greater complexity of mammals is explained partly by new combinations of existing genes and gene segments, through mechanisms such as alternative splicing (Johnson et al, 2003) and tandem chimerism (Parra et al, 2006). Indeed, mammals depend much more than plants on such gene recombination to achieve higher complexity (Messing, 2001).
The third stage of evolution sees the arrival of ‘virtual’, or modifying, elements, which play an important role as regulatory components. In language, adverbs, auxiliary words, prepositions and conjunctions are all virtual elements, whereas nouns and verbs are the main carriers of information. Indeed, two of the five most frequently used characters in modern Chinese are ‘empty’ (virtual) elements (Fig 1). Virtual words were quite rare in Oracle but are much more frequent in modern Chinese. Similarly, non-coding RNA plays a virtual role in genomes, whereas protein-coding genes are the main carriers of information. Mattick (2004) has argued that in mammals it is RNA, not protein, that mainly controls gene activity, which would also help to explain the unexpectedly low number of protein-coding genes in mammalian genomes.
In summary, the evolution of both genomes and language has not only increased complexity but has also produced novel strategies that enhance this process further. Evolution has thus undergone several paradigm shifts: from increasing the number of basic elements, to shuffling existing building blocks into new combinations, and finally to creating virtual regulatory elements. These strategic shifts increase the overall complexity and performance of the systems at relatively low cost: the ability to combine elements reduces the number of building blocks needed, and virtual elements simplify regulation. Using RNA rather than protein as a regulator bypasses translation, and the average length of such RNA signals, 22 nucleotides, is almost two orders of magnitude shorter than the sequence required to encode an average protein (Mattick, 2004). It is therefore reasonable to propose that this “evolution of evolutionary strategies” is an intrinsic property of evolving systems. Providing further evidence, for example by building a model that simulates such higher-level evolution in silico, will be a considerable challenge.