Sunday 22 August 2010

Defining and measuring complexity

What is complexity? For example, is the human genome more complex than the yeast genome (see my post of August 8th, 2010)? We intuitively answer this question with a big "OF COURSE". However, it has been surprisingly difficult to come up with a universally accepted definition of complexity. There is not yet a single science of complexity, but rather several sciences of complexity with different views about what complexity really means. Still, the history of science shows that the lack of a universally accepted definition of a central term in a new field is more the rule than the exception. As an example, modern genetics still does not have a good definition of a gene at the molecular level.

In 2001, the physicist Seth Lloyd proposed three different dimensions along which to measure the complexity of a system:

1) How hard is it to describe?

2) How hard is it to create?

3) What is its degree of organization?

Another interesting proposed measure of complexity is Shannon entropy, defined as the average information, or "amount of surprise", that a message source provides to a receiver. Using a classical genetics example, the sequence CGTGGT has higher entropy than the sequence AAAAAA and is therefore more complex. A completely random sequence has the maximum possible entropy. That means we could make up an artificial genome by choosing a bunch of As, Cs, Ts, and Gs at random; using entropy as the measure of complexity, this random, almost certainly nonfunctional genome would be considered more complex than the human genome.
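To make the comparison concrete, here is a minimal sketch (in Python, not part of the original post) that computes the Shannon entropy of a DNA string from the empirical frequencies of its bases. Treating each base as an independent draw from the sequence's own symbol distribution is an illustrative assumption, not the only way to apply entropy to sequences.

from collections import Counter
from math import log2

def shannon_entropy(sequence):
    # Empirical probability of each base; entropy is the average
    # "surprise" in bits per symbol: -sum(p * log2(p)).
    counts = Counter(sequence)
    total = len(sequence)
    return -sum((n / total) * log2(n / total) for n in counts.values())

print(shannon_entropy("AAAAAA"))  # 0.0 bits per symbol: no surprise at all
print(shannon_entropy("CGTGGT"))  # about 1.46 bits per symbol
print(shannon_entropy("ACGT"))    # 2.0 bits per symbol, the maximum for a 4-letter alphabet

A uniformly random genome over the four bases would approach the 2-bit maximum, which is exactly why entropy alone ranks such a genome above the human one.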

In conclusion: entropy alone does not capture our intuition about complexity. The most complex entities are not the most ordered or the most random ones, but those somewhere in between.

For further reading, see "Complexity: A Guided Tour" by Melanie Mitchell.
