Monday, 11 October 2010

Fascinating correlations or elegant theories?

Posted by Thomas on July 10, 2008

A few weeks ago, Chris Anderson, Editor-in-Chief of Wired, wrote a provocative piece, "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete", arguing that in our Google-driven, data-rich era ("The Petabyte Age") the good old approach to science (hypothesize, model, test) "is becoming obsolete" and is giving way to a purely correlative vision of the world. There is a good dose of provocation in the essay, and it succeeded in spurring a flurry of skeptical reactions in the blogosphere, in FriendFeed-land and lately in Edge's Reality Club.

I know that it is a bit late to write a post on this, but this debate reminds me of the bottom-up vs top-down dialectic in (systems) biology. The tradition in molecular biology has been to focus on molecular mechanisms–series of molecular events–that explain given biological functions. With detailed knowledge of the properties of an increasing number of components, bottom-up mechanistic descriptions–or models–can be constructed that account for the experimental observations.

Of course, the purpose of models, at least of insightful ones, is more than merely providing mechanistic descriptions. As William Bialek writes, "Given a progressively more complete microscopic description of proteins and their interactions, how do we understand the emergence of function?" (Aguera y Arcas et al., 2003). There is therefore a subsequent, subtle transition from description to insight, from model to theory, from detailed and specific to simple and general (watch Murray Gell-Mann's TEDTalk on "Beauty and truth in physics").

Theories are elegant.

On the other hand, high-throughput technologies (microarrays, proteomics, metabolomics, ultra-high-throughput sequencing, etc.) are indeed profoundly changing molecular biology and flooding the field with experimental data like never before. Currently, only part of this data can be explained within the context of mechanistic models. Still, and this is probably Chris Anderson's main point, it turns out that if the data is rich enough, one can exploit it by looking at it globally, from the 'top', to reveal statistical patterns and correlations. Even if there are no mechanistic explanations (yet) for these correlations, they may reveal new worlds and novel structures, and detect relationships between processes that were previously considered unrelated.

Correlations are fascinating.

Correlations resulting from data-driven analysis may well in turn stimulate new mechanistic investigations and hopefully new understanding. On Edge, Sean Carroll summarizes it all: "Sometimes it will be hard, or impossible, to discover simple models explaining huge collections of messy data taken from noisy, nonlinear phenomenon. But it doesn't mean we shouldn't try. Hypotheses aren't simply useful tools in some potentially-outmoded vision of science; they are the whole point. Theory is understanding, and understanding our world is what science is all about."

BUT what is true for fundamental science is not necessarily the rule for more applied fields, where the priority may be less on understanding than on acting. In particular, in medically related fields, top-down, data-driven correlative approaches offer a pragmatic way to obtain predictive models without waiting for still-elusive, fully mechanistic models that would encompass the entire complexity of human physiology (Nicholson, 2006).

As is often the case in science, as in other human activities, different but complementary views are championed by people with different temperaments: there are those who like to build an edifice piece by piece and those who want to explore new territories. I think–I hope–that progress in systems biology on both fronts, top-down and bottom-up, demonstrates that there is no need to turn this complementarity into an opposition.
