Monday, August 5, 2013

What it means to be 98% chimpanzee (Deccan Herald, 6th August 2013)



'What it means to be 98% chimpanzee' is the title of a book by biologist Jonathan Marks. His aim is to make people aware that what we can learn from gene sequences is limited. Mostly, a sequence tells us what proteins an organism can make and, to some extent, how its genes are ‘networked’. In short, he exposes the fallacy of giving exaggerated importance to gene sequence information. It is no wonder that lay people are unaware of the fallacy: even the scientific community keeps falling into the trap. It is important to pause and reflect on what people like Jonathan Marks are saying.
As taught in school - and perhaps more effectively by films like Jurassic Park - DNA is made up of sugars, phosphates and four different nitrogenous bases that are abbreviated as A, T, G and C. The sugars and phosphates provide a backbone to hold the bases. They are irrelevant for understanding how DNA encodes information. It is the arrangement of the four bases in various combinations that decides what protein (if any) corresponds to a sequence of DNA. In the 1970s a scientist named Frederick Sanger developed a technique by which DNA could be 'read', meaning that the precise sequence in which A, T, G and C occurred in a given sample could be determined. Today we have a host of techniques to sequence all of the DNA in an organism, which is known as its 'genome'. The ability to do so, and to manipulate DNA, has certainly expanded our understanding of the nature of living organisms. However, does this mean that until DNA sequencing came on the scene we knew nothing about how living systems work at the molecular level? The answer is a clear no. In fact the foundation for what is known as molecular biology today was largely laid by meticulous genetics done from the 1940s to the 1960s, when there was no way of inferring the exact sequence of long stretches of DNA (and for much of the time without a clear appreciation of the central role of DNA). However, in recent times the hype surrounding gene sequencing has exaggerated both how much information one gets from it and the value of that information once we have it at hand. Let us look at the facts.
A DNA sequence that is called a gene is said to be a ‘coding sequence’ because the sequence of bases carries coded information to make a protein or an RNA molecule that performs a specific function. (All proteins are made from RNA molecules called 'messenger' RNAs, but other kinds of RNAs perform functions on their own, and do not need to be converted to protein.) The sequence can be compared to a meaningful sentence in a language. With some degree of success, biologists can identify where (i.e. with what base) a sentence begins and where it ends. The rest of the DNA that does not make up readable sentences is said to be 'non-coding'. As far as we know it does not code for a protein or RNA molecule. But most of the time we have no idea what it DOES do. Some sequences previously thought of as 'non-coding' have led to the discovery of several new kinds of RNA molecules that participate in regulating or co-ordinating the functions of genes. Even so, non-coding DNA can make up a very large share of a genome. For example, a staggering 95% of the human genome belongs to this non-coding category. Just 5% of our DNA seems to be making protein or RNA. Further investigations may raise the figure. But unless our ideas are fundamentally flawed (and as of today we do not see how that could be), it appears unlikely that more than 10% of our genome could code for proteins or RNA. Additionally, in multicellular organisms (like humans), the coding sequences themselves are frequently rearranged in a process called 'splicing', which means that the same stretch of DNA could be combined in many ways to make the final protein. Some genes are present across most life forms, and are referred to as 'conserved'. They can then be used to look at how organisms have changed over time in that specific respect; for instance, a gene that helps in respiration called 'cytochrome C' is a popular choice.
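The idea of finding where a 'sentence' begins and ends can be illustrated with a deliberately simplified sketch. It assumes the standard start signal ATG and the three standard stop signals TAA, TAG and TGA, reads only one strand, and ignores splicing altogether, so it is nowhere near a real gene finder. The function name and the example sequence are made up for illustration.

```python
START = "ATG"                    # standard start signal
STOPS = {"TAA", "TAG", "TGA"}    # standard stop signals


def find_first_sentence(dna):
    """Return (start, end) of the first start-to-stop stretch, or None.

    The sequence is read three bases at a time from each ATG until a
    stop signal is met in the same reading frame.
    """
    dna = dna.upper()
    start = dna.find(START)
    while start != -1:
        # Step through the bases after the start signal in groups of three.
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOPS:
                return start, pos + 3
        # No stop signal in this frame: try the next ATG, if any.
        start = dna.find(START, start + 1)
    return None


example = "CCATGGCTTGATAACC"     # hypothetical sequence
span = find_first_sentence(example)
if span is not None:
    begin, end = span
    print(example[begin:end])    # the stretch from start signal to stop signal
```

Even in this toy form the 'some degree of success' caveat is visible: the sketch reports any stretch that happens to have the right punctuation, whether or not a cell ever reads it.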
Because the genome sequences of so many organisms are now available, it is possible to compare specific sections across genomes and use this as one estimate of how close to each other the two organisms are. However, when we compare two DNA sequences, it is important to compare the right bits, and to be conservative in drawing conclusions about similarity.
For example, if you wanted to compare a human genome with the genome of an insect, say a fruit fly, the information is now available. A fruit fly genome is 60% similar to that of a human. Does that mean a human being is 60% fruit fly? It is clear that this is an absurd conclusion to draw. However, as the similarity draws closer, as for instance with the great apes (which include chimpanzees), people quite frequently use the argument that 'chimpanzees are 98% human' in fighting for animal rights or even in trying to get across a scientific point. It did not take gene sequencing to see that gorillas, chimpanzees and orang-utans look more like human beings than any other animal. In fact the common name of the orang-utan itself translates to 'person of the forest'. This is not to say that we must not protect the great apes or place appropriate restrictions on their use in experiments where one would not use humans. Indeed everything we have learnt about them suggests that they display an emotional and intellectual sensitivity well above most other animals. However, it is scientifically inaccurate to translate genetic similarity into an all-encompassing overall similarity.
When two stretches of DNA are compared, they are lined up next to each other and the base at every position is compared. Two short sequences can be 0% similar; for instance, CCGAT is completely different from GACTA. However, as there are only four bases in DNA, at each position of two long, unrelated sequences there is a 25% chance that the same base will occur in both stretches of DNA. In other words, for long sequences a 0% similarity is really a 25% similarity: that is the score expected from chance alone. This means that at the lower end differences are over-estimated, and as the numbers climb higher, differences are under-estimated.
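The position-by-position comparison described above can be sketched in a few lines of Python. This is a minimal illustration and not the method used by real alignment software: it assumes the two sequences are already lined up and equal in length, and it assumes that in unrelated DNA each of the four bases is equally likely at every position. The function names are made up for this example.

```python
import random

BASES = "ATGC"


def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two aligned sequences carry the same base."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


def random_sequence(length, rng):
    """A sequence in which every base is drawn with equal probability (an assumption)."""
    return "".join(rng.choice(BASES) for _ in range(length))


# The two short sequences from the text share no base at any position.
short_score = percent_identity("CCGAT", "GACTA")

# Two long sequences generated independently still match at about a quarter
# of their positions, purely by chance.
rng = random.Random(0)
long_score = percent_identity(random_sequence(100_000, rng),
                              random_sequence(100_000, rng))

print(f"short sequences: {short_score:.1f}% identical")
print(f"long random sequences: {long_score:.1f}% identical")
```

The short pair scores 0%, while the long random pair lands close to 25%, which is why a raw similarity figure says little until it is set against what chance alone would produce.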
As Marks points out, humans are not 98% chimpanzees. We might add, nor are they 60% fruit flies or 30% daffodils. While data from gene sequences is undeniably useful, we are yet to fully understand HOW best to interpret or use it. Moreover, many of the promises held out by the knowledge of sequence information have not materialized. Ten years after the human genome was sequenced, it remains more an impressive technical achievement than a scientific advance in knowledge. Two major expectations have been disproved. Firstly, the number of human genes actually found to exist was far smaller than the number expected (~25,000 vs ~100,000), already indicating that gene numbers alone were unlikely to define a unique identity. Secondly, the expectation that knowing the sequences could lead to quick diagnosis and prevention of several diseases has not materialized. We now have a lot of information. The trouble is sorting and understanding it. Even if (as the current field of 'proteomics' replaces the older one of 'genomics') we knew the function of every gene there is, it wouldn’t lead us to an instant understanding of how the organism is made or operates. The idea that breaking down things to their most fundamental parts will reveal how the whole works (often called 'reductionism' in biology) does not necessarily work with complex systems, and living organisms are as complex as things can get. Their function depends on an intricate interplay between genes, the physical environment, social inputs and so on. Knowledge of DNA sequences remains useful and important. But because of the pitfalls listed above in sequence comparisons and in going from sequence to function, it is only one part of what we need to understand how life works. Even if it held true under all available comparison methods, the statement ‘Human DNA and chimpanzee DNA are 98% similar’ would tell us very little about what it means to be a human or a chimpanzee.

4 comments:

  1. Well written and well explained Laasya, the science comes out very well indeed - in clear, lucid and easy-to-understand language - ideal pop-sci article for the masses! Kudos! Keep writing :)
    Love n best wishes,
    Maya

  2. This comment has been removed by the author.

  3. Hey, a good book (and a good thought, too) to highlight. This extrapolating from gene sequences has gone too far (especially by the media). I don't think I'm 98% chimp (am sure the chimp doesn't think so either!). Maybe when the analysis of data from ENCODE is complete, we'd know why we're not, despite the sequence similarity. (Or perhaps a whole new set of knowledge is needed, on gene clusters/networks or things so far unknown...)

  4. Interesting review. Makes me want to read the book. I like the last statement!
