'What it means to be
98% chimpanzee' is the title of a book by biologist Jonathan Marks. His aim is to
make people aware that what we can learn from gene sequences is
limited: mostly which proteins an organism can make and, to some extent,
how its genes are ‘networked’. In short,
he exposes the fallacy of giving exaggerated importance to gene sequence
information. No wonder that lay people are unaware of the fallacy: even the
scientific community keeps falling into the trap. It is important to pause and reflect on what
people like Jonathan Marks are saying.
As
taught in school - and perhaps more effectively by films like Jurassic Park -
DNA is made up of sugars, phosphates and four different nitrogenous bases that
are abbreviated as A, T, G and C. The sugars and phosphates provide a backbone
to hold the bases. They are irrelevant for understanding how DNA encodes
information. It is the arrangement of the four bases in various combinations that
decides what protein (if any) corresponds to a sequence of DNA. In the 1970s, a
scientist named Frederick Sanger developed a technique by which DNA could be
'read', meaning that the precise sequence in which A, T, G and C occurred in a
given sample could be determined. Today we have available a host of techniques
to sequence all of the DNA in an organism, which is known as its 'genome'. The
ability to do so and to manipulate DNA has certainly expanded our understanding
of the nature of living organisms. However, does this mean that until DNA
sequencing came on the scene we knew nothing about how living systems work at
the molecular level? The answer is a clear no. In fact the foundation for what
is known as molecular biology today was largely laid by meticulous genetics
done in the 1940s to 1960s, when there was no way of inferring the exact
sequence of long stretches of DNA (and for much of the time without a clear
appreciation of the central role of DNA). However, in recent times the hype
surrounding gene sequencing has exaggerated the information one gets from it
and the value of such information when we do have it at hand. Let us look at
the facts.
A
DNA sequence that is called a gene is said to be a ‘coding sequence’ because the
sequence of bases carries coded information to make a protein or an RNA molecule that
performs a specific function. (All proteins are made from RNA molecules called
'messenger' RNAs, but other kinds of RNAs perform functions on their own, and
do not need to be converted to protein.) The sequence can be compared to a
meaningful sentence in a language. With some degree of success, biologists can
identify where (i.e. with what base) a sentence begins and where it ends. The
rest of the DNA that does not make up readable sentences is said to be 'non-coding'.
As far as we know it does not code for a protein or RNA molecule. But most of
the time we have no idea what it DOES do. Some such sequences previously
thought of as 'non-coding' have led to the discovery of several new kinds of RNA
molecules that participate in regulating or co-ordinating the functions of
genes. However, non-coding DNA can make up most of a genome. For example, a staggering
95% of the human genome belongs to this non-coding category. Just 5% of our DNA
seems to be making protein or RNA. Further investigations may raise the figure.
But unless our ideas are fundamentally flawed (and as of today we do not see
how that could be), it appears unlikely that more than 10% of our genome could
code for proteins or RNA. Additionally, in multicellular organisms (like
humans), the coding sequences themselves are frequently rearranged in a process
called 'splicing', which means that the same stretch of DNA could be combined
in many ways to make the final protein. Some genes are present across most life
forms, and are referred to as 'conserved'. They can then be used to look at how
organisms have changed over time in that specific respect. For instance, a gene
that helps in respiration, called 'cytochrome C', is a popular choice. Because
the genome sequences of so many organisms are now available, it is possible to
compare specific sections across genomes and use this as one estimate of how
close to each other the two organisms are. However, when we compare two DNA
sequences, it is important to compare the right bits, and to be conservative in
drawing conclusions about similarity.
For
example, if you wanted to compare a human genome with the genome of an insect,
say a fruit fly, the information is now available. A fruit fly genome is 60%
similar to that of a human. Does that mean a human being is 60% fruit fly? It
is clear that this is an absurd conclusion to draw. However, as the similarity
grows, as it does with the great apes (which include chimpanzees),
people quite frequently use the argument that 'chimpanzees are 98% human' in
fighting for animal rights or even in trying to get across a scientific point. It
did not take gene sequencing to see that gorillas, chimpanzees and orang-utans
look more like human beings than any other animal. In fact the common name of
the orang-utan itself translates to 'person of the forest'. This is not to say
that we must not protect the great apes or place appropriate restrictions on
their use in experiments where one would not use humans. Indeed everything we
have learnt about them suggests that they display an emotional and intellectual
sensitivity well above most other animals. However, it is scientifically
inaccurate to translate genetic similarity to an all-encompassing overall
similarity. When two stretches of DNA are compared, they are lined up next to
each other and the base at every position is compared. Two short sequences can
be 0% similar, for instance CCGAT is completely different from GACTA. However,
as there are only four bases in DNA, for large sequences there is roughly a 25% chance
that the same base will occur at the same position in both stretches of DNA purely by
accident. In other words, what ought to count as 0% similarity between two unrelated
sequences shows up as about 25% similarity. This means that at the
lower end differences are over-estimated, and as the numbers climb higher,
differences are under-estimated.
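The position-by-position comparison and the 25% chance baseline can be illustrated with a short Python sketch. The function names (percent_identity, random_sequence) are only illustrative, and the sketch makes two simplifying assumptions: the sequences are already lined up with equal length and no gaps, and 'unrelated' DNA is modelled as uniformly random bases, which real genomes are not.

```python
import random


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two lined-up sequences carry the same base."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def random_sequence(length: int, rng: random.Random) -> str:
    """Hypothetical 'unrelated' DNA: each base drawn uniformly from A, T, G, C."""
    return "".join(rng.choice("ATGC") for _ in range(length))


# The two short sequences from the text differ at every position.
print(percent_identity("CCGAT", "GACTA"))  # 0.0

# Two long random sequences agree at roughly a quarter of their positions,
# even though they share no history at all.
rng = random.Random(0)
long_a = random_sequence(100_000, rng)
long_b = random_sequence(100_000, rng)
chance_level = percent_identity(long_a, long_b)
print(round(chance_level, 1))  # close to 25
```

Under these assumptions a raw score of about 25% carries no information about relatedness, which is why raw percentages need careful reading. Real comparison tools also have to deal with insertions, deletions and unequal base frequencies, none of which this sketch attempts.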
As
Marks points out, humans are not 98% chimpanzees. We might add, nor are they
60% fruit flies or 30% daffodils. While data from gene sequences is undeniably
useful, we are yet to fully understand HOW best to interpret or use it.
Moreover, many of the promises held out by the knowledge of sequence information have
not materialized. Ten years after the human genome was sequenced,
it remains more an impressive technical achievement than a scientific advance
in knowledge. Two major expectations have been disproved. The number of human
genes expected was far more than the number actually found to exist (~100,000 vs
~25,000), already indicating that gene numbers alone were unlikely to define a
unique identity. Secondly, the expectation that knowing the sequences could
lead to quick diagnosis and prevention of several diseases has not
materialized. We now have a lot of information. The trouble is sorting and
understanding it. Even if (as the current field of 'proteomics' replaces the
older one of 'genomics') we knew the function of every gene there is, it
wouldn’t lead us to an instant understanding of how the organism is made or
operates. The idea that breaking down things to their most fundamental parts
will reveal how the whole works (often called 'reductionism' in biology) does
not necessarily work with complex systems, and living organisms are as complex
as things can get. Their function depends on an intricate interplay between
genes, the physical environment, social inputs and so on. Knowledge of DNA
sequences remains useful and important. But because of the pitfalls listed
above in sequence comparisons and in going from sequence to function, it is only
one part of what we need to understand how life works. Even
if it held true under all available comparison methods, the statement ‘Human DNA
and chimpanzee DNA are 98% similar’ would tell us very little about what it
means to be a human or a chimpanzee.
Well written and well explained Laasya, the science comes out very well indeed - in clear, lucid and easy-to-understand language - ideal pop-sci article for the masses! Kudos! Keep writing :)
Love n best wishes,
Maya
Hey, a good book (and a good thought, too) to highlight. This extrapolating from gene sequences has gone too far (especially by the media). I don't think I'm 98% chimp (am sure the chimp doesn't think so either!). Maybe when the analysis of data from ENCODE is complete, we'd know why we're not, despite the sequence similarity. (Or perhaps a whole new set of knowledge is needed, on gene clusters/networks or things so far unknown...)
Interesting review. Makes me want to read the book. I like the last statement!