Who Put it There? Information in DNA.

Among the claims that surface with the regularity of a pulsar are the claims that DNA is a code and, as such, requires the intervention of an intelligence to 'put the information there'. In this post, I want to give a brief treatment of that claim, and show why it doesn't stack up.

First, we require a definition of information that is robust. There are two robust formulations of information theory, and both of them need to be considered. The first is that of Claude Shannon. This is the formulation that most creationists will cite, largely due to apologist screeds erecting various claims about information having to contain some sort of message and therefore requiring somebody to formulate the message, but it doesn't robustly apply to DNA, because it's the wrong treatment of information for the purpose. The second is that of Andrey Kolmogorov. Indeed, when dealing with complexity in information, you MUST use Kolmogorov, because that's the one that deals with complexity.

So just what is information? Well, in Shannon theory, information can be defined as 'reduction in uncertainty'. Shannon theory deals with fidelity in signal transmission and reception, since Shannon worked in communications. Now, given this, we have a maximum information content, defined as the lowest possible uncertainty. 
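To make 'uncertainty' a little more concrete, here is a minimal sketch in Python of Shannon's entropy measure, H = sum of p * log2(1/p) over the symbols. The function name is mine, and estimating the probabilities from the symbol frequencies in a short string is a simplifying assumption for illustration only.

```python
import math
from collections import Counter

def shannon_entropy(symbols):
    """Average uncertainty, in bits per symbol, estimated from symbol frequencies."""
    counts = Counter(symbols)
    total = len(symbols)
    # H = sum over symbols of p * log2(1/p), with p = n / total
    return sum((n / total) * math.log2(total / n) for n in counts.values())

print(shannon_entropy("1111111111"))  # only one symbol ever appears: 0.0 bits
print(shannon_entropy("1010101010"))  # two equally likely symbols: 1.0 bit
print(shannon_entropy("CAGT"))        # four equally likely symbols: 2.0 bits
```

Note that nothing in the calculation asks what the symbols mean. It only counts how surprising each one is.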

Now, if we have a signal, say a TV station, and your TV is perfectly tuned, and there is no noise added between transmission and reception of the TV signal, then you receive the channel cleanly and the information content is maximal. If, however, the TV is tuned slightly off the channel, or your reception is in some other respect less than brilliant, you get noise in the channel. The older ones of you will remember pre-digital television in which this was manifest in the form of 'bees' in the picture, and crackling and noise in the audio. Nowadays, you tend to get breaks in the audio, and pixelated blocks in the picture. They amount to the same thing, namely noise, or 'an increase in uncertainty'. It tells us that any deviation from the maximal information content, which is a fixed quantity, constitutes degradation of the information source, or 'Shannon entropy'. Shannon actually chose this term because the equation describing his 'information entropy' is almost identical to the Gibbs/Boltzmann equation for statistical entropy, as used in statistical mechanics. See my earlier post on entropy for more on this.

This seems to gel well with the creationist claims, and is the source of all their nonsense about 'no new information in DNA'. Of course, there are several major failings in this treatment.

The first comes from Shannon himself, from the book that he wrote with Warren Weaver on the topic:

'The semantic aspects of communication are irrelevant to the engineering aspects.'

'The word information, in this theory, is used in a special sense that must not be confused with its ordinary usage. In particular, information must not be confused with meaning. In fact, two messages, one of which is heavily loaded with meaning and the other of which is pure nonsense, can be exactly equivalent, from the present viewpoint, as regards information.'

So we see that Shannon himself doesn't actually agree with this treatment of information relied on so heavily by the creationists.

The second is that Shannon's is not the only rigorous formulation of information theory. The other comes from Andrey Kolmogorov, whose theory deals with information storage. The information content in Kolmogorov theory is a measure of complexity or, better still, can be defined by how little compression can be applied to the data: the less compressible it is, the more information it contains. This can be formulated in terms of the length of the shortest algorithm that can be written to reproduce the information.

Returning to our TV channel, we see a certain incongruence between the two formulations, because in Kolmogorov theory, the noise that you encounter when the TV is slightly off-station actually represents an increase in information, where in Shannon theory, it represents a decrease! How is this so? Well, it can be quite easily summed up, and the summation highlights the distinction between the two theories, both of which are perfectly robust and valid.

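Let's take an example of a message, say a string of 100 1s. In its basic form, that would look like this:

1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
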
Now, there are many ways we could compress this. The first has already been given above, namely 'a string of 100 1s'.

Now, if we make a change in that string,

1111111110111111111011111111101111111110111111111011111111101111111110111111111011111111101111111110

We now have a string of 9 1s followed by a zero, repeated 10 times. We now clearly have an increase in information content, even though the number of digits is exactly the same. However, there is a periodicity to it, so a simple compression algorithm can still be applied.

Let's try a different one:



Now, clearly, we have something that approaches an entirely random pattern. The more random a pattern is, the longer the algorithm required to describe it, and the higher the information content.
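The comparison between these three kinds of string can be tried out with a rough sketch like the one below. It is only an illustration under stated assumptions: Kolmogorov complexity itself can't be computed in general, so the output size of Python's zlib compressor stands in as a crude upper bound, and the 'random' string comes from a seeded pseudo-random generator of my own choosing (which, strictly speaking, means it has a short description too: the seed plus the generator).

```python
import random
import zlib

ones = "1" * 100                # a string of 100 1s
periodic = "1111111110" * 10    # nine 1s and a zero, repeated 10 times
rng = random.Random(0)          # fixed seed, only so that runs are repeatable
noisy = "".join(rng.choice("01") for _ in range(100))

def compressed_size(text):
    """Bytes needed by zlib at its highest compression level."""
    return len(zlib.compress(text.encode("ascii"), 9))

for label, text in [("ones", ones), ("periodic", periodic), ("noisy", noisy)]:
    print(label, len(text), "digits ->", compressed_size(text), "bytes")
```

With strings this short, the fixed overhead of the zlib format is a large part of each result, so it's the ordering of the sizes that matters rather than the exact numbers.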

Returning once again to our TV station, the further you get away from the station, the more random the pattern becomes, and the longer the algorithm required to reproduce it, until you reach a point in which the shortest representation of the signal is the thing itself. In other words, no compression can be applied.

This is, roughly, how compression works when you compress images for storage in your computer. The bitmap is the uncompressed file, while the simplest compression schemes, such as run-length encoding, store it as '100 pixels of x shade of blue followed by 300 pixels of black', etc. The algorithms that pertain to JPEG, etc., are more sophisticated (and JPEG also throws away some detail), but the same principle holds. Thus, the more complicated an image is in terms of periodicity and pattern, the less it can be compressed and the larger the output file will be.
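As a hedged illustration of the '100 pixels of blue followed by 300 pixels of black' idea, here is a minimal run-length encoder in Python. To be clear, this is not the JPEG algorithm, which is built on a discrete cosine transform and quantisation. It is the simplest scheme that shows the principle, and the function names and the list-of-colour-names 'image' are made up for the example.

```python
from itertools import groupby

def run_length_encode(pixels):
    """Collapse each run of identical pixels into a (value, run_length) pair."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]

def run_length_decode(runs):
    """Expand (value, run_length) pairs back into the original pixel list."""
    return [value for value, count in runs for _ in range(count)]

row = ["blue"] * 100 + ["black"] * 300
encoded = run_length_encode(row)
print(encoded)                            # [('blue', 100), ('black', 300)]
print(run_length_decode(encoded) == row)  # True

# A row with no runs at all gains nothing: one pair per pixel.
busy = ["blue", "black"] * 200
print(len(run_length_encode(busy)))       # 400
```

The patterned row shrinks from 400 entries to 2, while the row with no runs needs as many pairs as it has pixels, which is the same point about periodicity and pattern.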

So, which of the following can reasonably be called 'information'?

From sand dunes, we can learn about prevailing wind directions over time and, in many cases, the underlying terrain just from the shape and direction.


Theropod wrote: The dogshit can tell us what the dog ate, how much of it ate, how big the dogs anus is, how long ago the dog shat on your lawn, the digestive health of the dog, whether there are parasite eggs in the shit and contain traces of the dog's DNA we can sequence to identify the individual dog. Seems like a lot of information to me. It also seems like more than enough information is present to shoot your assertion down.


DNA is information in the sense that it informs us about the system, not that it contains a message. It is not a code, more something akin to a cipher, in which the chemical bases are treated as the letters of the language. There is nobody trying to tell us anything here, and yet we can be informed by it. 


About 1% of the interference pattern on an off-channel television screen is caused by the cosmic microwave background.


This is information in the sense that the squiggles represent more data than would be contained on a blank piece of paper, although even a blank piece of paper is information. In this example, information is defined as the number of bits it would take to represent it in a storage system. As it's fairly random, this is less compression-apt. This is pure Kolmogorov information.


Of all the information sources in this post, this is the only one that actually contains a message, and is therefore the only one to which Shannon information theory can be applied, as it is the only one that could actually decrease in terms of signal integrity.

Which of the above are information?

Answer: All of them. They are just different kinds of information. 

So, when you get asked 'where did the information in DNA come from?' the first thing you should be doing is asking for a definition of 'information'.

What about the genetic code? Scientists talk all the time about the genetic code, and creationists actually quote Richard Dawkins saying that it's digital, like a computer code. How to address this?

Let's start with the claim by Dawkins. What does it actually mean for something to be digital? In the simplest terms, it means that something exists in one of a well-defined set of discrete states. In computers, we tend to talk about binary states, in which a 'bit' comprises either a 1 or a 0. This is the well-defined set of states for binary. Other digital systems are more complex, with a greater number of states, but the same principle applies to all of them. I'll link at the bottom to a post by Calilasseia dealing with not only a range of computer codes, but some interesting treatment of what it takes to extract information from them.

Moving on to the 'genetic code', in DNA, we have the nucleobases cytosine, adenine, guanine and thymine (in RNA, thymine is replaced by uracil, U). These are the digital states of DNA. We use only the initial letters in our treatment, CAGT. Further, they come in pairs, with C always pairing with G, and A always pairing with T (or with U in RNA).
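The pairing rule is simple enough to write down as a lookup table. The sketch below is in Python, with names of my own invention, and it ignores strand direction entirely, so it's a toy. Note that the table is our bookkeeping for the chemistry, not the chemistry itself.

```python
# Our letters for the four bases, and which base each one pairs with.
PAIRS = {"C": "G", "G": "C", "A": "T", "T": "A"}

def complementary_strand(strand):
    """Return the base that pairs with each base in a DNA strand, position by position."""
    return "".join(PAIRS[base] for base in strand)

def to_rna(strand):
    """Write the same sequence with uracil (U) in place of thymine (T)."""
    return strand.replace("T", "U")

print(complementary_strand("CAGT"))  # GTCA
print(to_rna("CAGT"))                # CAGU
```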

From here, we can build up lots of 'words', in that when the bases occur in certain ordered sequences (no teleology here), they specify particular proteins that go into building organisms (loosely). The point is that this is all just chemistry, while the code itself is our treatment of it. In other words, the map is not the terrain.

DNA is a code in precisely the same way that London is a map.

More here by the Blue Flutterby.

Onus Probandi, Assertionism and Peer-review.

Good morning, ladles and jellyspoons.

In the wake of the hyperbolic, hysterical travesty that was the Boaty McBoatface saga, I have, after many years of resistance, succumbed to the double-edged Damoclean temptation of a sojourn into the Twitterverse. I've avoided it precisely because I have the worst case of SIWOTI syndrome in the recorded history of the universe since the invention of Tim Berners-Lee, especially when the WOTI is, in my esteem, harmful.

It should come as no surprise to learn that I've managed to find some WOTI in fairly short order, in this case in the form of a column on the website of the Daily Telegraph and a resulting series of tweets and other interactions.

I had wanted to use this post to continue on pre-Planck cosmology, but thought it worth picking this apart a little.

The article itself is this one, headlined Forget my son - I'm the one in exam stress hell, penned by Allison Pearson, known to some as the author of I Don't Know How She Does It.

Much of the article is unobjectionable. It's liberally sprinkled with apposite anecdotes (see below for caveat regarding this), carries a tone of exasperated but good-natured humour, and offers some reasonably decent advice for parents enduring the slings and arrows of teens going through one of the most stressful times in their young lives. Then, however, comes the following:

When they made the exams easier to pass they also made perfection attainable. Diligence is all. Fine if you’re a swotty, well-organised girl, not so good for testosterone-plagued slugabeds.

Now, the diligent reader will spot the issues with this without direction. Setting aside the notion of the attainability of perfection, which has some logical problems all its own, the real issue lies in the latter portion. I'll come back to that shortly, but here's the opening salvo:

A couple of assertions in there, not least the one saying 'all the evidence suggests...'

There's an extremely common fallacy committed in public discourse known in rigorous circles as onus probandi, closely related to the fallacy of bare assertion, both of which are discussed in a previous post. Onus probandi is the shifting of the burden of proof. In essence, this fallacy is committed when, rather than supporting one's own claim, the claimant insists that the skeptic prove it false.

It's up to the person erecting a claim to support that claim. Failure to do so commits one of the two above fallacies, dependent on the circumstances. The reason for this should be fairly obvious, namely that skepticism is the rebuttable position for any given claim.

In the first assertion above by Ms Pearson, no good reason has been given to accept the claim 'that boys lag behind girls in organisation skills and exam results'. This commits the fallacy of bare assertion. It's easy to see how this is related to the onus probandi, because the fallacy of bare assertion is essentially onus probandi by proxy.

Dr Weinstein then raises the motivating point for her initial interjection, and it's a good one. She asks Ms Pearson whether she thought the veracity of this claim (not contested for the moment) was sufficient reason to promulgate stereotyping in a national newspaper.

Stereotyping is another common fallacy (hasty/sweeping generalisation), but this one actually has measurable negative consequences in the real world, and the more people who buy into a given stereotype, the more destructive it is.

In this instance, she's voicing a commonly-held perception that girls are generally better organised and less indolent than boys, and that this is reflected in their exam results. Whether this assertion is actually true is almost beside the point for the purposes of Dr Weinstein's point, which is that perpetuating such a stereotype can be harmful and, worse, stereotypes are self-perpetuating[1] (this is how it's done, incidentally: Ed).

Ms Pearson defends her article, saying that 'it isn't stereotyping', which reveals another problem with stereotyping, which is that we often don't even realise we're doing it!

Of course, the scientifically literate among us realise that this is just another manifestation of our exceptional pattern-seeking software, which has had deep consequences for our evolutionary history, not least that it's lasted as long as it has, but I digress.

She goes on to assert 'it's science'. Here's the fallacy again, but this time with a smuggled-in argumentum ad verecundiam (appeal to authority).

Moving on to the follow-up:

This is a quite beautiful fallacy committed by Ms Pearson, the argumentum ad populum in all its glory, along with a glorious feedback loop of 'reasoning' whose radius is related to its circumference by a multiple of 3.14159 (excuse the tautology, it was for emphasis).

It's true because mothers loved it, and mothers loved it because it's true. Can you say 'circular'? I know you can!


Of course, at this point the inner gobshite took over, and I interjected:


SIWOTI! That's my excuse and I'm sticking to it.


Anyhoo, there was another exchange, this time on another platform:

And this is what motivated me to write this post. Ineedmoresleep needs, I think, a little more thought. Perhaps too much sleep (or too recent; we can all be groggy) has temporarily undermined this person's ability to reason but, given that this is such a common fallacy, quite probably not, which is a worry. First denying that there's a stereotype, then defending that stereotype as accurate! Which is it?

Now, it's possible that this is a semantic issue and, always willing to apply Hanlon's Razor, I must entertain the notion. Stereotyping is having a set idea about what something or someone is like. For fans of the argumentum ad lexicum, most dictionaries will add a clause to the effect that this is especially with ideas that are wrong, but this isn't actually necessary for stereotyping to occur. It only requires that it be generalised.

Some stereotyping is useful. We all do it. The fact that I stereotype cars as dangerous means that I look both ways before I cross the road. Stereotyping certain patterns as dangerous in aeons past allowed our ancestors to recognise the pattern of a tiger in the grass closely enough to be able to escape it, thereby avoiding a terminal destiny as an entrée (this is an example of a stereotype that will often be wrong, yet still confers a survival advantage).

That said, reaching a conclusion about something based on sweeping assumptions is always deductively fallacious. In an earlier post, I dealt with how different types of reasoning are employed in science. See that for more on this. Stereotyping of the sort that gives you a head start on the tiger is abductive and, in that sense, it's fine (you're far less likely to come a cropper if you respond to the warning sign than if you don't). However, when you use such sweeping premises in deductive arguments, you're doing it wrong.

More important, and the source of Dr Weinstein's chagrin, is that young people going through their exams have a tough enough time of it, and really don't need this additional pressure (I wonder how Ms Pearson's son feels about being described as 'indolent' or a 'testosterone-plagued slugabed', not to mention the fact that she's comparing her suffering to his in all this).

Before moving on, I just want to add something from a head teacher on this:


Let's briefly look again at the interaction with Dr Weinstein:

This is the worst form of onus probandi. This doesn't merely shift the burden of proof by suggesting that the skeptic prove the arguer wrong, it actually asks the skeptic to demonstrate the veracity of the arguer's argument.

Even Sagan, in his brilliant exposé on the failure of critical thinking in society, The Demon-Haunted World: Science as a Candle in the Dark, didn't foresee this. I have an invisible dragon in my garage, now prove I'm right...

One thing I didn't cover in the previous post was what follows from the eight steps of logic of the scientific method: Peer-review.

When you've worked really hard and covered every bit of ground you can think of in trying to break your hypothesis, and you're satisfied that you've thought of every possible means to falsify it, it's time to publish. For the scientist, this is a fraught time. 

You go over your findings, make sure you've got every equation in place, all nicely LaTeXed up. You make sure all your graphs and charts are an accurate representation of the data, and that the error bars are included. You go over every bit of mathematics and make sure that the logic is flawless, and that all your assumptions are properly stated, with supporting citations from the primary literature so that those assumptions can be cross-referenced for rigour. You make sure your abstract accurately describes what the problem is that you're attempting to solve, and give a brief précis of how you hope to solve it.

Then, when you've got all your little ducks lined up, you submit to a journal.

You can think of this as like a driving test. When you've passed your driving test, you have a token that says 'I have mastered the basics of controlling this machine, and now I am allowed to go and learn to drive'. That's what your paper is now doing. 

In this analogy, the author of the paper is the driving instructor, and the paper itself is the trainee driver. When the instructor thinks his pupil is ready, he refers her to a test centre (science journal) for a driving test (publication). The test centre will now assign an examiner (an anonymous expert or group of experts in your field - the referee), and the examiner will then test her driving ability to destruction. It's in the driving examiner's best interest to make sure he's harsh, because he doesn't want to let unsafe drivers on the road, so he's actively looking for reasons to fail her.

Similarly, a referee will test a paper to destruction, and it's in his best interest to work as hard as he can to demonstrate that the conclusion drawn is not supported (see falsification and falsifiability in my earlier post). Also, the journal doesn't want to publish papers that don't stand up to critical scrutiny, because that's damaging to the reputation of the journal, and affects its impact factor (a measure of its influence on science at large, loosely speaking). 

If the referee can find no problems with the paper, and has failed to falsify it, he will pass the paper and it will be published in the journal. Now, the paper has passed its driving test and is ready to go out there and face the rigours of traffic in the real world. Some papers will get honked at by other drivers for cutting them up, others will have fists shaken at them, or be advised where the accelerator pedal can be found, and each paper makes its way.

OK, so I'm running a bit too far with this analogy, but the point is, I hope, well taken.

The point is that the scientist must work really hard to get her assertions through this process, and it doesn't stop at publication because, once published, it's fair game for anybody who wants to challenge it. Sometimes, those challenges will comprehensively falsify the paper's conclusion, such as happened with the BICEP-2 result, because other researchers spotted a flaw in the reasoning, namely that it hadn't ruled out a known source of the readings they were getting prior to attributing them to the mechanism behind the underlying hypothesis. Note that the hypothesis itself wasn't falsified. More on that here.

So, this is what scientists are used to seeing with claims. It's a good process; not flawless, but it works (in fact, in the long run, it pretty much IS flawless, because shoddy or incorrect papers tend to get weeded out eventually; science is self-correcting).

The important point is that, given any assertion or hypothesis, it's always up to the one evincing said hypothesis to support their claims and to provide the data upon which any conclusions are based. Now, I'm sure that no scientist I know expects such a level of rigour from everybody, non-scientists included, but there are some basic standards of discourse that absolutely should be met by everybody, and among them is the principle that countering the rebuttable position with supporting data is the responsibility of the claimant. These standards are not being met in the above exchanges, and this is all too prevalent in the social media sphere and the world at large.

So, if you're going to get into it with a scientist (or with anybody else, for that matter), be prepared to provide the data upon which your assertions are based. Failing to do so is every bit as effective in getting your point across as saying nothing at all, except that now you've pissed somebody off, to boot.

That which can be asserted without evidence can be dismissed without evidence - Christopher Hitchens.

And so it should be. 


I hear ya, sister! 

[1] On the Perpetuating Nature of Social Stereotypes - Snyder 1981 (cited from Hamilton 2015)