Tuesday, 16 September 2008

Debunking Corner - Information Theory

OK folks this one is little more challenging or more interesting, depending on how you like getting your head into new thoughts and ideas.  This piece is quite long but well worth the effort.  Whilst this is about maths there are no calculations and it is very well described for the lay reader.

I think this piece also goes to show why debating with creationists is a daft idea, besides giving them a platform and the appearance of two equal sides debating that is. This piece shows that creationists can quickly and easily generate scientific sounding bunk, their claims in this case take 2 minutes to state, but to explain why they are rubbish takes a good 15 minutes.

Generating pseudoscientific claims is much easier than carefully and accurately pointing out all the errors in them.

Anyway here is a classic creationist claim "that it is impossible for evolution to create new information".  I have heard Professor Andy Mcintosh of Leeds University say as much himself.  I mean, we all know what information is don't we?  It's just common sense.

Well the fact of the matter is that the kind of information evolution creates is in the form of the DNA of the particular creature in question.  New information is easily generated by "mistakes" in the DNA coding resulting in changes in the information, deletions of some information or duplications of some of the information.

The creationists however tend to ignore these simple rebuttals to their claims and have shown a propensity to quote mathematicians i.e. Information Theorists to support their claims.

Here are some excerpts of a recent posting by someone who knows a bit about maths.  The full posting is here.  This piece covers a lot of useful background and points out several more false claims about the information content of DNA.  The latest version of creationism is Intelligent Design and it makes a great play with Information Theory about being able to detect complexity in a non subjective way and goes on to claim that this must mean there was a designer, apart from this being an illogical inference the actual claim that such complexity can be detected is in fact not valid in the first place and Mark Chu-Carroll sheds some light on this for us.
Why is randomness informative? Correcting the Clueless Creationists

by Mark C. Chu-Carroll

Intuitively, we think of information in terms of semantics and communication. We think that information is related to semantic content. And our understanding of semantic content comes from language. So we expect things that are rich in information to be highly structured, like language.

But our initial intuition is wrong. wrong. As I explained when I was writing about information theory, a totally unstructured random string of bits has a lot more information than an equal-length string of english text. What we see as structure is really redundancy - multiple expressions of the same bits of information.

One of the things that I find particularly interesting about information theory is that it's an area where many of the basic ideas can be expressed in intuitive ways. There is a good intuition for why randomness has high information content - and once you hear it, it makes perfect sense, and you can easily see what's wrong with the argument that structure implies high information content. And the intuition is actually very nearly a direct rendering of the theoretical (Kolmogorov-Chaitin) definition of information!

Here goes:

Imagine you have a string of symbols. (In information theory, we describe everything as strings of symbols.) Now suppose you want to tell someone else how to write down the same string of symbols. The amount of information in the string is the length of the shortest description you can give them that allows them to write down the same string.

Think of a simple string like "1111111111111111". To tell someone how to write that string, you just say "Write 16 ones". For a slightly more interesting string, like "1010101010101010" - you could say "write '10' 8 times.". For "ABCDEFGHIJKLMNOP", you could say "write the first sixteen letters of the alphabet". For each of those, there's a nice simple description, because they've got a lot of structure. But that structure means that it doesn't take much to describe them; and if there's a short description, then that means that there's not a lot of information.

Now think of a random string: "XEGAPC]GI(BAUL+"? How could you tell someone to write down that string of symbols? There's some amount of pattern there - the letters are all uppercase; there are only 13 distinct symbols out of the sixteen. But you're not going to get away with a simple description. It's harder to give someone instructions on how to reproduce that exact string than it is to describe a nicely structured string - which is why we say that the random string has more information in it.

Think of a simple image: a triangle, drawn onto a 1000x1000 bitmap. To allow someone to produce an identical image, you'd say "A 1000x1000 bitmap, with a triangle with vertices (x1,y1), (x2,y2), (x3,y3)". The bit map has a million bits. The description above would be about 90 characters long with real vertex positions, which is 720 bits - using an extremely inefficient representation of the text. Because it has structure, it takes very little description to allow someone to reproduce it exactly.

Now, think of a random image. For every pixel in the image, you use a truly random process to determine whether it's black or white. So you've got something that looks like a screen capture of static on an old TV. How can you describe that image to someone else to allow them to produce an exact duplicate? You might be able to do it with less that a million bits - but you're not going to be able to do it with less that 1,000 bits. There's just too much information in the image!

The hardest thing to describe - the string that needs the longest description for its length - is something with absolutely no pattern. Something totally random, where there's absolutely no pattern, no logic, no way of predicting even a single bit without being told its value explicitly. A string with maximum information context is a string where it's so random that the complete string itself is its own shortest description. And the string with the most information is, by definition, the most complex. The reason why should be absolutely clear by now: it takes longer to describe something complex than something simple, and random stuff is the hardest to describe. So random stuff has the most information.

The word "random" means something very specific in the context of information theory. A perfectly random string is a string where knowledge of what came before in the string gives you absolutely no knowledge of what comes later. It's closely related to the probabilistic notion of random. The main difference is that from the viewpoint of probability, a string is "random" if it was generated by a random process - and a random process can (and in fact by necessity will) periodically generate a string that we wouldn't call random. For example, a truly random letter generator, given enough time, will eventually create a string consisting of the letters of the alphabet in order. But from the viewpoint of information theory, if we looked at that string, we'd say it's low information content, meaning non-random. On the other hand, if we take a highly compressed version of the bitmap of a triangle, it looks random - it's got very high information content for its size per information theory, but according to probability theory, it isn't random, because it's generated by a deterministic process. The big difference here is that when information theory talks about randomness, it's talking about unpredictability for someone viewing a string - it's not talking about meaning or semantics. You can take any meaningful statement, and render it down to a high information content string via compression. High information content versus low information content isn't a property of the statement or entity being encoded; it's a property of the representation. An image of a triangle is a representation of some information in a very inefficient form; a string of random-looking bits is a representation of the same information in a more compact form. The image is low information content because it's got very little information for its size; the compressed bits are high information content because they've got a lot of information for their size. But it's the same information.

Now, let's look at what the creationist in question actually had to say, so that we can see where he went wrong.

I'm also going to take things a bit out of order, because there's a stupid thing that I want to get out of the way before I get to the interesting part.
FWIW, I disagree with T-Stone's version of information and complexity. And despite what his post would lead you to believe, the idea that "maximal randomness = maximal complexity" is not true for all information theories. And in fact, if I were to use T-Stone's definition of complexity then I would ask him to explain not why there is so much complexity in the universe, but rather why there is so little complexity. If complexity = randomness, then it doesn't take a rocket scientist to realize that there's a lot of the universe that is not random, and therefore there is a lot of this universe that is not complex. Under his information theory, randomness is the default. We do not need to explain random data. We do need to explain structured and ordered data. Therefore, we do not need to explain complexity; we need to explain non-complexity.

This isn't something where there's room for disagreement. This is sort of like arguing that "I don't agree that 2+2=4." Informational complexity is well-defined by information, and it's got a precise meaning. The precise definitions vary between algorithmic information theory (Kolmogorov-Chaitin) and communication information theory (Shannon), but the basic concept underlying both is the same, and they agree that complexity is related to information content, and maximum information content (and thus maximum complexity) is perfect randomness.

There is no information theory that says randomness doesn't maximize information content and complexity. None! 

This is something that you see frequently from the clueless about information theory: they really don't like the idea that randomness contains maximum information, and they assert that not all information theory agrees with that - like the statement above "maximal randomness = maximal complexity is not true for all information theories"; but they never actually cite any kind of information theory at all - because there is none that does what they want. They're sure that there must be, because K/C and Shannon seem wrong. But there is no such theory, no matter how much you may want one to exist.

Now, we'll jump backwards to the interesting part.
Unfortunately for T-Stone, if he paid attention to what he has written here he'd see that he's soundly refuted Dawkins. After all, if maximal randomness is equivalent to maximal complexity, then it is easy for me to write a program that will generate completely random output. In other words, it is easy for me--a person who is not maximally complex--to produce a program with output that is maximally complex. Thus, if we want to play T-Stone's game and use complexity in this sense, then Dawkin's argument must be surrendered.

If I can make a program that is more complex than I am, then God can create a universe that is more complex than He is.

Our creationist friend is making several mistakes here. One of them is his claim that he can easily write a program that generates a random string. Generating random data - true random data, not just data that appears random - is really difficult. What he's thinking about, I'm almost sure, is running a program that calls a function like the C "srandom" function. That's not random; it's completely deterministic. Start with a particular seed, and you'll always get the same sequence of "random" numbers. In fact, there's only one sequence of values that you'll get: the function is a long-cycle periodic function. Every sequence generated using that function is a part of the same, long, fixed cycle. And you can perfectly reproduce any string that you generated in it with a simple set of instructions. The original pseudo-random function used by "srandom", in the Berkeley Unix C library is one line long; it's basically the next "random" number is "(lastrandom*1103515245L + 12345L) & 0x7fffffff". So given that line of code, and the initial seed, you can generate the "random" sequence. So our friend's "random" sequence is anything but random. It's got a nicely random distribution, but it's not random - it's reproducible with a very short sequence of instructions. (Even if you use a better pseudo-random function, it goes up to around 64 lines of code for a really nice one.) Computers are deterministic: they suck at randomness. In fact true randomness must come from a data source outside of the computer. Real randomness comes from random external events, like radioactive decay. Building a true random generator is a very non-trivial task.

That's thoroughly clueless. But it's not the worst of his errors. He's confused about two really important things about how you do things in information theory. First, he's confused about what we mean in information theory by having a program generate a string; and second, he's confused about how to compare the complexity of two different strings.

In information theory, when we talk about a program or a set of instructions for generating a string, we're talking about a program that repeatably anddeterministically generates that string. What makes a high complexity string is the difficulty of reproducing it, not of producing it. If I've got a true randomness source, I can generate a random string from it with very little difficulty. But that's not the important thing, from the viewpoint of information theory. What information theory cares about is how can I describe the string in order to allow it to be reproduced. If he's written a short program to generate a string that has high information content, that's not very impressive, and it says nothing about information theory. The short program that generates the random string doesn't mean that the random string is easy to reproduce, or even that the string is really produceable by a short program. Because when information theory says a string is produceable by a short program, what it means is that it's produceable by a short deterministic program. Run a random program twice, it will generate two different strings! So from the viewpoint of information theory, the string isn't a product of the program; it's the product of randomness which was the input to the program, and that says nothing about its information content. What matters is how long a program that will always generate the random string, and only the random string.

Completely random strings are incredibly rare in the real world. Almost any finite string that you can imagine is at least somewhat compressible - which means that it's got some amount of redundant structure. If you think of it by metaphor with numbers, theoretically, the vast majority of numbers are irrationals. But in the real physical world, encountering irrational numbers isn't particularly common. Most things that we really encounter have finite precision. We don't see perfect circles that really have a perfect ratio of π between their measured circumference and diameter. We don't see triangles so perfect that their edges have a 1/1/sqrt(2) ratio. We see finite approximations. Similarly, we don't often see true randomness.

Our friend doesn't get that. And that's a huge problem for his argument.

The second mistake in in his idea of comparison. Even if I could write a program that generates terabytes of randomness, it won't be as complex as a human being. Because for all of the structure that there is in us, for all of the redundancy, for all the duplication, we are enormously complex. To create a string of information that provided you with enough information to be able to accurately reproduce the precise state of my body at a moment in time is really mind-boggling. It's an amount of information that's difficult to conceive. The error that our friend is making is forgetting that complexity isn't a percentage. A string that is 80% random (meaning 20% compressible) can be less complex than a string that is 10% random - provided the second one is longer. If the 80% random string is 1,000 characters long, and the 10% random string is 1,000,000 characters long, then the 10% random string is more complex!

Of course, that's not the end of his cluelessness. Following directly on that error, he finishes up with:
T-Stone is just giving a sleight of hand here. It would be like a mathematician saying "a > b" and having T-Stone say, "The greater than sign is inverted with the less than sign, therefore 'a > b' means 'a is less than b'."

But as soon as he engages in his sleight of hand, we respond: "If the greater than sign is inverted with the less than sign, then 'a > b' is no longer true, rather 'a < b' is."

Inverting the operator without inverting the operands does not refute the original expression.

What he's going on about here is based on his misunderstanding of what complexity means in terms of information. He wants structured strings to have more information than random. He's insisting that that's the way things really work. And so, by his reasoning, an argument that randomness has more information than structure is reversing the correct comparison - more structure should be more complexity, so if you're making an argument that something needs to have more complexity, it necessarily must be an argument that it has more structure. If you insist that randomness is complex, then you're reversing the argument: every statement that you make arguing for complexity needs to be switched to an argument for simplicity, because you've swapped things around.

Unfortunately for him, the fact that he's completely clueless changes nothing about the nature of reality. Randomness is more complex than structure. And when we talk about how complex the universe is, we're using the same definition of complexity. It doesn't work the way he thinks it does - he's just plain wrong.

No comments:

Post a Comment