Abracadabra

There is a monkey hitting a typewriter at random. He is only hitting letters and that the chance for each letter to get hit is equal. Do you have to wait on average longer, the same or shorter before the monkey will have typed 'abracadabra' or 'abcdefghijk' (i.e. a string with an equal amount of characters)? Let's find out.

A long time ago a good friend of mine told me this puzzle. Let's say there is a monkey hitting a typewriter at random. And let's assume he is only hitting letters and that the chance for each letter to get hit is equal. Do you have to wait on average longer, the same or shorter before the monkey will have typed 'abracadabra' or 'abcdefghijk' (i.e. a string with an equal amount of characters)? Don't start reading the next paragraph if you want to solve the puzzle, since I am going to give away the answer.

Most people tend to say, well, it must be 'abracadabra' that on average shows up first. Abracadabra's can overlap, so it is just more likely that you have that sequence and therefore you have to wait less long on average before it appears. Sounds reasonable enough, but is of course wrong.

A smaller group of people tends to say that it must be equally long. If you randomly select a character in the stream of characters coming from the monkey and ask yourself, how likely is it that at this point 'abracadabra' or 'abcdefghijk' starts, you'll see that these chances must be equal, that is, once the monkey starts typing the first letter of the sequence, he has to type exactly one letter as the next and the chance of him doing that is 1 in 26 and this is independent on what he needs to type. Again, sounds reasonable but is also wrong (or more to the point, this is correct, but that was not the question).

That pretty much leaves only the third option, i.e. abracadabra takes longer. I have two explanations here. The first one combines the previous two answers. Yes, abracadabra's can overlap and yes the chance for abracadabra to start at a certain point, is the same as for a-k.  So if we have a lengthy string of characters typed by our monkey, we expect that the number of occurrences for abracadabra to be the same as for a-k. But since abracadabra's can overlap, they take up less space so to say (i.e. a larger portion of this string is made up of characters that are not part of abracadabra then not part of a-k). Therefore the average distance between groups of characters that are part of abracadabra must be larger than the average distance between groups of characters being part of a-k. Since when we are starting to type we are not in the middle of a group of abracadabra characters, it will take longer before we encounter some group first.

Not everybody is convinced by this explanation. Another way of looking at it, is to look what happens when the monkey types and only needs the last character of any of the targets. In the case where he is trying to type abracadabra, there are two options, either he types the a, or he doesn't. If he doesn't, the monkey will have start all over again. In the case of a-k, the monkey as three options: he types a 'k' and we're done, he types b-j and we have to start over, or he types an 'a' and the monkey has blown this chance of typing a-k, but we already have the first letter of the next try, something that doesn't work with abracadabra.

If you're not convinced still, run the program below. I am not using abracadabra, but ab vs aa. And my monkey doesn't type all letters, but only a,b,c,d,e and f. Still, this shouldn't make a difference. Except for one thing: on average you have to wait 34 characters before you see 'ab' and around 40 before you see 'aa' . However, if you let the monkey type and you count how often 'ab' occurs before 'aa', you'll see that this is equal. Confused? Well, the first chance of either 'ab' or 'aa' is just after the first 'a'; if the monkey types an 'a' or a 'b' we're done. However, if the monkey types an 'a', 'aa' wins, but 'ab' has good chance of being just one after. If 'ab' wins, 'aa' has no such chance.

downloadable files:

abracadabra.py Source of the program to proof the outcome.