A Bilingual Monkey with a Typewriter
Can monkeys with typewriters teach us about computers, cryptology, and artificial intelligence? Yes, yes they can.
By Roman Kudryashov
1. Fixing the Problem of Monkeys
It was all very simple. Give an infinite number of monkeys an infinite number of typewriters, set them to work for an infinite amount of time, and they’re bound to replicate the complete works of Shakespeare. Eventually. Pure dumb luck, but given the time, it could happen. Not likely by any stretch, but statistically possible. So I figured: could we make it more likely?
I propose that it is more statistically likely for monkeys to randomly write Shakespeare in binary code than in regular English.
To remind everyone, binary code is the most basic language of computers, the 100110s that make decoding the green rain in the Matrix seem simple. Generally speaking, a single letter in binary code takes 8 (!) digits to write: “a” would be 01100001; “A” would be 01000001.
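Those 8-digit codes are just the standard ASCII encodings of each character; a quick Python check (my own illustration, not part of the monkey setup) confirms them:

```python
# Each character's 8-bit binary code is its ASCII value, zero-padded to 8 digits.
for ch in "aA":
    print(ch, format(ord(ch), "08b"))
# a 01100001
# A 01000001
```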
The complete works of Shakespeare, as found on Project Gutenberg, are 5,137,094 characters, counting spaces and punctuation. But in binary, this would be approximately 41 million characters, eight times longer than using the English alphabet. How exactly does that make it more likely than before?
Well, there are two reasons. The first concerns distribution spikes, my fancy way of pointing out the difference between a typewriter (even one specially designed for trained monkeys to write Shakespeare without getting bored) and a binary keyboard. A typewriter has its letters distributed over a large space, and monkeys bashing those keys would be more likely to hit some letters than others. That makes it easy to type words without uncommon letters (like “X” or “Z”), but much less likely to spell words containing those uncommon letters correctly. Every time a monkey was on a lucky streak and came across a word with an uncommon letter, it would be thrown off. The probability of the monkeys writing the collected works, then, is at least as low as the probability of them typing the least common word (while on a lucky streak, of course). A binary keyboard, by virtue of having only two keys, the “1” and the “0,” eliminates that: every letter or punctuation mark is equally likely. Meanwhile, the probability of misspelling a word increases only marginally, because it doesn’t matter whether you write a “z” or an “e” when the letter you were looking for was a “T.”
The second reason is more interesting: to write Mr. Shakes’ complete works in binary is to partake in an act of translation, just as if you were to write it in French or Serbian. Translation is, technically speaking, the application of a code, where if you know the key, you can get your original information back.
Sidenote: Translation is technically not the same as encryption. Translation is changing one cleartext to another cleartext, data in a form readable by one process into a different form readable by a different process; encryption is deliberately changing the cleartext to ciphertext, now illegible without a key. Translation of text from A to B is not by nature an exclusionary tactic; by design it’s to make a text more accessible.
Funny thing about language, though, is that all languages are codes for information, filters and interpreters and symbols for what you actually see, experience, understand, communicate back.
In Hamlet, the title character’s name is repeated over and over. Why not, instead of writing it out as 48 ones and zeros (eight for each of its six letters), simply code it as one sequence of eight characters? By condensing it, we cut 40 digits from our binary total every time it appears, and assuming we have the right code to decipher the “Hamlet” sequence (just as we need to understand English to “decipher” it), we are able to condense repeated words. Similarly, we can do the same for combinations of letters, stage directions, and anything else that shows up more than once.
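As a sketch of the savings, here is a toy version of that dictionary trick in Python (the sample line and the single 8-bit code word are my own illustration):

```python
# A toy dictionary coder: assign a repeated word one 8-bit code
# instead of spelling it out letter by letter.
text = "Hamlet, Hamlet, wherefore art thou Hamlet?"

plain_bits = len(text) * 8          # 8 bits per character, uncompressed
occurrences = text.count("Hamlet")  # each occurrence is 6 letters = 48 bits

# Replacing each 48-bit spelling with one 8-bit code saves 40 bits per hit.
coded_bits = plain_bits - occurrences * (48 - 8)

print(plain_bits, coded_bits)  # 336 216
```

The catch, as the paragraph above notes, is that reader and writer must share the codebook, exactly the way English speakers share a vocabulary.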
This ‘simplification’ and ‘compression’ can be understood through Andrey Kolmogorov’s work on complexity: he proposed that the simpler an object is, the less information it has; the more complex it is, the more information it has. He then put all of this on mathematical footing: the amount of ‘information’ in any given object can be measured in the size (in bits) of the smallest algorithm needed to recreate it. Thus, an object produced by a short algorithm isn’t very complex, while an object that needs an algorithm as long as itself is maximally complex. Along these lines, randomness (the absence of order and regularity), information, and complexity all become different ways of saying the same thing; a pattern is just a regularity that lets you describe an object with a shorter algorithm.
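Kolmogorov complexity itself can’t be computed exactly, but compressed size makes a serviceable stand-in; a minimal Python sketch (my own example, using the standard zlib library) shows a regular object shrinking dramatically while a random one barely budges:

```python
import random
import zlib

random.seed(0)
simple = b"ab" * 5000  # highly regular: a short rule ("repeat 'ab'") generates it
messy = bytes(random.randrange(256) for _ in range(10000))  # no rule but itself

# Compressed size approximates "length of the smallest description".
print(len(zlib.compress(simple)))  # tiny: the pattern is easy to describe
print(len(zlib.compress(messy)))   # about 10,000: random data barely compresses
```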
With that in mind, consider the following: a .txt file of Shakespeare’s complete works is about 5.2MB—that’s an interesting number, 5.2 million bytes of information, one byte (eight bits) approximately corresponding to each letter, punctuation mark, or space. Compressed (as a .zip file), we are able to cut that down to 1.2MB. Combined with the physical act of banging on a 36-key board as compared to a two-key binary keyboard, using binary becomes a more efficient way of having monkeys recreate the complete works. Or, to go back to the original proposition, a slightly more likely way for monkeys to deliver the same information to us, albeit in a differently coded (and less readable) way.
2. Creating a New Problem with (Smarter) Monkeys
Another Note: The thing about binary is that it’s more abstract; it lacks the easy differentiation that having a huge vocabulary lends to communication. It’s the difference between “1001101” on one end of the spectrum of explaining yourself, “happy” in the middle, and “ecstatic,” “jubilant,” or “excited” on the other. It also doesn’t help that it’s almost impossible to ‘read’ binary… in terms of difficulty deciphering it, it’s probably on par with late-era cuneiform, an extinct writing system with essentially one ‘key’ to its decipherment (the Behistun inscription in Iran, a more-complicated Rosetta Stone-type series of inscriptions).
During World War Two, Alan Turing, often called the father of computer science, created a number of machines collectively called the “Bombe” to decode Enigma, the machine the Germans used to encrypt and decrypt messages. It worked on brute force, which is to say, a lot like monkeys on typewriters randomly trying out lots of different combinations until one of them made sense. What made the Bombe so efficient was that it could account for linguistic frequencies in the text. Turing was able to assume that most messages coming into Germany would have some sort of common initial formalities. They mostly began with “Mein Führer,” or something similar.
Alongside that, common German prepositions would keep repeating themselves: every message would have the English equivalent of “the” or some sort of descriptive word. Turing and his team of machines and humans would figure out a frequent three- or four-letter combination and assume that it stood for one common word that they knew. Based on that strategy, now remembered and still used as the ‘known-plaintext attack’, the Bombe would take over, and within 24 to 72 hours it would be able to crack a message, at least until the code changed again (Enigma had its settings changed very often, sometimes every day). We’ll call that, somewhat inaccurately, feedback.
Turing’s Bombe worked a lot like the monkeys, except in one regard. For the most part, it was brute force: Does this work? Does this work? Does this work? Ad nauseam. Brute force means just that: trying every possible combination until you get the right one. Every time the Bombe found a combination that worked, that combination was logged; the Bombe now ‘knew’ something, and every time it encountered that coded combination, it automatically substituted the right one. The remaining “code” got narrowed down until the entire thing was solved. Voila!
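Enigma was vastly more elaborate than this, but the brute-force-plus-crib idea can be sketched on a simple shift cipher (the crib “mein” and the Caesar cipher here are my stand-ins for illustration, not Turing’s actual machinery):

```python
import string

ALPHABET = string.ascii_lowercase

def shift(text, k):
    # Rotate each letter k places around the alphabet; leave spaces alone.
    return "".join(
        ALPHABET[(ALPHABET.index(c) + k) % 26] if c in ALPHABET else c
        for c in text
    )

ciphertext = shift("mein befehl lautet angriff", 17)
crib = "mein"  # known plaintext: the formality we expect the message to open with

# Brute force: try every possible key, keep the one that reveals the crib.
for key in range(26):
    candidate = shift(ciphertext, -key)
    if candidate.startswith(crib):
        print(key, candidate)
        break
```

The crib turns a blind search into a targeted one: instead of asking “is this German?” after every try, the machine only has to ask “does this start with the expected word?”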
Suppose all those monkeys were sitting here, an infinite number of them. Suppose every time they typed a “real” word, they were rewarded and they remembered the word. A monkey types “Hmlat,” then “Alwert,” then “Hamlte,” and on the lucky keystroke combination “Hamlet,” it is given a treat and remembers the word. That’s our feedback: they build up a repertoire of “words,” or combinations of letters. These become educated monkeys, for our purposes. More and more of this sort of feedback and your brute-force monkey machine will eventually recreate Merriam-Webster’s dictionary. Get Noam Chomsky involved and you can start giving them feedback on sentence structure: “Blue man dog murder” does not get rewarded, but “Murdered man turns blue” does. All of a sudden, you can turn that same brute-force combination of letters into combinations of words and sentences. Pretty soon, those educated monkeys are typing sentence after sentence; they’ve been given a syntax, in a sense. Given even more complex feedback, you can create an ecosystem of sentences as responses to other sentences. In a complex enough environment and feedback system, you can get them to respond “I am well” or “Doing poorly” in response to “How are you?”
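A minimal sketch of that reward loop, assuming a tiny made-up dictionary as the “treat” criterion (the word list and trial count are my own illustration):

```python
import random

random.seed(42)
DICTIONARY = {"hamlet", "to", "be", "or", "not", "that", "is", "the", "question"}
learned = set()  # the monkeys' growing repertoire of rewarded words

for _ in range(200_000):
    # A monkey bashes out a random 2-to-6-letter string.
    word = "".join(
        random.choice("abcdefghijklmnopqrstuvwxyz")
        for _ in range(random.randint(2, 6))
    )
    if word in DICTIONARY:  # the "treat": a real word gets remembered
        learned.add(word)

print(sorted(learned))  # short words turn up quickly; "hamlet" is far rarer
```

Even in this crude form the pattern is visible: common short words accumulate fast, while a six-letter word like “hamlet” takes astronomically longer, which is exactly why feedback (remembering hits) beats pure chance.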
At that point you have, for all intents and purposes, taught that brute force machine (with feedback) to speak or write intelligibly. You’re able to have a reasonably simple conversation with a system of rules. That, my friends, is a very basic artificial intelligence: a set of algorithms in response to stimuli. So as the quote, usually attributed to Edsger Dijkstra, goes: “the question of whether a machine can think is no more interesting than the question of whether a submarine can swim.”
Moreover, in September 2011, programmer Jesse Anderson did a basic version of this sort of experiment: he created millions of tiny programs (his monkeys) and set them to randomly generate nine-character strings of text. Every time they created something that matched a nine-character string in Shakespeare, it was logged and used to fill (in no particular order) a blank edition of Shakespeare. It didn’t take long for the digital monkeys to finish, and the data curves from the experiment showed that the more monkeys ‘knew’, the more they got ‘right’. Of course, his experiment was a bit of a cheat, but it shows that the principle is right. (Meanwhile, actual monkeys given typewriters “produced five pages of text, mainly composed of the letter S, but failed to type anything close to a word of English, broke the computer and used the keyboard as a lavatory,” the Telegraph reported.)
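A shrunken version of Anderson’s setup can be sketched in a few lines (I use three-character chunks and a single famous line rather than nine characters and the complete works, purely to keep the demo fast; the “cheat” of accepting any match anywhere is the same):

```python
import random

random.seed(1)
target = "to be or not to be"
CHUNK = 3  # Anderson used nine-character chunks; three keeps this quick
covered = [False] * len(target)  # which positions of the text are filled in

attempts = 0
while not all(covered):
    attempts += 1
    # A digital monkey types a random chunk of letters and spaces.
    guess = "".join(
        random.choice("abcdefghijklmnopqrstuvwxyz ") for _ in range(CHUNK)
    )
    # The "cheat": any chunk matching anywhere in the target is kept.
    i = target.find(guess)
    while i != -1:
        for j in range(i, i + CHUNK):
            covered[j] = True
        i = target.find(guess, i + 1)

print(attempts)  # the monkeys fill in the whole line, chunk by chunk
```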
3. You can lead a horse to water, but…
Google has an interesting project called Google Books. You’ve probably encountered it looking for something to read online, or heard about it as a legal scholar. Google scans a number of current, old, out-of-print, or obscure books, then offers them online, for free, to you. Sometimes only a handful of pages, sometimes most of the book. Google’s pretty awesome for that, right?
Well, more than just trying to be awesome, Google is scanning books for its databases. A scanned book is thousands of sentences, and so Google is able to create a huge library of semantics. That’s why you can “ask” Google a question and it can “answer” it. Google’s giant laboratories are quietly humming along, figuring out which words go together, what follows what, what sentences merit what response, building a whole system of syntax and semantics and conversation. The end result is like the scene from Terminator when you see inside Arnold’s head as he hears a sentence and then has to choose from a number of responses: Hasta la vista, baby. And this isn’t limited to just reading books: Google’s done this with AdWords, their autocomplete, their attempts at a semantic web search, and so forth. You can read about it in Siva Vaidhyanathan’s book, The Googlization of Everything, or online in bits and snippets at Artificial Brains and The Edge.
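Google’s real systems are far more sophisticated, but the core “what follows what” idea can be sketched as a simple bigram table (the sample sentences below are made up for illustration):

```python
from collections import defaultdict

# A tiny "scanned library": a few sentences, split into words.
corpus = (
    "how are you . i am well . how are things . "
    "i am doing poorly . how are you ."
).split()

# Count which word follows which: the crudest possible model of semantics.
follows = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev].append(nxt)

print(follows["how"])  # ['are', 'are', 'are']
print(follows["am"])   # ['well', 'doing']
```

Scale this up from a dozen words to millions of scanned books, and picking a likely next word (or a likely response sentence) becomes a matter of lookup and weighting rather than magic.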
The Google Books project is the culmination of everything I’ve described here: the brute force machine (our infinite number of monkeys typing away) gradually accumulating lots of rules to be able to string sentences together. In reality, it’s all much more complicated than that: you create networks of words, build semantic webs, weight words, and so on. But the essence and the moral of the story is: given the right feedback, and a large enough amount of information, your brute force machine (ahem, monkeys) can turn into a relatively complex artificial intelligence. In a way, that’s how we think: stringing together lots of words to create some sort of meaning that someone else can understand, coded in one language or another. As we get older, we get better at it, and it becomes so easy we don’t even stop and think about it.
The only thing that’s missing is the creativity: you can program a machine to respond, but you can’t really teach it to come up with “new” things. As the old saying goes, “You can get an infinite number of monkeys to retype the complete works of Shakespeare, but you can’t get them to be the next Shakespeare.”
But, at the end of the day, maybe “creativity” is irrelevant. After all, aren’t all “new ideas” at first taken to be unintelligible, incomprehensible? It’s only after a certain period of time that those ideas get assimilated and turned into understandable and acceptable ways of thinking. Take Galileo, for instance. Our feedback loops for him were broken: he said something that was right, but we said it was wrong and condemned him to house arrest for the rest of his life. Oops. So maybe these machines we’re making now can be intelligent. But we won’t know that until later, of course.
In the meantime, let those monkeys keep typing. They’re the analogy for our computers in a simple way, and who knows what they can accomplish…
“Whatever intelligence is, it can’t just be intelligence all the way down. It’s just dumb stuff at the bottom. Much of biology boils down to chemistry.” — Andy Clark