Cracking The Secrets Of A Lost Language (Or Not)

New technologies may be able to decipher the Voynich manuscript, a strange book that has baffled researchers for more than a century.

For many centuries, ancient Egypt was a mystery to historians. Archaeologists investigated the pyramids and cities, but there was a huge gap in understanding: the hieroglyphs. The Egyptians wrote with a writing system that, until the 19th century, no one could figure out. Ancient Egyptian culture was nearly lost.

Then, the Rosetta Stone was found. In the late 18th century, during Napoleon Bonaparte’s military campaign in Egypt, French soldiers stumbled upon a large dark slab of granodiorite (long misidentified as basalt). Its importance hinged on the fact that it was written in three scripts: Egyptian hieroglyphs, Egyptian Demotic (the less formal writing system in ancient Egypt) and, most importantly, Greek. While knowledge of the Egyptian scripts had been lost for well over a millennium, ancient Greek was a language that scholars could read. The inscription also makes clear that it is the same text written out three times. Using knowledge of the Greek, Egyptologist Jean-François Champollion was able to decipher the hieroglyphs for the first time in the 1820s.

Would historians have been able to decipher the hieroglyphs without the Rosetta Stone? While there have been huge advances in linguists’ code-breaking abilities, it’s possible that the script would remain a mystery without some translation help.

To understand exactly how challenging it is to crack a mystery language, we can look at one that remains a riddle: the Voynich manuscript. While it’s pretty inconsequential (the language in the manuscript appears nowhere else), it’s become one of the greatest linguistic riddles of our day. Since its discovery, it has obsessed researchers, been called a hoax and been the subject of a huge number of articles (including this one!). Still, after years and years of being examined by brilliant minds, it has completely eluded decipherment. Let’s dive in.

What Exactly Is The Voynich Manuscript?

The Voynich manuscript is a 234-page tome filled with writing in a strange language now called Voynichese. It also features many weird illustrations of unknown plants, people and astrological charts. Based on the images alone, it seems to be some guide to astrological herbology, and most assume it has something to do with medicine. The full manuscript is available for viewing online, thanks to Yale University.

Where it comes from is, as is to be expected, pretty hard to pin down. In any case, here’s a brief timeline of the manuscript’s history:

  • Early 15th Century: By carbon dating the manuscript, researchers are pretty confident that it was constructed in the early 1400s. Researchers also believe, based on markings on the manuscript, that it’s from Italy.
  • August 19, 1665: This is the date on a letter written by Johannes Marcus Marci, which was found in the manuscript in the 20th century. The letter claims that the manuscript was “sold to Holy Roman Emperor Rudolf II at a reported price of 600 ducats and that it was believed to be a work by Roger Bacon.” It was later proved that Roger Bacon, an English alchemist, was not the author of the text. Some time around 1665, the manuscript is added to a collection of papers belonging to Jesuit scholar Athanasius Kircher.
  • 1670 to 1912: Honestly, no one knows for sure.
  • 1912: The manuscript is purchased by Wilfrid Michael Voynich, whose name is forever attached to it. The circumstances of the purchase are unclear, but it seems it was part of a set of books sold by Jesuits from the Collegio Romano.
  • 1921: Harper’s Magazine publishes an article about the manuscript under the title “The Most Mysterious Manuscript In The World.” The manuscript became a phenomenon among cryptographers and medievalists.
  • 1930: Voynich dies, and the book is left to an heir, eventually being passed down to a woman named Anne Nill.
  • 1944: William F. Friedman, one of the world’s greatest cryptologists and later chief cryptologist at the National Security Agency, starts a research group devoted to decrypting the text. Friedman likely devoted more time to the manuscript than anyone, and by the time he died, he had decided it was an early attempt at a constructed international auxiliary language.
  • 1961: A Viennese book collector named Hans Peter Kraus buys the Voynich manuscript from Anne Nill for $24,500.
  • 1969: Kraus donates the manuscript to Yale University’s Beinecke Library. It has resided there since.

Wait, But Is It A Hoax?

The early 20th century was a great time for hoaxes. The internet didn’t exist yet, so it was pretty hard to check if something was legitimately true (not that it’s that much easier now). And with the book suddenly appearing in 1912 with an obscure history attached, it’s hardly surprising that people would accuse it of being entirely manufactured. Mysterious books filled with exotic plants could make someone a lot of money back then.

Thanks to the magic of linguistic analysis, however, most researchers now agree that the manuscript is not a hoax. Or, if it is, it’s a very, very complex hoax.

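The evidence for its authenticity has to do with something called Zipf’s law. Without going into it too deeply, Zipf’s law is a rule about distribution: in natural-language text, a word’s frequency is roughly inversely proportional to its rank, so the second most common word shows up about half as often as the first, the third about a third as often, and so on. When applied to the Voynich manuscript, it’s used to compare the distribution of words and letters in a natural language to the language in the manuscript.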

In a natural language, there are going to be words that are used a lot (“in,” “the,” “a,” etc.) and words that are not used very much (“zamboni,” “tubular,” “obstreperous,” etc.). The same goes for letters. Any natural language will have a similar looking distribution of words, and you can figure that out without even knowing what the words mean. If someone is trying to make up a language without meaning, however, the distribution will look very different. Some words would appear too frequently, or some wouldn’t appear nearly enough. A study from 2013 showed that the distribution does seem to match that of a natural language, which is a significant sign that Voynichese is a real language.
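To make the idea concrete, here is a minimal Python sketch of a Zipf-style check. It is not the method used in the 2013 study; the function names and the toy numbers are assumptions made for illustration. It ranks words by how often they occur and fits a straight line to log(frequency) against log(rank). Natural-language text tends to give a slope near -1, while a flat, made-up distribution gives a slope near 0.

```python
import math
from collections import Counter


def rank_frequency(text):
    """Return word counts sorted from most frequent to least frequent."""
    words = text.lower().split()
    return sorted(Counter(words).values(), reverse=True)


def zipf_slope(counts):
    """Least-squares slope of log(frequency) against log(rank).

    Zipf's law predicts a slope close to -1 for natural-language text.
    """
    if len(counts) < 2:
        raise ValueError("need at least two ranked counts to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Toy counts (an assumption, not real data) that follow Zipf's law exactly:
# the word at rank r appears 1200 / r times.
ideal_counts = [1200 // rank for rank in range(1, 7)]

# Toy counts for a "language" in which every word is equally common.
flat_counts = [5, 5, 5, 5]

print(zipf_slope(ideal_counts))  # very close to -1
print(zipf_slope(flat_counts))   # 0.0
```

Nothing in this check requires knowing what any word means, which is why it can be run on a transcription of Voynichese. A real analysis would need far more text than these toy lists, plus a decision about what counts as a "word" in the transcription.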

Some people, including researcher Gordon Rugg, say that the manuscript could still very well be a complicated hoax, but they’re in the minority. Rugg does have convincing arguments: the distribution of syllables in Voynichese doesn’t make sense for a natural language, for example, and there are no corrections at all in the text. Even masterful 15th-century scribes would make some errors. Yet still, the prevailing belief is that the text is too complicated to be made up without it taking a truly monumental amount of work. Even if it were a hoax, however, finding concrete evidence of that would be a very exciting development.

Someone Solved It! Oh, No They Didn’t

For anyone who has a Google news alert set for “Voynich manuscript,” the last few months have been quite a roller coaster. Many people have suddenly appeared to claim that they have finally, after 106 years, cracked the code. And then, just as quickly, their theories have fallen apart.

This isn’t too surprising, as the whole history of the Voynich manuscript has been a series of illusory successes followed by abject failures. In the beginning of the 20th century, shortly after the manuscript was introduced to the world, a medievalist named William Romaine Newbold claimed to have solved it. He was lauded for a bit, until an article in Speculum showed why his claims were actually unfounded nonsense. For Newbold to have been right, the person who wrote the manuscript would’ve needed to predict major discoveries about 500 years in advance of the rest of the scientific community.

More recently, two claims to solving the mystery have garnered attention. The first is by Nicholas Gibbs, who published a piece in the Times Literary Supplement in September 2017 claiming to have figured out how to translate the text. Gibbs’ theory was that each letter in the book was actually Latin shorthand, and that the entire manuscript was taken from an existing text on women’s health from the 15th century. The only problem, as Ars Technica points out, is that the grammar of the Latin was completely incorrect. The idea that it was a women’s health text had already been posited by researchers before, and Gibbs failed to actually translate any part of the text.

The slightly more promising claim comes to us from artificial intelligence. If there’s one advantage we have over 1912, it’s that we can now employ massive computing power to try to solve problems. A paper published in 2016 by computer scientist Greg Kondrak and his student Bradley Hauer gained a lot of traction in the media in early 2018. Their theory was based on the common idea that Voynichese was made with a substitution cipher. That means it was written in a real language, and then each of the letters in that language was replaced by a specific Voynichese letter. A key first step in undoing a substitution cipher is figuring out what language the text was originally written in.
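For readers who have never seen one, a simple substitution cipher can be sketched in a few lines of Python. This is only an illustration of the general technique: the key below is a hypothetical one invented for the example, and it uses the Latin alphabet in place of Voynichese symbols.

```python
import string


def make_cipher(key_alphabet):
    """Build encipher and decipher tables from a 26-letter key alphabet.

    Each plaintext letter a-z is replaced by the letter at the same
    position in key_alphabet; deciphering reverses the mapping.
    """
    if sorted(key_alphabet) != list(string.ascii_lowercase):
        raise ValueError("key must be a permutation of the letters a-z")
    encipher_table = str.maketrans(string.ascii_lowercase, key_alphabet)
    decipher_table = str.maketrans(key_alphabet, string.ascii_lowercase)
    return encipher_table, decipher_table


# Hypothetical key, chosen only for illustration.
KEY = "qwertyuiopasdfghjklzxcvbnm"
encipher_table, decipher_table = make_cipher(KEY)

secret = "herbal remedy".translate(encipher_table)
plain = secret.translate(decipher_table)

print(secret)  # itkwqs ktdtrn
print(plain)   # herbal remedy
```

Because every letter is always swapped for the same symbol, the enciphered text keeps the letter-frequency patterns of the original language. That is what makes it possible to guess the underlying language before knowing the key.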

This is where the computing power comes in. Kondrak and Hauer had a computer compare the text of the Voynich manuscript to 380 languages, testing the substitution-cipher idea against as many languages as possible to see which one fit best. In the end, they concluded that the original language was most likely Hebrew, and they even offered a translation of some of the text. There are just a few problems with their results, though: they compared the manuscript to modern Hebrew, not 15th-century Hebrew; they assumed all the words were anagrams, meaning the letters within each word had been scrambled; they had to make “spelling corrections” for it to make sense; they did not consult Hebrew scholars; and, possibly most egregious of all, they got their results using Google Translate.
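The anagram assumption deserves a closer look, because it makes a decipherment much easier to "find." The Python sketch below is not Kondrak and Hauer's algorithm; it is a toy illustration, with a made-up English word list, of how treating words as anagrams lets a single scrambled word match many dictionary entries.

```python
from collections import defaultdict


def alphagram(word):
    """Sort a word's letters so that all of its anagrams share one key."""
    return "".join(sorted(word))


def build_index(dictionary_words):
    """Group dictionary words by their alphagram."""
    index = defaultdict(list)
    for word in dictionary_words:
        index[alphagram(word)].append(word)
    return index


def candidates(scrambled, index):
    """Return every dictionary word that is an anagram of `scrambled`."""
    return sorted(index.get(alphagram(scrambled), []))


# Tiny hypothetical word list; a real dictionary would be far larger.
WORDS = ["post", "stop", "spot", "pots", "tops", "opts", "herb"]
index = build_index(WORDS)

print(candidates("tsop", index))  # ['opts', 'post', 'pots', 'spot', 'stop', 'tops']
print(candidates("breh", index))  # ['herb']
print(candidates("xyzq", index))  # []
```

With six readings available for one four-letter string, a translator has a lot of freedom to pick whichever word makes the sentence sound plausible. Add "spelling corrections" on top, and almost any text can be coaxed into something that looks meaningful.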

Despite all the innovations in code-breaking over the last century, the Voynich manuscript still rejects our advances. Some of the apparent failure is down to media hype, because sometimes (as in the case of Kondrak and Hauer) the results are meant to be a starting point, not a conclusion. If this mystery is ever solved, artificial intelligence could very likely play a role.

Onward To The Future

For over a hundred years, the Voynich manuscript has sparked the imagination of amateur codebreakers and historians. In 2016, the manuscript was published in its entirety, accompanied by essays encouraging readers to join in on the investigation. And while Voynichese isn’t the only lost language, it is the most hotly contested. If it had been solved back in the 1920s, it likely would be a little-known artifact. But now, for its sheer weirdness, it is at the center of a very nerdy obsession.

There are plenty of reasons to be optimistic about this riddle being solved in the near future. From Alan Turing’s cracking of the complex substitution cipher known as the Enigma code during World War II (which was recounted in The Imitation Game) to today, humans have gotten better and better at cracking obscure codes. It remains mind-boggling that despite all this new technology, some old book still eludes our understanding.

When the time comes, it will probably be hard to convince everyone to agree on a solution. People who have studied the Voynich manuscript are very quick to poke holes in theories that people are floating. There may never be a wide consensus. The grimmest part of reaching a definitive conclusion, however, is this: the solution will probably be boring. After so much time and so many theories, could any solution possibly be as exciting as the mystery itself?

