To duplicate the “intuition” that Robinson believed would elude computers, the researchers’ software makes several assumptions. The first is that the language being deciphered is closely related to some other language: In the case of Ugaritic, the researchers chose Hebrew. The next is that there’s a systematic way to map the alphabet of one language on to the alphabet of the other, and that correlated symbols will occur with similar frequencies in the two languages.Ugaritic was treated as unknown and the resultant translation turned out to be reasonably accurate.
The system makes a similar assumption at the level of the word: The languages should have at least some cognates, or words with shared roots, like main and mano in French and Spanish, or homme and hombre. And finally, the system assumes a similar mapping for parts of words. A word like “overloading,” for instance, has both a prefix — “over” — and a suffix — “ing.” The system would anticipate that other words in the language will feature the prefix “over” or the suffix “ing” or both, and that a cognate of “overloading” in another language — say, “surchargeant” in French — would have a similar three-part structure.
The system plays these different levels of correspondence off of each other. It might begin, for instance, with a few competing hypotheses for alphabetical mappings, based entirely on symbol frequency — mapping symbols that occur frequently in one language onto those that occur frequently in the other. Using a type of probabilistic modeling common in artificial-intelligence research, it would then determine which of those mappings seems to have identified a set of consistent suffixes and prefixes. On that basis, it could look for correspondences at the level of the word, and those, in turn, could help it refine its alphabetical mapping. “We iterate through the data hundreds of times, thousands of times,” says Snyder, “and each time, our guesses have higher probability, because we’re actually coming closer to a solution where we get more consistency.” Finally, the system arrives at a point where altering its mappings no longer improves consistency.
This is beneficial for languages that are related, that is both are derived from a common source. It does not seem to be of use if we discovered a language unrelated to any known ones (unlikely). But it could help with increasing deciphering speed in poorly characterised languages.
“Each language has its own challenges,” Barzilay agrees. “Most likely, a successful decipherment would require one to adjust the method for the peculiarities of a language.” But, she points out, the decipherment of Ugaritic took years and relied on some happy coincidences — such as the discovery of an axe that had the word “axe” written on it in Ugaritic. “The output of our system would have made the process orders of magnitude shorter,” she says.
Indeed, Snyder and Barzilay don’t suppose that a system like the one they designed with Knight would ever replace human decipherers. “But it is a powerful tool that can aid the human decipherment process,” Barzilay says. Moreover, a variation of it could also help expand the versatility of translation software.