Friday, April 24, 2009

Markov Model determines Indus Script contains grammatical structure


As someone who has experimented with artificial intelligence programs since the mid-1990s, I was riveted by this article about the use of a program, normally used to predict failure in electronic components, to analyze ancient Indus script fragments. Apparently the program revealed a definite grammatical pattern in the symbols, quite similar to the cadence found in other language structures.

The process of decrypting ancient languages has fascinated me for a long time. Just last week I watched a Nova production, "Cracking the Maya Code", that detailed the history of the process used to unlock the Mayan past secreted within the intricate glyphs carved on many of the structures they left behind.

I am still a little skeptical about how a Russian linguist reached the "aha" moment when he realized different symbols represented the same syllable. It just seemed like a quantum leap to me. Why would the Maya develop a system using multiple complex symbols to mean the same thing? I could understand it if it were like differences in a regional dialect, but apparently the various symbols all occurred within the same context. Maybe they're examples of the ancient development of synonyms.

Anyway, deciphering the Indus script is far more challenging than deciphering the Mayan glyphs, as the longest inscription found to date contains only 27 signs. The Indus script is an elegant series of highly detailed pictograms like the one above. If the longest text we have to work with is only 27 signs long and some of those signs represent the same syllable, as occurred with the Mayan glyphs, the task may ultimately prove impossible unless more extensive inscriptions are found.

"Computational analysis of symbols used 4,000 years ago by a long-lost Indus Valley civilization suggests they represent a spoken language. Some frustrated linguists thought the symbols were merely pretty pictures.

"The underlying grammatical structure seems similar to what's found in many languages," said University of Washington computer scientist Rajesh Rao.

The Indus script, used between 2,600 and 1,900 B.C. in what is now eastern Pakistan and northwest India, belonged to a civilization as sophisticated as its Mesopotamian and Egyptian contemporaries. However, it left fewer linguistic remains. Archaeologists have uncovered about 1,500 unique inscriptions from fragments of pottery, tablets and seals. The longest inscription is just 27 signs long..."

"...In 2004, linguist Steve Farmer published a paper asserting that the Indus script was nothing more than political and religious symbols. It was a controversial notion, but not an unpopular one.

"...Rao, a machine learning specialist who read about the Indus script in high school and decided to apply his expertise to the script while on sabbatical in India, may have solved the language-versus-symbol question, if not the script itself.

"One of the main questions in machine learning is how to generalize rules from a limited amount of data," said Rao. "Even though we can't read it, we can look at the patterns and get the underlying grammatical structure."

Rao's team used pattern-analyzing software running what's known as a Markov model, a computational tool used to map system dynamics.

They fed the program sequences of four spoken languages: ancient Sumerian, Sanskrit and Old Tamil, as well as modern English. Then they gave it samples of four non-spoken communication systems: human DNA, Fortran, bacterial protein sequences and an artificial language.

The program calculated the level of order present in each language. Non-spoken languages were either highly ordered, with symbols and structures following each other in unvarying ways, or utterly chaotic. Spoken languages fell in the middle.

When they seeded the program with fragments of Indus script, it returned with grammatical rules based on patterns of symbol arrangement. These proved to be moderately ordered, just like spoken languages.

As for the meaning of the script, the program remained silent." - More: Wired
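
For anyone curious what a "level of order" might look like in practice, here is a minimal Python sketch. It is an illustration only, not Rao's software. It assumes the measure is the conditional entropy of a first-order Markov model, meaning how unpredictable the next symbol is once you know the current one. The function name and the toy sequences are made up for the example.

```python
import math
from collections import Counter


def conditional_entropy(sequences):
    """Entropy in bits of the next symbol given the current symbol.

    Treats the data as a first-order Markov chain: only adjacent
    pairs of symbols are counted. 0 means each symbol fully determines
    the one that follows it; higher values mean less order.
    """
    pair_counts = Counter()   # how often symbol a is followed by symbol b
    first_counts = Counter()  # how often symbol a is followed by anything
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1

    total_pairs = sum(pair_counts.values())
    if total_pairs == 0:
        return 0.0

    entropy = 0.0
    for (a, b), n in pair_counts.items():
        p_pair = n / total_pairs              # P(a, b)
        p_next_given_a = n / first_counts[a]  # P(b | a)
        entropy -= p_pair * math.log2(p_next_given_a)
    return entropy


# Toy data (hypothetical, not real inscriptions):
rigid = ["ABCABCABC"]  # every symbol always has the same successor
loose = ["AABBA"]      # each symbol is followed by A or B equally often
middle = ["ABABAC"]    # mostly regular, with an occasional variation

for name, data in [("rigid", rigid), ("loose", loose), ("middle", middle)]:
    print(name, round(conditional_entropy(data), 3))
```

On these toy sequences the rigid one scores 0 bits, the loose one scores 1 bit, and the mixed one lands in between, which is the same kind of ranking the article describes for rigid systems, chaotic systems and spoken languages. With inscriptions as short as the Indus ones, counts like these are sparse, so a real analysis would need some form of smoothing that this sketch leaves out.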