As the UNESCO Year of Indigenous Languages continues, we at the Museum want to highlight work that is helping to make the year a success and to further the preservation and study of lesser-known indigenous languages. I had the opportunity to interview Keith Cunningham of Georgetown University, who is currently working on a doctoral dissertation in linguistics and pursuing an exciting new opportunity to help study and preserve these languages.
I am a third-year PhD student in the linguistics department at Georgetown. My undergraduate studies were in Japanese, and I also did previous graduate work in Mandarin Chinese. My current coursework focuses on American Indian languages, such as Apsaalooke (also known as Crow, a member of the Siouan language family).
I am applying various techniques from computational phylogenetics to American Indian languages. Computational phylogenetics uses algorithms to plot a “family tree”; originally used for biological species, it is now also applied to languages. Some current techniques include the use of programs originally developed for determining the relatedness of species based on common DNA sequences. Instead of DNA, the programs calculate the relatedness of various languages based on shared cognates. Other approaches include the use of edit distance, i.e., determining the degree to which languages have diverged from one another based on sound changes in the cognates themselves. For languages that may be related at a time depth greater than can be recovered through the comparative method, phylogenetics may be based on other features such as common morphological traits. I am interested in seeing how these methods perform on each of the American Indian language families I have studied, as well as refining and improving upon the methods based on what is already known about how the languages are related.
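For readers curious about what "edit distance" means in practice, here is a minimal Python sketch. It is only an illustration, not the software used in the project: it assumes the words have already been put into a consistent transcription, it uses the standard Levenshtein distance (the fewest single-symbol insertions, deletions, or substitutions needed to turn one form into another), and the function names and the sample forms are invented for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Fewest single-symbol insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, symbol_a in enumerate(a, start=1):
        curr = [i]
        for j, symbol_b in enumerate(b, start=1):
            cost = 0 if symbol_a == symbol_b else 1
            curr.append(min(prev[j] + 1,          # delete symbol_a
                            curr[j - 1] + 1,      # insert symbol_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer form, so long words do not dominate."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def language_distance(words_a: dict, words_b: dict) -> float:
    """Mean normalized distance over the meanings attested in both word lists."""
    shared = sorted(set(words_a) & set(words_b))
    if not shared:
        raise ValueError("the two word lists share no meanings")
    total = sum(normalized_distance(words_a[m], words_b[m]) for m in shared)
    return total / len(shared)


# Invented placeholder forms, not data from any real language.
list_x = {"one": "tapa", "two": "kuna"}
list_y = {"one": "tapi", "two": "kuna"}
print(language_distance(list_x, list_y))
```

Dividing by the length of the longer form is one common normalization among several. Treating every symbol change as equally costly is a simplification, since some sound changes are far more natural than others.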
I gathered Swadesh lists for five languages for the initial trial of my project. A thorough study would involve word lists from all attested members of a given family.
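To give a rough idea of how such word lists can feed a tree-building program, the sketch below turns cognate judgments into pairwise distances. It rests on several assumptions that are not part of the project description: each word has already been assigned a cognate-class label by a linguist, distance is taken simply as the share of common meanings whose labels differ, and the language names, labels, and function names are hypothetical placeholders.

```python
from itertools import combinations


def cognate_distance(codes_a: dict, codes_b: dict) -> float:
    """Proportion of shared meanings whose cognate-class labels differ."""
    shared = [meaning for meaning in codes_a if meaning in codes_b]
    if not shared:
        raise ValueError("the two lists share no meanings")
    matches = sum(1 for meaning in shared if codes_a[meaning] == codes_b[meaning])
    return 1 - matches / len(shared)


def distance_matrix(languages: dict) -> dict:
    """Distance for every unordered pair of languages, keyed by (name, name)."""
    return {
        (a, b): cognate_distance(languages[a], languages[b])
        for a, b in combinations(sorted(languages), 2)
    }


# Hypothetical cognate-class labels: words sharing a letter within a meaning
# are judged to descend from the same ancestral word.
toy_lists = {
    "Lang1": {"water": "A", "fire": "A", "stone": "A", "hand": "A"},
    "Lang2": {"water": "A", "fire": "A", "stone": "B", "hand": "A"},
    "Lang3": {"water": "B", "fire": "A", "stone": "C", "hand": "B"},
}

for pair, distance in distance_matrix(toy_lists).items():
    print(pair, distance)
```

A matrix like this is the kind of input that distance-based tree-building methods work from. With only a handful of languages and meanings, as in this toy example, the resulting distances are far too coarse to support real conclusions.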
Many American Indian languages are extinct and attested only in fragmentary historical word lists. Sometimes there is insufficient vocabulary to use for comparison. Even when there is enough vocabulary, it is often written in inconsistent orthography. The process of providing phonemic transcriptions therefore takes much more time than it does to run the programs. It is also difficult at times to determine which set of vocabulary to use for comparison. While the most common 100 to 200 words in a language are often resistant to borrowing, it is not unprecedented to find loanwords among them. If two languages come back into contact after an initial separation and borrow extensively from each other, this can confound attempts to determine their place on a linguistic family tree.
Computational phylogenetics techniques can potentially reveal relationships between languages and language families at a greater time depth than the traditional comparative method. Future applications of these techniques may support or refute proposals for macrofamilies that cannot be tested with traditional methods.