Magomed I. Magomedov a, Arsen K. Abdulaev a, Annie Gagliardi b, Maria Polinsky b
a The Gamzat Tsadasa Institute of Language, Literature and Arts, Daghestan Academic Centre, Russian Academy of Sciences, Makhachkala, 367025, Russia;
b Harvard University, Cambridge, 02138, USA;
This paper pursues two related goals. First, it introduces a simple methodology for investigating first language acquisition in lesser-studied languages, including those that may be endangered. The methodology consists of three steps: (i) corpus building, (ii) data mining based on that corpus, and (iii) subsequent experimental studies motivated by patterns observed in the corpus. Second, the paper applies this methodology to the investigation of noun-class acquisition in the Nakh-Dagestanian language Tsez. Following the methodology, we have constructed a new corpus of child-directed speech and child language (about ten hours of speech), which we analyze to determine the role of predictive cues that help the learner assign a given noun to a noun class. Since Tsez does not overtly mark agreement on all its verbs or adjectives, we also use the corpus data to assess the extent of overt noun-class agreement. We conclude that a variety of semantic and phonological (formal) cues allow Tsez speakers to determine the noun class of a given nominal. We then present and analyze an elicited-production experiment that uncovers asymmetries between the predictive features of nouns in the corpus and the way children and adults classify those nouns. We show that children are biased to use phonological information over semantic information, even though the corpus statistics favor semantic information.