Direct and indirect referential competition as parameters of referential choice modelling


2026. №3, 27–53

Dmitry A. Zalmanov

Institute of Linguistics, Russian Academy of Sciences, Moscow, Russia; dazalmanov@iling-ran.ru; ORCID: 0000-0002-8618-9355

Abstract:

The choice of expression used to refer to an entity in discourse (referential choice) can be influenced by various factors, including competing mentions of other referents in the preceding discourse. Competing mentions are most commonly defined as those that create referential ambiguity because the referents share the same gender and number: for example, two different referents that, if pronominalized, would both require the pronominal form “he”. However, corpus and psycholinguistic studies have provided evidence that referential choice can also be influenced by neighboring mentions of referents that differ in gender and/or number, so that no referential ambiguity arises. The present study introduces two additional factors into a model of referential choice, representing two types of referential competition (direct and indirect), and tests their effect on a corpus of newspaper articles from “The Wall Street Journal”. To implement these factors, I defined criteria for selecting competing referring expressions based on the referents’ current activation levels. Whether a referent is activated was determined using five factors: the distance between the anaphor and the competing antecedent, measured in paragraphs, in sentences, and in elementary discourse units, together with the antecedent’s animacy and its grammatical role. The study also addresses several problems in operationalizing the new factors. Incorporating them into a random forest model increased prediction accuracy in the two-class classification task (full NPs vs. pronouns). A more refined assessment of the competing referents’ activation levels could further improve the performance of these factors within the model.

For citation:

Zalmanov D. A. Direct and indirect referential competition as parameters of referential choice modelling. Voprosy Jazykoznanija, 2026, 3: 27–53.

Acknowledgements:

I am grateful to A. A. Kibrik and the three anonymous reviewers for their valuable remarks on the article’s structure, argumentation, clarity of presentation, and terminology. I bear full responsibility for any mistakes and inaccuracies that remain in the paper. I also thank G. B. Dobrov for his help with processing the WSJ MoRA corpus data, in particular with automating the calculation of values for a range of factors in the referential choice model that I implemented and with extracting the annotated corpus data into a relational database format. This work was supported by the Ministry of Science and Higher Education of the Russian Federation under project FMNE-2025-0002.