Russian National Corpus 2.0: New opportunities and development prospects
Svetlana O. Savchuk
Vinogradov Russian Language Institute, Russian Academy of Sciences, Moscow, Russia; savsvetlana@mail.ru
Timofey Arkhangelskiy
Hamburg University, Hamburg, Germany; timarkh@gmail.com
Anastasiya A. Bonch-Osmolovskaya
HSE University, Moscow, Russia; Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia; abonch@gmail.com
Ol’ga V. Donina
Voronezh State University, Voronezh, Russia; olga-donina@mail.ru
Yuliya N. Kuznetsova
Lomonosov Moscow State University, Moscow, Russia; Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia; kuznetsova.yn@gmail.com
Ol’ga N. Lyashevskaya
HSE University, Moscow, Russia; Vinogradov Russian Language Institute, Russian Academy of Sciences, Moscow, Russia; olesar@yandex.ru
Boris V. Orekhov
HSE University, Moscow, Russia; nevmenandr@gmail.com
Mariya V. Podryadchikova
independent researcher; mpodr2015@gmail.com
Abstract:
The paper provides an overview of the results of a fundamental reconstruction and modernization of the Russian National Corpus platform, carried out from 2020 to 2023. The focus is on the new opportunities this opens up for linguists and a wider audience: improved representativeness of existing corpora, newly created corpora, new annotation obtained by applying neural network models, and new interface solutions. Three notable new components are examined in more detail: a resource-related one, the new Social Networks corpus; a search-related one, the Panchronic corpus, which combines searches across corpora from different periods; and an analytical one, a set of tools for statistics and data visualization.
For citation:
Savchuk S. O., Arkhangelskiy T., Bonch-Osmolovskaya A. A., Donina O. V., Kuznetsova Yu. N., Lyashevskaya O. N., Orekhov B. V., Podryadchikova M. V. Russian National Corpus 2.0: New opportunities and development prospects. Voprosy Jazykoznanija, 2024, 2: 7–34.
Acknowledgements:
This research was carried out as part of work supported by the Ministry of Science and Higher Education, grant No. 075-15-2020-793. The authors thank D. V. Sichinava, A. N. Dyshkant, S. Y. Toldova, N. S. Gorbunov, D. A. Fursina, A. A. Makhova, S. V. Piskunova, N. N. Builova, D. G. Borodina, A. D. Kozerenko, I. I. Vinogradova, S. A. Gladilin, D. A. Morozov, V. G. Sizov, P. V. Dyachenko, A. O. Kazennikov, N. A. Vlasova, A. V. Glazkova, S. S. Stolyarov, T. A. Garipov, and I. A. Smal’ for their valuable assistance and fruitful cooperation.