A hybrid approach to the small unannotated corpus-based language comparison and its application to the Old East Slavic charters - Supplementary material 2 (Modern East Slavic)
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14148179
Resource description:
Modern East Slavic dialects (Belogornoje, Megra, Zialionka)
General description
A set of subcorpora for three modern East Slavic small territorial lects: Belogornoje, Megra and Zialionka. Megra is an autochthonous (Barannikova, 2005) Northern Russian small territorial lect (Kryuchkova and Goldin, 2011). Belogornoje is a late-settlement (Barannikova, 2005) Central Russian small territorial lect (Kryuchkova and Goldin, 2011). Zialionka is an autochthonous Northern Belarusian lect that differs radically from Belogornoje and Megra in most of the key isoglosses within the eastern part of the Slavic continuum.
Sources
Both the Megra and the Belogornoje texts originate from the Saratov dialectological corpus (Kryuchkova and Goldin, 2011). They are manually transcribed interviews with dialect speakers, mostly on everyday-life topics and rarely touching on religion, recorded during the field trips of Saratov State University from 1980 to 2019. The texts carry some tagging, but for the sake of a clean cross-evaluation the experiments do not use this information. The transcription is phonemic, faithful to the dialect features, and is left untouched in the experiments.
The Zialionka texts are also phonemically transcribed and left untouched in the experiments. They come from the Polack ethnographic collection (Lobač, 2011). The main genre is folklore tales, collected by transcribing interviews with speakers of small territorial lects during the field trips of Polack State University (Belarus) from 1992 to 2010. The texts show no traces of notable phonetic irregularities. Unfortunately, this cannot be reliably verified, as no original recordings are available.
The data statement is available among the downloadable files.
How-to
This section contains tutorials for using this data with the intended pipelines.
Corpus-based distance measurement package
The source code for the package is available here; the manual is in the README section of the repository.
To use this dataset to measure the distance between the Belogornoje, Megra and Zialionka lects and to cluster them afterwards, complete the following steps:
Download the Jupyter notebook that streamlines the package use.
Download the dataset.
Put the dataset into a selected folder on your computer (make sure there are no other files within this folder).
Insert the path to the directory into the CONTENT_DIR variable in the Jupyter notebook.
Run the notebook, adjusting the parameters, if necessary.
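Before running the notebook, it can help to confirm that the chosen folder really holds nothing but the dataset files. The sketch below is one way to do that in plain Python. The helper name `check_content_dir` is hypothetical and is not part of the package or the notebook; only the `CONTENT_DIR` variable name comes from the steps above. It assumes that hidden files (such as `.DS_Store`) and subdirectories are not meant to be in the folder.

```python
from pathlib import Path


def check_content_dir(path):
    """Split the entries of `path` into (files, stray).

    Hypothetical helper, not part of the package or the notebook.
    `files` holds the names of regular, non-hidden files; `stray` holds
    hidden files and subdirectories, which are assumed to be unwanted.
    """
    content_dir = Path(path).expanduser()
    if not content_dir.is_dir():
        raise NotADirectoryError(f"{content_dir} is not an existing directory")

    files, stray = [], []
    for entry in sorted(content_dir.iterdir(), key=lambda p: p.name):
        # Hidden files (e.g. .DS_Store) and subdirectories are flagged as stray.
        if entry.name.startswith(".") or not entry.is_file():
            stray.append(entry.name)
        else:
            files.append(entry.name)
    return files, stray
```

Calling `check_content_dir` on the folder returns the two lists; if `stray` is empty and `files` contains only the downloaded dataset files, the same path can be pasted into `CONTENT_DIR`.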
Created:
2024-12-01



