
A hybrid approach to the small unannotated corpus-based language comparison and its application to the Old East Slavic charters - Supplementary material 1 (Old East Slavic)

Indexed from: NIAID Data Ecosystem, 2026-05-02
Download link:
https://zenodo.org/record/14057668
Resource description:
Old East Slavic charters (XII - XIV century)

General description

A set of nine historical Old East Slavic legal texts from Smolensk, Polack and Novgorod, dating from the end of the XII century to the first half of the XIV century.

The source for the Smolensk charters is Avanesov (1963), which contains the original texts as well as their initial deciphering (not machine-readable). The source for the Polack charters is Horoshkevich (2015), which contains transcribed machine-readable texts that required only an additional check and preparation. The source for the Novgorod charters is Napierskij (1857), which carries the original texts and their initial non-machine-readable deciphering.

All the texts underwent an additional preprocessing step in which reconstructed and contracted parts were deleted, so that the data better represent the actual texts under consideration and exclude as many research biases as possible. The last step was manual tokenisation and the joining of each text into a single string. The data statement is available among the downloadable files.

How-to

This section contains the tutorials for using this data with the intended pipelines.

Corpus-based distance measurement package

The source code for the package is available here; the manual is in the README section of the repository. To use this dataset to measure the distance between the Smolensk, Polack and Novgorod lects and then cluster them, complete the following steps:

1. Download the Jupyter notebook that streamlines the use of the package.
2. Download the dataset.
3. Put the dataset into a selected folder on your computer (make sure there are no other files in this folder).
4. Insert the path to the directory into the CONTENT_DIR variable in the Jupyter notebook.
5. Run the notebook, adjusting the parameters if necessary.
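The requirement that the dataset folder contain no other files can be checked before the notebook is run. The following is a minimal sketch, not part of the package: the helper name list_corpus_files, the placeholder path, and the ".txt" extension are assumptions made for illustration. Only the CONTENT_DIR variable name comes from the description above.

```python
from pathlib import Path

# CONTENT_DIR is the variable named in the notebook instructions.
# The path below is a placeholder, not a real location.
CONTENT_DIR = "path/to/dataset_folder"


def list_corpus_files(content_dir, expected_suffix=".txt"):
    """Return the sorted dataset files in content_dir.

    Hypothetical helper: raises if the folder is missing, empty, or holds
    anything other than regular files ending in expected_suffix. The ".txt"
    default is an assumption; adjust it to the actual dataset files.
    """
    folder = Path(content_dir)
    if not folder.is_dir():
        raise NotADirectoryError(f"{folder} is not an existing directory")

    entries = sorted(folder.iterdir())
    # Anything that is not a regular file with the expected suffix is stray,
    # including subdirectories and hidden files.
    stray = [
        p.name
        for p in entries
        if not p.is_file() or p.suffix.lower() != expected_suffix
    ]
    if stray:
        raise ValueError(f"Unexpected entries in {folder}: {stray}")
    if not entries:
        raise ValueError(f"No dataset files found in {folder}")
    return entries


# Example use once CONTENT_DIR points at the real folder:
#     files = list_corpus_files(CONTENT_DIR)
#     print(len(files), "dataset files found")
```

Running such a check first surfaces stray items (for example, hidden system files or notebook checkpoint folders) as an explicit error, so they are not passed to the notebook as if they were texts.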
Created:
2024-12-01