georges-1913-normalization
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14191141
Description:
This dataset was developed as part of the Burchards Dekret Digital project, funded by the Academy of Sciences and Literature in Mainz. It is based on 55,000 lemmata from Georges 1913 and contains approximately 5 million word pairs of orthographic variants and their normalized forms. Designed for tasks such as text normalization and historical linguistics, the dataset captures medieval Latin orthographic variability, including transformations like v ↔ u, ae → ę, and others. No data augmentation was applied, and it may exhibit bias against irregular forms, such as Greek loanwords.
Mirror of https://huggingface.co/datasets/mschonhardt/georges-1913-normalization. See README.md for further information.
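
To make the idea of variant/normalized word pairs concrete, here is a minimal sketch of how such pairs could be derived from a single lemma. It is an illustration only, not the project's actual generation procedure: the rule list, the function names `variants` and `pairs`, and the example lemma are all assumptions, and the bidirectional v ↔ u rule is simplified to one direction (v → u).

```python
from itertools import product

# Hypothetical rule set for illustration only; the dataset's full rule
# inventory is not given in this listing. v <-> u is simplified to v -> u.
RULES = [("v", "u"), ("ae", "ę")]


def variants(lemma, rules=RULES):
    """Return every spelling obtained by applying any subset of the rules."""
    results = set()
    # Each mask decides, rule by rule, whether that substitution is applied.
    for mask in product([False, True], repeat=len(rules)):
        form = lemma
        for apply_rule, (old, new) in zip(mask, rules):
            if apply_rule:
                form = form.replace(old, new)
        results.add(form)
    return results


def pairs(lemma, rules=RULES):
    """Return sorted (variant, normalized) pairs for one lemma."""
    return sorted((variant, lemma) for variant in variants(lemma, rules))


if __name__ == "__main__":
    for variant, normalized in pairs("vitae"):
        print(variant, "->", normalized)
```

Under these assumed rules, the lemma "vitae" yields four spellings (vitae, uitae, vitę, uitę), each paired with the normalized form "vitae". Enumerating subsets of rules in this way grows quickly with the number of rules, which is one plausible reason a lemma list of this size could expand into millions of pairs.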
Created:
2024-11-20



