georges-1913-normalization

Source: NIAID Data Ecosystem (indexed 2026-05-02)

Download link:

https://zenodo.org/record/14191141

Description:

This dataset was developed as part of the Burchards Dekret Digital project, funded by the Academy of Sciences and Literature in Mainz. It is based on 55,000 lemmata from Georges 1913 and contains approximately 5 million word pairs, each matching an orthographic variant with its normalized form. Designed for tasks such as text normalization and historical linguistics, the dataset captures medieval Latin orthographic variability, including transformations such as v ↔ u and ae → ę, among others. No data augmentation was applied, and the dataset may exhibit bias against irregular forms, such as Greek loanwords. Mirror of https://huggingface.co/datasets/mschonhardt/georges-1913-normalization. See README.md for further information.
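
To make the idea of variant/normalized word pairs concrete, here is a minimal Python sketch that expands a normalized lemma into spelling variants. It is an illustration only, not the project's actual generation procedure: the `RULES` table, the `variants` and `pairs` helpers, and the sample lemma are hypothetical, and only the v ↔ u and ae → ę substitutions are taken from the description above.

```python
from itertools import product

# Hypothetical rule table (an assumption for illustration): each normalized
# substring maps to the spellings it may take. Only v/u and ae/ę are named
# in the dataset description; the real rule set is documented in the README.
RULES = {
    "ae": ["ae", "ę"],
    "v": ["v", "u"],
    "u": ["u", "v"],
}


def variants(lemma: str) -> set[str]:
    """Return every spelling reachable by applying RULES at each position."""
    options = []
    i = 0
    while i < len(lemma):
        # Try the two-character rule ("ae") before single characters.
        if lemma[i:i + 2] in RULES:
            options.append(RULES[lemma[i:i + 2]])
            i += 2
        elif lemma[i] in RULES:
            options.append(RULES[lemma[i]])
            i += 1
        else:
            options.append([lemma[i]])
            i += 1
    # The Cartesian product of per-position choices yields all combinations.
    return {"".join(choice) for choice in product(*options)}


def pairs(lemma: str) -> list[tuple[str, str]]:
    """Pair each variant with its normalized form, as (variant, normalized)."""
    return sorted((v, lemma) for v in variants(lemma))


if __name__ == "__main__":
    for variant, normalized in pairs("vae"):
        print(variant, "->", normalized)
```

Because every affected position multiplies the number of spellings, a lemma with several such positions produces many pairs, which is consistent with roughly 5 million pairs arising from 55,000 lemmata.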

Created:

2024-11-20
