Middle Dutch syllabified words

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/1324416
Resource description:
Specifics of the data: a text file (syllabified_crm.txt) containing 43,710 syllabified Middle Dutch words taken from the Corpus Van Reenen-Mulder. This corpus, created by Pieter van Reenen and Maaike Mulder at the Free University Amsterdam, contains about 2,500 Middle Dutch charters and about 750,000 tokens. The charters were written in the Netherlands and Flanders between 1300 and 1400. The 43,710 syllabified words in this list are the unique words of the Corpus Van Reenen-Mulder, minus some tokens that were excluded when assembling the data set because they contained diacritic symbols marking abbreviations, clitics, or unclear parts in the original charter. A dash (-) is used as the syllable separator.

Apart from the entire data set, this DOI also includes:
- A PDF file visualizing the data set.
- The splits used for the automatic syllabification experiment by Haverals, Kestemont & Karsdorp (2018).
- A gold-standard out-of-corpus sample of 1,748 Middle Dutch words, taken at random from the Cd-rom Middelnederlands and also used in the above-mentioned syllabification experiment.
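Since the description states that a dash separates syllables, a minimal Python sketch of reading such a list could look like the following. It assumes one word per line and UTF-8 text, neither of which is confirmed by the description; the function names and the sample words are hypothetical illustrations, not entries taken from syllabified_crm.txt.

```python
from collections import Counter
from typing import Iterable


def split_syllables(word: str, sep: str = "-") -> list[str]:
    """Split one dash-separated word into its syllables."""
    # Strip surrounding whitespace/newlines and drop empty pieces.
    return [s for s in word.strip().split(sep) if s]


def read_syllabified(lines: Iterable[str]) -> list[list[str]]:
    """Parse an iterable of lines (assumed: one word per line), skipping blanks."""
    return [split_syllables(line) for line in lines if line.strip()]


def syllable_count_histogram(words: Iterable[list[str]]) -> Counter:
    """Count how many words have 1, 2, 3, ... syllables."""
    return Counter(len(w) for w in words)


# Hypothetical sample lines, made up for illustration only.
sample = ["al-le\n", "ghe-ge-ven\n", "\n", "dat\n"]
words = read_syllabified(sample)
hist = syllable_count_histogram(words)
```

To apply this to the actual file, one could pass an open file object instead of the sample list, for example `read_syllabified(open("syllabified_crm.txt", encoding="utf-8"))`, again assuming the encoding and one-word-per-line layout hold.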
Created:
2024-07-25