FrancophonIA/ELTeC-NIF
Source: Hugging Face | Updated: 2025-03-30 | Indexed: 2025-04-12
Download link:
https://hf-mirror.com/datasets/FrancophonIA/ELTeC-NIF
Description:
ELTeC-NIF is a corpus produced by converting a collection of literary texts in 10 European languages (including German, English, French, Hungarian, Polish, Portuguese, Romanian, Slovenian, and Spanish) into the NLP Interchange Format (NIF). It is based on 1000 novels published between 1840 and 1920, with up to 1000 sentences taken from each novel. The annotated versions of these novels, available in TEI level-2 format, were converted to NIF to support interoperability between NLP tools, language resources, and annotations.
Provided by:
FrancophonIA



