FrancophonIA/ELTeC-NIF

Name: FrancophonIA/ELTeC-NIF
Creator: FrancophonIA
Published: 2025-03-30 14:17:09
License: not specified

Source: Hugging Face
Updated: 2025-03-30
Indexed: 2025-04-12

Download link:

https://hf-mirror.com/datasets/FrancophonIA/ELTeC-NIF




Description:


ELTeC-NIF is a corpus that converts literary texts in 10 European languages, including German, English, French, Hungarian, Polish, Portuguese, Romanian, Slovenian, and Spanish, into the NLP Interchange Format (NIF). It is based on 1,000 novels published between 1840 and 1920, with up to 1,000 sentences per novel. The annotated versions of these novels, originally encoded in TEI level-2 format, were converted to NIF so that NLP tools, language resources, and annotations can interoperate.
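NIF represents a document as a `nif:Context` holding the full text, and each annotation (such as a sentence) as a character-offset span that points back to that context. The sketch below illustrates this idea using only the standard library. It is not the ELTeC-NIF conversion pipeline. The base URI, the sample text, and the sentence spans are hypothetical, and the function names `to_nif` and `turtle_literal` are illustrative only.

```python
# Minimal sketch: express a text and its sentence boundaries as NIF 2.0 triples
# in Turtle syntax. Base URI, sample text, and spans are hypothetical; this is
# not the actual ELTeC-NIF converter.

NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def turtle_literal(text: str) -> str:
    """Escape a Python string as a double-quoted Turtle literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_nif(base_uri: str, text: str, sentences: list[tuple[int, int]]) -> str:
    """Return Turtle describing `text` as a nif:Context and each (begin, end)
    character span (end exclusive) as a nif:Sentence pointing back to it."""
    context = f"<{base_uri}#char=0,{len(text)}>"
    lines = [
        f"@prefix nif: <{NIF}> .",
        f"@prefix xsd: <{XSD}> .",
        "",
        f"{context} a nif:Context , nif:RFC5147String ;",
        f"    nif:isString {turtle_literal(text)} ;",
        '    nif:beginIndex "0"^^xsd:nonNegativeInteger ;',
        f'    nif:endIndex "{len(text)}"^^xsd:nonNegativeInteger .',
    ]
    for begin, end in sentences:
        # Each sentence URI encodes its offsets (RFC 5147 style fragment).
        lines += [
            "",
            f"<{base_uri}#char={begin},{end}> a nif:Sentence , nif:RFC5147String ;",
            f"    nif:referenceContext {context} ;",
            f"    nif:anchorOf {turtle_literal(text[begin:end])} ;",
            f'    nif:beginIndex "{begin}"^^xsd:nonNegativeInteger ;',
            f'    nif:endIndex "{end}"^^xsd:nonNegativeInteger .',
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "Il pleuvait. La nuit tombait."
    # Hypothetical sentence spans as character offsets.
    spans = [(0, 12), (13, 29)]
    print(to_nif("http://example.org/novel1", sample, spans))
```

Because every annotation is an offset into one shared string, tools that produce different layers (tokens, sentences, named entities) can attach them to the same context without altering the text. This is the kind of interoperability the NIF conversion is intended to provide.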

Provided by:

FrancophonIA
