FrancophonIA/DaMuEL_1.0_fr
Source: Hugging Face. Updated 2025-03-30. Indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/FrancophonIA/DaMuEL_1.0_fr
Description:
DaMuEL is a large multilingual dataset for entity linking that covers 53 languages. It has two components: a knowledge base holding language-agnostic information about entities, and Wikipedia texts linked to that knowledge base. Wikipedia documents annotate only one mention of each entity they contain, so DaMuEL additionally detects, automatically, all mentions of the named entities linked from each document. The dataset contains 27.9M named entities and 12.3G (about 12.3 billion) tokens of Wikipedia text. It is published under the CC BY-SA license; this repository provides the French split.
Provider:
FrancophonIA



