SmartData
Source: arXiv; indexed 2025-09-30
Download link:
https://github.com/DFKI-NLP/smartdata-corpus
Resource description:
SmartData is a German-language text corpus with manually annotated traffic and industry entities and relations. The texts are drawn from news articles, RSS feeds, and tweets. The annotation scheme covers 15 relation types, each with mandatory and optional attributes, and inter-annotator agreement is moderate. The corpus contains 2,322 documents with 19,116 entities, 1,264 relations, and 141,344 words in total. The task it supports is multi-attribute relation extraction.
Provider:
DFKI (Deutsches Forschungszentrum für Künstliche Intelligenz)



