BENGAL

Source: arXiv (2018-11-01) · Updated: 2024-06-21

Download link:

https://hobbitdata.informatik.uni-leipzig.de/bengal/

Description:

The BENGAL dataset was created jointly by the Data Science Group at Paderborn University and the AKSW Research Group at Leipzig University to provide automatically generated benchmark data for entity recognition and linking. It comprises 13 subsets, each produced with a different generation strategy, which are used to evaluate and compare 11 tools on English, Brazilian Portuguese, and Spanish. BENGAL exploits the large amount of structured data available on the Web, in particular RDF data, and applies natural language generation to produce natural-language statements that are annotated with their entities as they are generated. The dataset mainly targets the accuracy and scalability problems of benchmarking entity recognition and linking, especially for low-resource languages.
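The core idea, turning RDF triples into text whose entity annotations are known by construction, can be illustrated with a minimal sketch. This is not BENGAL's implementation: it assumes a single hand-written template per predicate and a hypothetical `label` helper that derives a surface form from a DBpedia-style URI. Because the generator itself inserts each entity label, it can record the exact character offsets and the linked URI without any manual annotation.

```python
import string
from dataclasses import dataclass


@dataclass
class Mention:
    start: int  # character offset where the surface form begins
    end: int    # exclusive end offset
    uri: str    # knowledge-base resource the mention links to


def label(uri: str) -> str:
    """Hypothetical label lookup: use the URI's last path segment as the surface form."""
    return uri.rsplit("/", 1)[-1].replace("_", " ")


# Hypothetical templates keyed by predicate; {s} and {o} stand for subject and object.
TEMPLATES = {
    "http://dbpedia.org/ontology/birthPlace": "{s} was born in {o}.",
}


def verbalize(subject: str, predicate: str, obj: str) -> tuple[str, list[Mention]]:
    """Render one triple as a sentence and return it with gold entity annotations."""
    values = {"s": subject, "o": obj}
    parts: list[str] = []
    mentions: list[Mention] = []
    pos = 0
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion).
    for literal, field, _spec, _conv in string.Formatter().parse(TEMPLATES[predicate]):
        parts.append(literal)
        pos += len(literal)
        if field is not None:
            uri = values[field]
            surface = label(uri)
            mentions.append(Mention(pos, pos + len(surface), uri))
            parts.append(surface)
            pos += len(surface)
    return "".join(parts), mentions


if __name__ == "__main__":
    text, mentions = verbalize(
        "http://dbpedia.org/resource/Albert_Einstein",
        "http://dbpedia.org/ontology/birthPlace",
        "http://dbpedia.org/resource/Ulm",
    )
    print(text)  # Albert Einstein was born in Ulm.
    for m in mentions:
        print(m.start, m.end, text[m.start:m.end], m.uri)
```

The generated sentence together with its (start, end, URI) annotations serves as gold standard: a recognition and linking tool is run on the sentence, and its output is compared against these mentions.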

Provided by:

Data Science Group, Paderborn University, Germany, and AKSW Research Group, Leipzig University, Germany

Created:

2017-10-24
