IBM Debater Mention Detection Benchmark

Name: IBM Debater Mention Detection Benchmark
Creator: OpenDataLab
Published: 2026-05-24 08:30:06
License: No description available

OpenDataLab · Updated 2026-05-24 · Indexed 2024-05-09

Download link:

https://opendatalab.org.cn/OpenDataLab/IBM_Debater_Mention_Detection_etc

Resource overview:

A large, high-quality benchmark dataset for mention detection. The goal of mention detection, as defined here, is to map the entities and concepts mentioned in a text to the correct concepts in a knowledge base. The benchmark annotates named entities as well as other entity types, across text that ranges from clean sentences taken from Wikipedia to noisy speech data. It was built through a tightly controlled crowdsourcing process to ensure quality. It contains 3,000 sentences, with 6,375 mentions in the Wikipedia sentences and 6,239 mentions in the spoken sentences.
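To make the task concrete, here is a minimal sketch of how one annotated sentence could be represented: each mention is a character span paired with a knowledge-base concept. The field names, the example sentence, and the concept titles are illustrative assumptions and do not reflect the benchmark's actual file format or schema. Only the two mention counts come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """One annotated mention (hypothetical schema, not the benchmark's own)."""
    start: int    # character offset of the mention, inclusive
    end: int      # character offset of the mention, exclusive
    concept: str  # assumed knowledge-base concept title


def make_mention(sentence: str, surface: str, concept: str) -> Mention:
    """Locate the first occurrence of `surface` in `sentence` and build a Mention."""
    start = sentence.find(surface)
    if start < 0:
        raise ValueError(f"{surface!r} does not occur in the sentence")
    return Mention(start, start + len(surface), concept)


def surface_form(sentence: str, mention: Mention) -> str:
    """Return the text covered by a mention's span."""
    return sentence[mention.start:mention.end]


# Invented example sentence; it is not taken from the dataset.
sentence = "Barack Obama spoke about climate change in Paris."
mentions = [
    make_mention(sentence, "Barack Obama", "Barack Obama"),      # named entity
    make_mention(sentence, "climate change", "Climate change"),  # general concept
    make_mention(sentence, "Paris", "Paris"),                    # named entity
]

# Mention counts quoted in the description above.
WIKIPEDIA_MENTIONS = 6375
SPOKEN_MENTIONS = 6239
TOTAL_MENTIONS = WIKIPEDIA_MENTIONS + SPOKEN_MENTIONS

for m in mentions:
    print(f"{surface_form(sentence, m)!r} [{m.start}:{m.end}] -> {m.concept}")
print("total annotated mentions:", TOTAL_MENTIONS)
```

The second mention shows why the benchmark covers more than named entities: "climate change" is a general concept rather than a proper name, yet it still maps to a knowledge-base entry.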

Provider:

OpenDataLab

Created:

2022-05-23

Collection Summary

Dataset Introduction