IBM Debater Mention Detection Benchmark
Source: OpenDataLab. Updated: 2026-05-24. Indexed: 2024-05-09.
Download link:
https://opendatalab.org.cn/OpenDataLab/IBM_Debater_Mention_Detection_etc
Description:
A large, high-quality benchmark dataset for mention detection. The objective of mention detection is to map the entities or concepts mentioned in a text to the correct concepts in a knowledge base. The benchmark annotates named entities as well as other types of entities, across text types that range from clean text taken from Wikipedia to noisy spoken-language data. It was built through a highly controlled crowdsourcing process to ensure its quality. The dataset contains 3,000 sentences, with a total of 6,375 mentions in the Wikipedia sentences and 6,239 mentions in the spoken sentences.
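To make the task definition concrete, here is a minimal sketch of what one annotated sentence could look like: each mention is a character span paired with a knowledge-base concept. The `Mention` record, the `surface` helper, the example sentence, and the concept identifiers are all assumptions made for illustration; they are not the benchmark's actual file format or labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """Hypothetical record for one annotated mention (not the dataset's real schema)."""
    start: int    # character offset where the mention begins
    end: int      # character offset just past the end of the mention
    concept: str  # assumed knowledge-base identifier, e.g. a Wikipedia page title


def surface(sentence: str, mention: Mention) -> str:
    """Return the text span that the mention covers."""
    return sentence[mention.start:mention.end]


# Illustrative sentence and annotations, invented for this sketch.
sentence = "IBM built Watson."
mentions = [
    Mention(start=0, end=3, concept="IBM"),
    Mention(start=10, end=16, concept="Watson_(computer)"),
]

for m in mentions:
    print(f"{surface(sentence, m)!r} -> {m.concept}")
```

Storing offsets rather than the mention string keeps each annotation unambiguous when the same surface form appears more than once in a sentence.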
Provided by:
OpenDataLab
Created:
2022-05-23
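Summary
Dataset introduction

Background and challenges
Background overview
This dataset is a high-quality benchmark for the mention detection task. It contains annotated data from Wikipedia and from spoken-language text, with 12,614 mentions in total (6,375 in Wikipedia sentences and 6,239 in spoken sentences), and was built through a strict crowdsourcing process to ensure reliability.
The content above was collected and summarized by 遇见数据集.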



