CoNLL-2012 Shared Task English dataset

Name: CoNLL-2012 Shared Task English dataset
Creator: CoNLL
License: No description available

Indexed from arXiv on 2025-09-30

Download link:

https://cemantix.org/conll/2012/data.html

Description:

This dataset contains 2802 training documents, 343 validation documents, and 348 test documents, spanning seven genres: newswire, magazine articles, broadcast news, broadcast conversation, web data, conversational telephone speech, and the New Testament. It is intended for evaluating coreference resolution systems, and its genre diversity allows models to be tested for robustness across text types rather than on a single domain.
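As a quick sanity check on the split sizes stated above, the sketch below records them in a plain dictionary and computes the total document count and each split's share. The variable names are illustrative only, and the two-letter genre codes are the OntoNotes 5.0 genre labels, added here for convenience rather than taken from this page.

```python
# Split sizes (in documents) as stated in the dataset description.
SPLITS = {"train": 2802, "validation": 343, "test": 348}

# The seven genres of the English data, keyed by OntoNotes 5.0 genre code
# (codes added for convenience; descriptions follow the dataset card).
GENRES = {
    "nw": "newswire",
    "mz": "magazine articles",
    "bn": "broadcast news",
    "bc": "broadcast conversation",
    "wb": "web data",
    "tc": "telephone conversation",
    "pt": "New Testament (pivot text)",
}


def split_fractions(splits):
    """Return each split's share of the total document count."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}


total_docs = sum(SPLITS.values())
fractions = split_fractions(SPLITS)

print(f"total documents: {total_docs}")
for name, frac in fractions.items():
    # Right-align split names and show each share as a percentage.
    print(f"{name:>10}: {SPLITS[name]:5d} docs ({frac:.1%})")
print(f"genres: {len(GENRES)}")
```

The arithmetic gives 3493 documents in total, split roughly 80/10/10 (about 80.2% train, 9.8% validation, 10.0% test).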

Provider:

CoNLL
