ACE 2005 Multilingual Training Corpus

Name: ACE 2005 Multilingual Training Corpus
Creator: UC Berkeley Library Dataverse
Published: 2025-12-18 21:24:47
License: No description available

DataCite Commons | Updated 2025-12-18 | Indexed 2025-04-16

Download link:

https://datasets.lib.berkeley.edu/citation?persistentId=doi:10.60503/D3/CDCBX1

Resource description:

ACE 2005 Multilingual Training Corpus was developed by the Linguistic Data Consortium (LDC) and contains approximately 1,800 files of mixed-genre text in English, Arabic, and Chinese annotated for entities, relations, and events. This represents the complete set of training data in those languages for the 2005 Automatic Content Extraction (ACE) technology evaluation. The genres include newswire, broadcast news, broadcast conversation, weblog, discussion forums, and conversational telephone speech. The data was annotated by LDC with support from the ACE Program and additional assistance from LDC.

The objective of the ACE program was to develop automatic content extraction technology to support automatic processing of human language in text form. In November 2005, sites were evaluated on system performance in five primary areas: the recognition of entities, values, temporal expressions, relations, and events. Entity, relation, and event mention detection were also offered as diagnostic tasks. All tasks except the event tasks were performed for three languages: English, Chinese, and Arabic. Event tasks were evaluated in English and Chinese only. This release comprises the official training data for these evaluation tasks.

For more information about linguistic resources for the ACE Program, including annotation guidelines, task definitions, and other documentation, see LDC's ACE website.

Suggested citation:

Walker, Christopher, et al. ACE 2005 Multilingual Training Corpus LDC2006T06. Web Download. Philadelphia: Linguistic Data Consortium, 2006.

Provider:

UC Berkeley Library Dataverse

Created:

2024-09-11
