AIDA Scenario 3 Practice Topic Source Data and Annotation

Name: AIDA Scenario 3 Practice Topic Source Data and Annotation
Creator: Linguistic Data Consortium
Published: 2025-06-03 15:38:29
License: No description available

Source: DataCite Commons. Updated 2025-06-03; indexed 2026-05-03.

Download link:

https://catalog.ldc.upenn.edu/LDC2025T02

Resource description:

<h3>Introduction</h3> <p>AIDA Scenario 3 Practice Topic Source Data and Annotation was developed by the Linguistic Data Consortium (LDC) and comprises English, Russian and Spanish web documents (text, video, image) and annotations.</p> <p>The DARPA AIDA (Active Interpretation of Disparate Alternatives) program aimed to develop a multi-hypothesis semantic engine to generate explicit alternative interpretations of events, situations and trends from a variety of unstructured sources. LDC supported AIDA by collecting, creating and annotating multimodal linguistic resources in multiple languages.</p> <p>Each phase of the AIDA program centered on a specific scenario, or broad topic area, with related subtopics designated as either practice subtopics or evaluation subtopics. The Phase 3 scenario focused on the COVID-19 global pandemic. This corpus contains source documents and annotations for the Scenario 3 practice topics.</p> <h3>Data</h3> <p>Source documents were collected from the web by a combination of automatic and manual processes. HTML content was converted from its original form into XML. To the extent possible, all resources referenced by a given "root" HTML page (style sheets, JavaScript, images, media files, etc.) were stored as separate files of the given data type and assigned separate 9-character file-IDs (the same form of ID used for the "root" HTML page).</p> <p>The corpus contains 1417 root documents; 279 documents were annotated.
Annotations include:</p> <ul> <li>Event, relation and entity annotation (64 documents)</li> <li>Claim frame annotation: claims (true or not) relating to the COVID-19 pandemic (203 documents)</li> <li>Practice topic query claim frames: example claim frames intended to be used by systems as queries to extract similar claims from additional documents (30 documents)</li> </ul> <p>Claim frame annotations were produced by LDC; University of Colorado Boulder; Johns Hopkins University; Language Technologies Institute, Carnegie Mellon University; and University of Illinois Urbana-Champaign.</p> <p>Annotations are presented as tab-separated files.</p> <h3>Sponsorship</h3> <p>This material is based upon work supported by Air Force Research Laboratory (AFRL) and the Defense Advanced Research Projects Agency (DARPA) under Contract No. FA8750-18-C-0013.</p>

Provider:

Linguistic Data Consortium

Created:

2025-01-28
