
AIDA Scenario 2 Practice Topic Source Data

DataCite Commons · Updated 2025-06-03 · Indexed 2024-07-13
Download link:
https://catalog.ldc.upenn.edu/LDC2024T04
Description:
<h3>Introduction</h3> <p>AIDA Scenario 2 Practice Topic Source Data was developed by the Linguistic Data Consortium (LDC) and comprises 1,500 root documents (text, image, and video) from English, Russian, and Spanish web sources.</p> <p>The DARPA AIDA (Active Interpretation of Disparate Alternatives) program aimed to develop a multi-hypothesis semantic engine that generates explicit alternative interpretations of events, situations, and trends from a variety of unstructured sources. LDC supported AIDA by collecting, creating, and annotating multimodal linguistic resources in multiple languages.</p> <p>Each phase of the AIDA program centered on a specific scenario, or broad topic area, with related subtopics designated as either practice subtopics or evaluation subtopics. The Phase 2 scenario focused on the socioeconomic and political crisis in Venezuela since 2010. This corpus constitutes the full set of topic-focused documents for the Phase 2 practice subtopics.</p> <h3>Data</h3> <p>Data was collected from web sources through a combination of automatic and manual processes. HTML content was converted from its original form into XML. To the extent possible, all resources referenced by a given "root" HTML page (style sheets, JavaScript, images, media files, etc.) were stored as separate files of the corresponding data type and assigned their own 9-character file-IDs (the same form of ID used for the "root" HTML page).</p> <p>The knowledge base for entity detection and linking annotation for all AIDA Scenario 1 and 2 corpora is available separately as&nbsp;<a href="../../../LDC2023T10">AIDA Scenario 1 and 2 Reference Knowledge Base (LDC2023T10)</a>.</p> <h3>Sponsorship</h3> <p>This material is based upon work supported by the Air Force Research Laboratory (AFRL) and the Defense Advanced Research Projects Agency (DARPA) under Contract No. FA8750-18-C-0013.</p>
Provider:
Linguistic Data Consortium
Created:
2024-04-15