
KAIROS Schema Learning Complex Event Annotation

DataCite Commons | Updated 2025-06-09 | Indexed 2026-05-03
Download link:
https://catalog.ldc.upenn.edu/LDC2025T07
Description:
<h3>Introduction</h3> <p>KAIROS Schema Learning Complex Event Annotation was developed by the Linguistic Data Consortium (LDC) to support the DARPA KAIROS program. It contains English and Spanish text, audio, video and image data labeled for 93 real-world complex events (CEs) with event, relation and argument annotations linking to document provenance.</p> <p>The DARPA KAIROS (Knowledge-directed Artificial Intelligence Reasoning Over Schemas) program aimed to build technology capable of understanding and reasoning about complex real-world events in order to provide actionable insights to end users. KAIROS systems utilized formal event representations in the form of schema libraries that specified the steps, preconditions and constraints for an open set of complex events; schemas were then used in combination with event extraction to characterize and make predictions about real-world events in a large multilingual, multimedia corpus.</p> <h3>Data</h3> <p>Source data was collected from the web by LDC. A total of 3,431 root web pages were collected and processed, yielding 1,919 text data files, 24,019 image files, 1,472 video files and 16 audio files. Annotation steps included provenance linking (linking events in a document to a CE) and mentions (event and relation frames). 
Data scouting and annotation guidelines are included in the documentation accompanying this release.</p> <p>The table below summarizes the number of documents collected and the annotation applied to them:</p> <ul> <li>Total CEs - total complex events subject to data collection and annotation</li> <li>Total Docs Source - CE-relevant root documents collected and processed</li> <li>Total Docs for Provlink - root documents labeled for provenance linking</li> <li>Total Docs Mention - root documents labeled for events, relations, and schema linking</li> </ul> <table> <thead> <tr> <th>Language</th> <th>Total CEs</th> <th>Total Docs Source</th> <th>Total Docs for Provlink</th> <th>Total Docs Mention</th> </tr> </thead> <tbody> <tr> <td>English</td> <td>93</td> <td>2,190</td> <td>650</td> <td>216</td> </tr> <tr> <td>Spanish</td> <td>90</td> <td>1,241</td> <td>493</td> <td>122</td> </tr> <tr> <td>Total</td> <td>93</td> <td>3,431</td> <td>1,143</td> <td>338</td> </tr> </tbody> </table> <p>Software tools are also included in this release. The tools recreate original source data from the processed XML material.</p> <ul> <li>ltf2rsd.perl -- convert ltf.xml files to rsd.txt (raw-source-data)</li> <li>ltfzip2rsd.perl -- extract and convert ltf.xml files from zip archives</li> </ul> <h3>Sponsorship</h3> <p>KAIROS was sponsored by the Air Force Research Laboratory (AFRL) and the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-19-S-0014.</p> <h3>Updates</h3> <p>No updates at this time.</p>
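<p>As a rough illustration of what the ltf2rsd.perl conversion listed under Data involves, the sketch below rebuilds raw text from an LTF-style XML string in Python. It is not the tool shipped with the release. The element and attribute names (<code>SEG</code>, <code>ORIGINAL_TEXT</code>, <code>start_char</code>) are assumptions about the ltf.xml layout, the function name <code>ltf_to_rsd</code> and the sample document are hypothetical, and padding gaps with newlines is a guess at how untokenized spans are restored.</p>

```python
import xml.etree.ElementTree as ET


def ltf_to_rsd(ltf_xml: str) -> str:
    """Rebuild raw text from an LTF-style XML string using segment offsets.

    Assumes each SEG element carries a start_char attribute and an
    ORIGINAL_TEXT child; this layout is an assumption, not taken from
    the release documentation.
    """
    root = ET.fromstring(ltf_xml)
    pieces = []
    cursor = 0  # next character offset still to be filled in the raw text
    for seg in root.iter("SEG"):
        start = int(seg.get("start_char"))
        text = seg.findtext("ORIGINAL_TEXT", default="")
        # Pad the gap before this segment. Using newlines is an assumption;
        # the real tool may restore the gap differently.
        if start > cursor:
            pieces.append("\n" * (start - cursor))
        pieces.append(text)
        cursor = start + len(text)
    return "".join(pieces)


# Hypothetical two-segment document used only to exercise the function.
SAMPLE_LTF = """<LCTL_TEXT>
  <DOC id="doc-0">
    <TEXT>
      <SEG id="segment-0" start_char="0" end_char="11">
        <ORIGINAL_TEXT>Hello world.</ORIGINAL_TEXT>
      </SEG>
      <SEG id="segment-1" start_char="14" end_char="24">
        <ORIGINAL_TEXT>Second one.</ORIGINAL_TEXT>
      </SEG>
    </TEXT>
  </DOC>
</LCTL_TEXT>"""

if __name__ == "__main__":
    print(ltf_to_rsd(SAMPLE_LTF))
```

<p>Working from character offsets rather than joining tokens keeps the rebuilt text aligned with offset-based annotations, which is presumably why the release recreates source data this way. For real files, use the Perl tools included in the release.</p>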
Provider:
Linguistic Data Consortium
Created:
2025-06-09