
ModeS TimeBank 1.0

Source: DataCite Commons. Updated 2021-07-01. Indexed 2025-04-16.
Download link:
https://catalog.ldc.upenn.edu/LDC2012T01
Description:
<h3>Introduction</h3><p>ModeS TimeBank 1.0 was developed by researchers at <a href="http://www.upm.es/internacional" rel="nofollow">Technical University of Madrid</a> and <a href="http://www.barcelonamedia.org/en" rel="nofollow">Barcelona Media</a> and is a corpus of Modern Spanish (17th and 18th centuries) annotated with temporal and event information according to TimeML mark-up and annotated with spatial information following the SpatialML scheme.</p><p> TimeML (Pustejovsky et al., 2005) is a specification language for annotating eventualities and time expressions in natural language as well as the temporal relations among them, thus facilitating the task of extraction, representation and exchange of temporal information. SpatialML (Mani et al., 2008) is a specification language for annotating and normalizing spatial expressions by means of geographic coordinates.</p><p>LDC has released the following corpora incorporating TimeML or SpatialML annotation: <a href="http://catalog.ldc.upenn.edu/LDC2006T08" rel="nofollow">TimeBank 1.2 LDC2006T08</a>, <a href="http://catalog.ldc.upenn.edu/LDC2009T23" rel="nofollow">FactBank 1.0 LDC2009T23</a>, <a href="http://catalog.ldc.upenn.edu/LDC2011T02" rel="nofollow">ACE 2005 English SpatialML Annotations Version 2 LDC2011T02</a> and <a href="http://catalog.ldc.upenn.edu/LDC2010T09" rel="nofollow">ACE 2005 Mandarin SpatialML Annotations LDC2010T09</a>. </p><h3>Data</h3><p>ModeS TimeBank 1.0 contains 102 documents reporting a sea-crossing cruise by a ship called <i>La Princesa,</i> which took place from December 1768 to April 1769. There exist copious logbooks from that period that not only provide information about shipping routes, but also contain valuable data concerning information flows, commercial agents and social networks. 
The original corpus manuscript is preserved in the <em>Archivo General de Indias</em> (General Archive of the Indies) and is available online at the <a href="http://pares.mcu.es/" rel="nofollow">Portal de Archivos Españoles</a>. This corpus was created within the framework of the <a href="http://www.dyncoopnet-pt.org/" rel="nofollow">DynCoopNet project</a> (Dynamic Compatibility of Cooperation-Based Self-Organizing Networks in the First Global Age), which focuses on the study of trade network cooperation during the 15th-19th centuries and incorporates maps, charts, databases and natural language documents into its work. </p><p>All text is encoded in UTF-8. The data in ModeS TimeBank 1.0 has been tokenized, POS-tagged, and annotated with space, time and event information according to the TimeML and SpatialML specification schemes. More specifically, the entities annotated in the corpus are the following:</p><ul> <li>Events (tag EVENT, from TimeML). These include finite and non-finite verbal constructions, nominalizations, nouns, adjectives and prepositional phrases.</li> <li>Temporal expressions (tag TIMEX3, from TimeML). These include expressions of dates, times, durations and frequencies, both precise and vague.</li> <li>Spatial expressions (tag PLACE, from SpatialML). These are used for proper and common nouns, adjectives, adverbs or spatial coordinates.</li> </ul><h3>Samples</h3><p>Please see the following links for examples of <a href="./desc/addenda/LDC2012T01.annot.txt" rel="nofollow">annotated</a> and <a href="./desc/addenda/LDC2012T01.orig.txt" rel="nofollow">original</a> texts. </p><h3>Updates</h3><p> None at this time.</p><br> Portions © 2012 Marta Guerrero Nieto, Roser Sauri, © 2012 Trustees of the University of Pennsylvania
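The three entity types listed above (EVENT, TIMEX3, PLACE) could be pulled out of an inline-annotated text with a generic XML parser. The following is a minimal sketch only: the sample sentence is invented for illustration and is not taken from the corpus, and the wrapper element `TEXT`, the attribute values, and the helper name `collect_entities` are assumptions. The actual file layout of the release may differ, so check the annotated sample linked above before relying on it.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical inline-annotated sentence, invented for illustration.
# It is NOT corpus data; the real files may use a different layout.
SAMPLE = (
    '<TEXT>El <TIMEX3 tid="t1" type="DATE">dia 3 de enero</TIMEX3> '
    '<EVENT eid="e1">salimos</EVENT> de '
    '<PLACE id="p1">Cadiz</PLACE> y <EVENT eid="e2">navegamos</EVENT> '
    'al sur.</TEXT>'
)

# Entity tags named in the corpus description.
TAGS = ("EVENT", "TIMEX3", "PLACE")


def collect_entities(xml_text):
    """Return (tag, surface text) pairs in document order."""
    root = ET.fromstring(xml_text)
    return [
        (el.tag, "".join(el.itertext()))
        for el in root.iter()
        if el.tag in TAGS
    ]


entities = collect_entities(SAMPLE)
counts = Counter(tag for tag, _ in entities)

for tag, text in entities:
    print(f"{tag:7s} {text}")
print(dict(counts))
```

Counting tags per document in this way gives a quick check that a file was read correctly before any temporal or spatial relations are processed.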
Provider:
Linguistic Data Consortium
Created:
2020-11-30