
Phrase Detectives Corpus

Source: DataCite Commons | Updated: 2021-07-01 | Indexed: 2025-04-16
Download link:
https://catalog.ldc.upenn.edu/LDC2017T08
Description:
<h3>Introduction</h3><br>
<p>The Phrase Detectives Corpus was developed by the <a href="https://www.essex.ac.uk/csee/">School of Computer Science and Electronic Engineering at the University of Essex</a> and consists of approximately 19,012 words across 40 documents, anaphorically annotated through the <a href="http://anawiki.essex.ac.uk/phrasedetectives/">Phrase Detectives game</a>, an online interactive "game-with-a-purpose" (GWAP) designed to collect data about English anaphoric coreference.</p><br>
<p>GWAPs are increasingly used to create language resources. In general, they rely on non-monetary incentives, such as entertainment, to motivate participation, and they can be successful for large-scale, persistent annotation efforts.</p><br>
<h3>Data</h3><br>
<p>The documents in the corpus are taken from <a href="https://www.wikipedia.org/">Wikipedia</a> articles and from narrative text in <a href="https://www.gutenberg.org/">Project Gutenberg</a>. Wikipedia articles and annotation files are presented as XML, and Project Gutenberg source files are presented as plain text. All text is encoded as UTF-8. The annotations comprise a gold standard version created by multiple experts and a set created by a large non-expert crowd (via the Phrase Detectives game).</p><br>
<p>The data was annotated according to a prevalent linguistically oriented approach to anaphora used in several tasks, including OntoNotes Release 5.0 (<a href="../../../LDC2013T19">LDC2013T19</a>), SemEval-2010 Task 1 OntoNotes English: Coreference Resolution in Multiple Languages (<a href="../../../LDC2011T01">LDC2011T01</a>) and The ARRAU Corpus of Anaphoric Information (<a href="../../../LDC2013T22">LDC2013T22</a>).</p><br>
<h3>Samples</h3><br>
<p>Please view the following <a href="desc/addenda/LDC2017T08.txt">source sample</a> and <a href="desc/addenda/LDC2017T08.xml">annotation sample</a>.</p><br>
<h3>Updates</h3><br>
<p>None at this time.</p><br>
Portions © 2017 University of Essex, © 2017 Trustees of the University of Pennsylvania
Provider:
Linguistic Data Consortium
Created:
2020-11-30