MWE-Aware English Dependency Corpus
Source: DataCite Commons. Updated: 2021-07-01. Indexed: 2025-04-16.
Download link:
https://catalog.ldc.upenn.edu/LDC2017T01
Description:
<h3>Introduction</h3><br>
<p>MWE-Aware English Dependency Corpus was developed by the <a href="http://isw3.naist.jp/Contents/Research/mi-01-en.html">Nara Institute of Science and Technology Computational Linguistics Laboratory</a> and consists of English compound function words annotated in dependency format. The data is derived from OntoNotes Release 5.0 (<a href="../../../LDC2013T19">LDC2013T19</a>).</p><br>
<p>Compound function words are a type of multiword expression (MWE). MWEs are groups of tokens that can be treated as a single semantic or syntactic unit. Doing so facilitates natural language processing tasks such as constituency and dependency parsing.</p><br>
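<p>As a minimal illustration of treating an MWE as a single unit, the Python sketch below collapses a token span into one token. The function name <code>merge_mwes</code>, the underscore joiner, and the example sentence are assumptions made for illustration only; this is not the encoding used by the corpus, which represents each MWE as a single subtree.</p><br>

```python
from typing import List, Tuple


def merge_mwes(tokens: List[str], spans: List[Tuple[int, int]], joiner: str = "_") -> List[str]:
    """Collapse each half-open [start, end) span into one token.

    Hypothetical helper for illustration; spans must be non-empty,
    in range, and non-overlapping.
    """
    merged: List[str] = []
    pos = 0
    for start, end in sorted(spans):
        if start < pos or end <= start or end > len(tokens):
            raise ValueError(f"invalid or overlapping span: {(start, end)}")
        merged.extend(tokens[pos:start])                # tokens before the MWE
        merged.append(joiner.join(tokens[start:end]))   # the MWE as one unit
        pos = end
    merged.extend(tokens[pos:])                         # tokens after the last MWE
    return merged


# Invented example sentence; "in order to" occupies token positions 3 to 5.
tokens = "He left early in order to catch the train".split()
print(merge_mwes(tokens, [(3, 6)]))
```

<p>With the span merged, a downstream parser sees one function word instead of three separate tokens.</p><br>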
<p>Version 2.0 is available from LDC as MWE-Aware English Dependency Corpus 2.0 (<a href="../../../LDC2017T16">LDC2017T16</a>).</p><br>
<h3>Data</h3><br>
<p>MWE-Aware English Dependency Corpus was derived from the Wall Street Journal portion of OntoNotes Release 5.0. MWEs were identified in OntoNotes' phrase structure trees and each MWE was established as a single subtree. Those phrase structure subtrees were then converted to a dependency structure (<a href="http://nlp.stanford.edu/software/stanford-dependencies.shtml">the Stanford dependencies</a>) in <a href="http://universaldependencies.org/format.html">CoNLL format</a>.</p><br>
<p>The data consists of 1,728 phrase structure trees in *.parse files and a single 14-column, tab-separated dependency file (*.conll). Both file types are encoded as UTF-8.</p><br>
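<p>A tab-separated dependency file of this kind can be read with a few lines of Python. The sketch below assumes the usual CoNLL convention that sentences are separated by blank lines, which the description above does not state. The function name <code>read_conll</code> and the file name in the usage comment are hypothetical, and the meaning of each of the 14 columns is not given here, so rows are kept as plain lists of strings.</p><br>

```python
from typing import Iterable, Iterator, List


def read_conll(lines: Iterable[str], n_columns: int = 14) -> Iterator[List[List[str]]]:
    """Yield sentences as lists of token rows (each row a list of column strings).

    Assumes blank lines separate sentences (common CoNLL convention,
    not confirmed by the corpus description).
    """
    sentence: List[List[str]] = []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current sentence, if any.
            if sentence:
                yield sentence
                sentence = []
            continue
        row = line.split("\t")
        if len(row) != n_columns:
            raise ValueError(f"line {line_no}: expected {n_columns} columns, got {len(row)}")
        sentence.append(row)
    if sentence:
        # The file may end without a trailing blank line.
        yield sentence


# Usage sketch; "corpus.conll" is a placeholder path, not a file name from the release:
# with open("corpus.conll", encoding="utf-8") as handle:
#     for sentence in read_conll(handle):
#         print(len(sentence), "tokens")
```

<p>Raising an error on an unexpected column count is a deliberate choice: it surfaces a wrong delimiter or a wrong file early instead of silently producing ragged rows.</p><br>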
<h3>Samples</h3><br>
<p>Please view this <a href="desc/addenda/LDC2017T01.txt">sample</a>.</p><br>
<h3>Updates</h3><br>
<p>None at this time.</p><br>
Portions © 1987-1989 Dow Jones & Company, Inc. © 2017 NAIST Computational Linguistics Laboratory, © 2007, 2008, 2009, 2011, 2013, 2017 Trustees of the University of Pennsylvania
Provider:
Linguistic Data Consortium
Created:
2020-11-30



