MWE-Aware English Dependency Corpus

Name: MWE-Aware English Dependency Corpus
Creator: Linguistic Data Consortium
Published: 2021-07-01 16:30:21
License: No description available

DataCite Commons. Updated 2021-07-01. Indexed 2025-04-16.

Download link:

https://catalog.ldc.upenn.edu/LDC2017T01

Description:

<h3>Introduction</h3><br>
<p>MWE-Aware English Dependency Corpus was developed by the <a href="http://isw3.naist.jp/Contents/Research/mi-01-en.html">Nara Institute of Science and Technology Computational Linguistics Laboratory</a> and consists of English compound function words annotated in dependency format. The data is derived from OntoNotes Release 5.0 (<a href="../../../LDC2013T19">LDC2013T19</a>).</p><br>
<p>Compound function words are a type of multiword expression (MWE). MWEs are groups of tokens that can be treated as a single semantic or syntactic unit. Treating them this way facilitates natural language processing tasks such as constituency and dependency parsing.</p><br>
<p>Version 2.0 is available from LDC as MWE-Aware English Dependency Corpus 2.0 (<a href="../../../LDC2017T16">LDC2017T16</a>).</p><br>
<h3>Data</h3><br>
<p>MWE-Aware English Dependency Corpus was derived from the Wall Street Journal portion of OntoNotes Release 5.0. MWEs were identified in OntoNotes' phrase structure trees, and each MWE was established as a single subtree. Those phrase structure subtrees were then converted to a dependency structure (<a href="http://nlp.stanford.edu/software/stanford-dependencies.shtml">the Stanford dependencies</a>) in <a href="http://universaldependencies.org/format.html">CoNLL format</a>.</p><br>
<p>The data is split into 1,728 phrase structure trees stored as *.parse files and a single 14-column, tab-separated dependency file (*.conll). Both file types are encoded as UTF-8.</p><br>
<h3>Samples</h3><br>
<p>Please view this <a href="desc/addenda/LDC2017T01.txt">sample</a>.</p><br>
<h3>Updates</h3><br>
<p>None at this time.</p><br>
Portions © 1987-1989 Dow Jones & Company, Inc. © 2017 NAIST Computational Linguistics Laboratory, © 2007, 2008, 2009, 2011, 2013, 2017 Trustees of the University of Pennsylvania
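
As a rough illustration of how the 14-column, tab-separated *.conll file described above might be read, the following Python sketch groups token rows into sentences. It assumes the common CoNLL convention that sentences are separated by blank lines, which the description does not state. The function name `read_conll_sentences` and the file name in the usage comment are hypothetical. The meaning of each column is not given here, so fields are kept as plain strings indexed by position.

```python
from typing import Iterable, Iterator, List

# Column count taken from the corpus description; the column meanings are
# not documented there, so fields are left unnamed.
EXPECTED_COLUMNS = 14


def read_conll_sentences(
    lines: Iterable[str], n_cols: int = EXPECTED_COLUMNS
) -> Iterator[List[List[str]]]:
    """Yield sentences as lists of token rows (each row is a list of fields).

    Assumption: a blank line ends a sentence, as in most CoNLL-style files.
    """
    sentence: List[List[str]] = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            # Blank line: close the current sentence, if any.
            if sentence:
                yield sentence
                sentence = []
            continue
        fields = line.split("\t")
        if len(fields) != n_cols:
            raise ValueError(
                f"line {lineno}: expected {n_cols} columns, got {len(fields)}"
            )
        sentence.append(fields)
    # Emit the last sentence when the file does not end with a blank line.
    if sentence:
        yield sentence


# Usage sketch (hypothetical file name; the corpus files are UTF-8):
# with open("corpus.conll", encoding="utf-8") as fh:
#     for sent in read_conll_sentences(fh):
#         print(len(sent), "tokens")
```

Checking the column count on every row is a deliberate choice: it surfaces a wrong delimiter or a truncated download early instead of producing silently misaligned fields.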

Provider:

Linguistic Data Consortium

Created:

2020-11-30
