
COMNOM v 1.0

Source: DataCite Commons | Updated: 2021-07-01 | Indexed: 2025-04-16
Download link:
https://catalog.ldc.upenn.edu/LDC2008T24
Resource description:
<h3>Introduction</h3>
<p>COMNOM is an automatically enriched version of <a href="http://nlp.cs.nyu.edu/comlex/" rel="nofollow">COMLEX Syntax</a> that was created at New York University as part of the <a href="http://nlp.cs.nyu.edu/meyers/NomBank.html" rel="nofollow">NomBank</a> annotation project. COMLEX resources are distributed by the Linguistic Data Consortium (LDC) and consist of the following: <a href="http://catalog.ldc.upenn.edu/LDC98L21" rel="nofollow">COMLEX English Syntax Lexicon (LDC98L21)</a>, an English dictionary of approximately 38,000 lemmas with detailed information about the syntactic characteristics and subcategorization (complement structures) of each lexical item; and <a href="http://catalog.ldc.upenn.edu/LDC96T11" rel="nofollow">COMLEX Syntax Text Corpus Version 2.0 (LDC96T11)</a>.</p>
<p>COMNOM adds classes to COMLEX Syntax lexical entries using NOMLEX-PLUS, a dictionary with approximately 8,000 entries. COMNOM collected prepositions from NOMLEX-PLUS subcategorizations (:VERB-SUBC, :OBJECT, :SUBJECT, etc.), deduced essential complements from them, and added those complements to the existing COMLEX entry.</p>
<p>Further information about the methodology used in COMNOM can be found in <a href="http://nlp.cs.nyu.edu/meyers/nombank/those-other-nombank-dictionaries.pdf" rel="nofollow">Meyers, "Those Other NomBank Dictionaries -- Manual for Dictionaries that Come with NomBank"</a>. Related resources and further information about COMNOM and NomBank are available from the <a href="http://nlp.cs.nyu.edu/meyers/NomBank.html" rel="nofollow">NomBank</a> project website.</p>
<p>A license to COMLEX English Syntax Lexicon (LDC98L21) or COMLEX Syntax Text Corpus Version 2.0 (LDC96T11) is required in order to obtain COMNOM v. 1.0.</p>
<h3>Data</h3>
<p>This release includes three versions of COMNOM, which correspond to the three versions of NOMLEX-PLUS and differ in the amount of corpus training that influenced their creation. The data used for training are the Wall Street Journal materials in the Penn Treebanks (<a href="http://catalog.ldc.upenn.edu/LDC95T7" rel="nofollow">Treebank-2</a> and <a href="http://catalog.ldc.upenn.edu/LDC99T42" rel="nofollow">Treebank-3</a>), with annotations from <a href="../../../LDC2004T14">Proposition Bank I</a> and <a href="http://catalog.ldc.upenn.edu/LDC2008T23" rel="nofollow">NomBank 1.0</a>.</p>
<p>The three versions are:</p>
<ul>
<li>COMNOM-clean.1.0 -- contains no information derived from annotated data</li>
<li>COMNOM.1.0 -- contains information from the entire annotated corpus</li>
<li>COMNOM-training.1.0 -- contains information from annotated data in sections 02-21 of the corpus only.</li>
</ul>
<p>Portions © 1987-1989 Dow Jones &amp; Company, Inc., © 1996, 1998, 2008 Trustees of the University of Pennsylvania</p>
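The three versions differ only in which annotated sections influenced them, so the choice matters when a dictionary is used alongside a held-out evaluation on the same Wall Street Journal material. The sketch below is illustrative only: the helper name `safe_versions` and the table `VERSION_SECTIONS` are hypothetical and not part of the release, and it assumes the WSJ portion of the Penn Treebank is numbered in sections 00-24.

```python
# Hypothetical helper; nothing here ships with COMNOM.
# Assumption: the annotated WSJ corpus is divided into sections 00-24.
ALL_SECTIONS = frozenset(range(25))

# Annotated sections that influenced each version, per the description above.
VERSION_SECTIONS = {
    "COMNOM-clean.1.0": frozenset(),                 # no annotated data
    "COMNOM-training.1.0": frozenset(range(2, 22)),  # sections 02-21 only
    "COMNOM.1.0": ALL_SECTIONS,                      # entire annotated corpus
}


def safe_versions(eval_sections):
    """Return versions that saw none of eval_sections, most corpus-informed first."""
    eval_set = frozenset(eval_sections)
    ok = [name for name, seen in VERSION_SECTIONS.items() if not (seen & eval_set)]
    return sorted(ok, key=lambda name: len(VERSION_SECTIONS[name]), reverse=True)


if __name__ == "__main__":
    # Evaluating on section 23 rules out the full-corpus version.
    print(safe_versions([23]))
```

Under these assumptions, an evaluation on section 23 would leave COMNOM-training.1.0 and COMNOM-clean.1.0 as the candidates, while an evaluation that touches any of sections 02-21 would leave only COMNOM-clean.1.0.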
Provider:
Linguistic Data Consortium
Created:
2020-11-30