SenSem Databank

DataCite Commons · Updated 2021-07-01 · Indexed 2025-04-16
Download link:
https://catalog.ldc.upenn.edu/LDC2015T02
Description:
<h3>Introduction</h3><br> <p>SenSem (Sentence Semantics) Databank was developed by <a href="http://grial.uab.es/index.php"> GRIAL</a>, the Linguistic Applications Inter-University Research Group, which includes the following Spanish institutions: the <a href="http://www.uab.es/web/universitat-autonoma-de-barcelona-1345467954774.html">Universitat Autònoma de Barcelona</a>, the <a href="http://www.ub.edu/web/ub/en/index.html">Universitat de Barcelona</a>, the <a href="http://www.udl.es/en.html">Universitat de Lleida</a> and the <a href="http://www.uoc.edu/portal/en/index.html">Universitat Oberta de Catalunya</a>. It contains syntactic and semantic annotation for over 35,000 Spanish sentences (approximately one million words), plus approximately 700,000 words of Catalan translated from the Spanish. GRIAL's work focuses on resources for applied linguistics, including lexicography, translation and natural language processing.</p><br> <p>Each sentence in SenSem Databank was labeled with the verb sense it exemplifies, the type of each complement (argument or adjunct), and each complement's syntactic category and function. Each argument was also labeled with a semantic role. Further information about the SenSem project is available on the GRIAL website at <a href="http://grial.uab.es/sensem/corpus">http://grial.uab.es/sensem/corpus</a>.</p><br> <h3>Data</h3><br> <p>The Spanish source data consists of newspaper text (30,000 sentences) and novels (5,299 sentences). These sentences illustrate around 1,000 different verb senses of the 250 most frequent Spanish verbs. Verb frequencies were computed from a corpus of around 13 million words.</p><br> <p>The Catalan corpus was developed by translating the newspaper portion of the Spanish data set, resulting in a resource of over 700,000 words, of which 391,267 words were annotated. Sentences were machine-translated and then manually post-edited; some were re-annotated for their complements. The semantic annotation is the same for both languages. The Catalan sentences represent close to 1,300 different verbs.</p><br> <p>The data is provided as a single XML file per language.</p><br> <h3>Samples</h3><br> <p>Please view this <a href="desc/addenda/LDC2015T02.txt">sample</a>.</p><br> <h3>Updates</h3><br> <p>None at this time.</p><br> Portions © 2015 Dr. Ana Fernandez Montraveta, Dr. Gloria Vázquez-Garcia, Trustees of the University of Pennsylvania
Provider:
Linguistic Data Consortium
Created:
2020-11-30