
Hypernyms extracted from a large text corpus using Hearst lexical-syntactic patterns

Source: Zenodo. Updated: 2020-07-29. Indexed: 2026-05-28.
Download link:
https://zenodo.org/record/3234817
Resource description:
The list of hyponym-hypernym pairs was obtained by applying the lexical-syntactic patterns described in Hearst (1992) to the corpus prepared by Panchenko et al. (2016). This corpus is a concatenation of the English Wikipedia (2016 dump), Gigaword, ukWaC, and the English news corpora from the Leipzig Corpora Collection. The patterns, proposed by Marti Hearst (1992) and further extended and implemented as finite-state transducers (FSTs) by Panchenko et al. (2012) for extracting (noisy) hyponym-hypernym pairs, are as follows: (i) such NP as NP, NP[,] and/or NP; (ii) NP such as NP, NP[,] and/or NP; (iii) NP, NP[,] or other NP; (iv) NP, NP[,] and other NP; (v) NP, including NP, NP[,] and/or NP; (vi) NP, especially NP, NP[,] and/or NP. Applying the patterns to the corpus yields a list of 27.6 million hyponym-hypernym pairs, each with its frequency of occurrence in the corpus.
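To make the six patterns concrete, here is a minimal Python sketch of how such pairs and their frequencies could be extracted. It is not the FST implementation used to build the dataset: it uses plain regular expressions, and it assumes that every noun phrase (NP) is a single word, which the real extractor does not. The names `extract_pairs` and `count_pairs` are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# ASSUMPTION: an NP is a single word; "and", "or", "other" are never NPs.
W = r"(?!(?:and|or|other)\b)[A-Za-z][A-Za-z-]*"
# A hyponym list: NP, NP[,] and/or NP
LIST = rf"{W}(?:, {W})*(?:,? (?:and|or) {W})?"

PATTERNS = [
    # (i) such NP as NP, NP[,] and/or NP
    re.compile(rf"\bsuch (?P<hyper>{W}) as (?P<hypos>{LIST})", re.IGNORECASE),
    # (ii), (v), (vi) NP such as / including / especially NP, NP[,] and/or NP
    re.compile(
        rf"\b(?P<hyper>{W}),? (?:such as|including|especially) (?P<hypos>{LIST})",
        re.IGNORECASE,
    ),
    # (iii), (iv) NP, NP[,] and/or other NP
    re.compile(
        rf"\b(?P<hypos>{W}(?:, {W})*),? (?:and|or) other (?P<hyper>{W})",
        re.IGNORECASE,
    ),
]

# Separators inside a hyponym list: ", ", ", and ", " or ", ...
SPLIT = re.compile(r",\s*(?:and |or )?|\s+(?:and|or)\s+")


def extract_pairs(sentence):
    """Return (hyponym, hypernym) pairs matched in one sentence."""
    pairs = []
    for pattern in PATTERNS:
        for match in pattern.finditer(sentence):
            hypernym = match.group("hyper").lower()
            for hyponym in SPLIT.split(match.group("hypos")):
                pairs.append((hyponym.lower(), hypernym))
    return pairs


def count_pairs(sentences):
    """Count how often each pair is extracted across all sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(extract_pairs(sentence))
    return counts


if __name__ == "__main__":
    demo = [
        "animals such as cats, dogs and horses",
        "cats, dogs, and other animals",
        "such fruits as apples or pears",
    ]
    for pair, freq in count_pairs(demo).items():
        print(pair, freq)
```

Because the matching is purely surface-level, a sketch like this over-generates in the same way the description warns about: the extracted pairs are noisy, and the per-pair frequency is one simple signal for filtering them.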
Provider:
Zenodo
Created:
2019-05-30