
HESML V1R4 Java software library of ontology-based semantic similarity measures and information content models

Source: doi.org (indexed 2025-03-21)
Download link:
http://doi.org/10.17632/t87s78dg78.4
Description:
HESML V1R4 is the fourth release of the Half-Edge Semantic Measures Library (HESML) detailed in [1], a linearly scalable and efficient Java software library of ontology-based semantic similarity measures and Information Content (IC) models based on WordNet. HESML V1R4 implements most of the WordNet-based semantic similarity measures and IC models reported in the literature, and supports the evaluation of three pre-trained word embedding models. It also provides an XML-based input file format for specifying reproducible experiments on WordNet-based similarity without writing any code.

HESML V1R4 introduces the following novelties: (1) a software implementation for the evaluation of three pre-trained word embedding file formats, which support most of the state-of-the-art models reported in the literature; (2) a software implementation of an intrinsic IC model and two new IC-based semantic similarity measures introduced by Cai et al. (2017); (3) a software implementation of a fast approximation of the Wu & Palmer (1994) measure commonly used in the literature; (4) the integration of a very large set of word similarity benchmarks; and (5) the correction of an error in the software implementation of the Leacock & Chodorow (1998) measure in previous HESML versions.

The HESML library is freely distributed for any non-commercial purpose under a CC BY-NC-SA 4.0 license, subject to citing the main HESML paper [1] as the attribution requirement. On the other hand, the commercial use of the similarity measures introduced in [2], as well as part of the intrinsic IC models introduced in [3] and [4], is protected by a patent application [5]. In addition, any user of HESML must fulfill the other licensing terms described in [1], which relate to other resources distributed with the library.

References:
[1] Lastra-Díaz, J. J., García-Serrano, A., Batet, M., Fernández, M., & Chirigati, F. (2017). HESML: a scalable ontology-based semantic similarity measures library with a set of reproducible experiments and a replication dataset. Information Systems, 66, 97–118.
[2] Lastra-Díaz, J. J., & García-Serrano, A. (2015). A novel family of IC-based similarity measures with a detailed experimental survey on WordNet. Engineering Applications of Artificial Intelligence Journal, 46, 140–153.
[3] Lastra-Díaz, J. J., & García-Serrano, A. (2015). A new family of information content models with an experimental survey on WordNet. Knowledge-Based Systems, 89, 509–526.
[4] Lastra-Díaz, J. J., & García-Serrano, A. (2016). A refinement of the well-founded Information Content models with a very detailed experimental survey on WordNet. Universidad Nacional de Educación a Distancia (UNED).
[5] Lastra Díaz, J. J., & García Serrano, A. (2016). System and method for the indexing and retrieval of semantically annotated data using an ontology-based information retrieval model. USPTO App, US2016/0179945 A1.
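
For readers unfamiliar with the Wu & Palmer (1994) measure named in the description, the following is a minimal sketch of the standard formula, sim(a, b) = 2 · depth(lcs(a, b)) / (depth(a) + depth(b)), on a tiny hand-made tree. It does not use the HESML API and is not the fast approximation that HESML implements. The class name `WuPalmerSketch`, the toy taxonomy, and the convention that the root has depth 1 are assumptions made for illustration; WordNet itself allows multiple parents per concept, which this single-parent sketch ignores.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class WuPalmerSketch {
    // Hypothetical toy taxonomy (child -> parent); "entity" is the root and has no entry.
    static final Map<String, String> PARENT = Map.of(
        "animal", "entity",
        "object", "entity",
        "dog", "animal",
        "cat", "animal",
        "car", "object");

    // Depth counted in nodes from the root, so the root has depth 1 (an assumed convention).
    public static int depth(String node) {
        int d = 1;
        for (String p = PARENT.get(node); p != null; p = PARENT.get(p)) {
            d++;
        }
        return d;
    }

    // Lowest common subsumer: the first ancestor of b (walking upwards, b included)
    // that is also an ancestor of a (a included).
    public static String lowestCommonSubsumer(String a, String b) {
        Set<String> ancestorsOfA = new HashSet<>();
        for (String n = a; n != null; n = PARENT.get(n)) {
            ancestorsOfA.add(n);
        }
        for (String n = b; n != null; n = PARENT.get(n)) {
            if (ancestorsOfA.contains(n)) {
                return n;
            }
        }
        throw new IllegalArgumentException("No common subsumer for " + a + " and " + b);
    }

    // Wu & Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b)).
    public static double similarity(String a, String b) {
        String lcs = lowestCommonSubsumer(a, b);
        return 2.0 * depth(lcs) / (depth(a) + depth(b));
    }

    public static void main(String[] args) {
        System.out.println("dog/cat: " + similarity("dog", "cat"));
        System.out.println("dog/car: " + similarity("dog", "car"));
    }
}
```

In this toy tree, "dog" and "cat" share the subsumer "animal" (depth 2) and score about 0.667, while "dog" and "car" only share the root and score about 0.333.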

Provider: doi.org