HESML V1R4 Java software library of ontology-based semantic similarity measures and information content models

Name: HESML V1R4 Java software library of ontology-based semantic similarity measures and information content models
Creator: doi.org
License: no description provided

Indexed from doi.org on 2025-03-21

Download link:

http://doi.org/10.17632/t87s78dg78.4

Description:

HESML V1R4 is the fourth release of the Half-Edge Semantic Measures Library (HESML) detailed in [1], a linearly scalable and efficient Java software library of ontology-based semantic similarity measures and Information Content (IC) models based on WordNet. HESML V1R4 implements most ontology-based semantic similarity measures and IC models based on WordNet reported in the literature, and supports the evaluation of three pre-trained word embedding models. It also provides an XML-based input file format for specifying reproducible experiments on WordNet-based similarity without any software coding.

HESML V1R4 introduces the following novelties: (1) a software implementation for the evaluation of three pre-trained word embedding file formats, which support most state-of-the-art models reported in the literature; (2) a software implementation of an intrinsic IC model and two new IC-based semantic similarity measures introduced by Cai et al. (2017); (3) a software implementation of a fast approximation of the Wu & Palmer (1994) measure commonly used in the literature; (4) the integration of a very large set of word similarity benchmarks; and (5) the correction of an error in the software implementation of the Leacock & Chodorow (1998) measure in previous HESML versions.

The HESML library is freely distributed for any non-commercial purpose under a CC BY-NC-SA 4.0 license, subject to citing the main HESML paper [1] as an attribution requirement. On the other hand, the commercial use of the similarity measures introduced in [2], as well as part of the intrinsic IC models introduced in [3] and [4], is protected by a patent application [5]. In addition, any user of HESML must fulfill the other licensing terms described in [1] concerning other resources distributed with the library.

References:

[1] Lastra-Díaz, J. J., García-Serrano, A., Batet, M., Fernández, M., & Chirigati, F. (2017). HESML: a scalable ontology-based semantic similarity measures library with a set of reproducible experiments and a replication dataset. Information Systems, 66, 97–118.

[2] Lastra-Díaz, J. J., & García-Serrano, A. (2015). A novel family of IC-based similarity measures with a detailed experimental survey on WordNet. Engineering Applications of Artificial Intelligence Journal, 46, 140–153.

[3] Lastra-Díaz, J. J., & García-Serrano, A. (2015). A new family of information content models with an experimental survey on WordNet. Knowledge-Based Systems, 89, 509–526.

[4] Lastra-Díaz, J. J., & García-Serrano, A. (2016). A refinement of the well-founded Information Content models with a very detailed experimental survey on WordNet. Universidad Nacional de Educación a Distancia (UNED).

[5] Lastra Díaz, J. J., & García Serrano, A. (2016). System and method for the indexing and retrieval of semantically annotated data using an ontology-based information retrieval model. USPTO App, US2016/0179945 A1.
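
For readers unfamiliar with the Wu & Palmer (1994) measure mentioned in the description, the following is a minimal Java sketch of the textbook formula, sim(a, b) = 2 * depth(lcs(a, b)) / (depth(a) + depth(b)), evaluated on a toy is-a tree. It is not HESML code and does not use the HESML API: the class `WuPalmerSketch`, its methods, and the toy taxonomy are all hypothetical names introduced for illustration. It also assumes a single-parent tree with depth counted in nodes (root depth 1), whereas WordNet allows multiple parents, and it shows the exact measure, not the fast approximation that HESML V1R4 implements.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch only: Wu & Palmer similarity on a toy single-parent taxonomy. */
public class WuPalmerSketch {
    // child -> parent links of the toy is-a tree; the root has no entry
    private final Map<String, String> parent = new HashMap<>();

    public void addEdge(String child, String parentNode) {
        parent.put(child, parentNode);
    }

    /** Depth counted in nodes, so the root has depth 1 (an assumed convention). */
    public int depth(String node) {
        int d = 1;
        for (String p = parent.get(node); p != null; p = parent.get(p)) {
            d++;
        }
        return d;
    }

    /** Lowest common subsumer: the deepest node that is an ancestor of both (or the node itself). */
    public String lcs(String a, String b) {
        Set<String> ancestorsOfA = new HashSet<>();
        for (String n = a; n != null; n = parent.get(n)) {
            ancestorsOfA.add(n);
        }
        // Walking up from b, the first node shared with a's ancestor chain is the deepest one
        for (String n = b; n != null; n = parent.get(n)) {
            if (ancestorsOfA.contains(n)) {
                return n;
            }
        }
        return null; // the two nodes live in disconnected trees
    }

    /** sim(a, b) = 2 * depth(lcs) / (depth(a) + depth(b)); 0 if there is no common subsumer. */
    public double similarity(String a, String b) {
        String c = lcs(a, b);
        if (c == null) {
            return 0.0;
        }
        return 2.0 * depth(c) / (depth(a) + depth(b));
    }

    /** Hypothetical toy taxonomy: entity > animal > {mammal > {cat, dog}, bird}. */
    public static WuPalmerSketch toyTaxonomy() {
        WuPalmerSketch t = new WuPalmerSketch();
        t.addEdge("animal", "entity");
        t.addEdge("mammal", "animal");
        t.addEdge("bird", "animal");
        t.addEdge("cat", "mammal");
        t.addEdge("dog", "mammal");
        return t;
    }

    public static void main(String[] args) {
        WuPalmerSketch t = toyTaxonomy();
        System.out.println("sim(cat, dog)  = " + t.similarity("cat", "dog"));
        System.out.println("sim(cat, bird) = " + t.similarity("cat", "bird"));
    }
}
```

In this toy tree, "cat" and "dog" share the subsumer "mammal" (depth 3) and each has depth 4, so their similarity is 2 * 3 / (4 + 4) = 0.75, while "cat" and "bird" only share "animal" and score lower.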

Provider:

doi.org
