HESML V1R4 Java software library of ontology-based semantic similarity measures and information content models
doi.org, indexed 2025-03-21
Download link:
http://doi.org/10.17632/t87s78dg78.4
Resource description:
HESML V1R4 is the fourth release of the Half-Edge Semantic Measures Library (HESML) detailed in [1]. HESML is a new, linearly scalable and efficient Java software library of ontology-based semantic similarity measures and Information Content (IC) models based on WordNet.
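The following minimal Java sketch shows what an "IC model" and an "IC-based similarity measure" compute. It assumes a hypothetical toy taxonomy with single inheritance, whereas WordNet allows multiple parents. It uses the classic intrinsic IC model of Seco et al. (2004), IC(c) = 1 - log(hypo(c) + 1) / log(N), where hypo(c) is the number of descendants of c and N is the number of concepts. It pairs this with the Resnik (1995) measure, which scores two concepts by the IC of their lowest common subsumer (LCS). The class and method names are illustrative only and are not the HESML API. The sketch does not reproduce the Cai et al. (2017) model or the measures introduced in [2].

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy sketch (not the HESML API): Seco et al. intrinsic IC + Resnik similarity. */
public class IcSimilaritySketch {
    // Assumed toy taxonomy: child -> parent, rooted at "entity" (single inheritance).
    static final Map<String, String> PARENT = new HashMap<>();
    static {
        PARENT.put("animal", "entity");
        PARENT.put("artifact", "entity");
        PARENT.put("dog", "animal");
        PARENT.put("cat", "animal");
        PARENT.put("car", "artifact");
    }

    /** All concepts of the taxonomy, including the root. */
    static List<String> concepts() {
        List<String> all = new ArrayList<>(PARENT.keySet());
        all.add("entity");
        return all;
    }

    /** The concept itself followed by its ancestors up to the root. */
    static List<String> ancestors(String c) {
        List<String> path = new ArrayList<>();
        for (String x = c; x != null; x = PARENT.get(x)) {
            path.add(x);
        }
        return path;
    }

    /** Number of descendants (hyponyms) of c, excluding c itself. */
    static int hyponymCount(String c) {
        int count = 0;
        for (String x : concepts()) {
            if (!x.equals(c) && ancestors(x).contains(c)) {
                count++;
            }
        }
        return count;
    }

    /** Seco et al. (2004) intrinsic IC: 1 - log(hypo(c) + 1) / log(N). */
    static double intrinsicIc(String c) {
        int n = concepts().size();
        return 1.0 - Math.log(hyponymCount(c) + 1) / Math.log(n);
    }

    /** Deepest common ancestor; valid only because the toy taxonomy is a tree. */
    static String lcs(String a, String b) {
        List<String> ancB = ancestors(b);
        for (String x : ancestors(a)) {
            if (ancB.contains(x)) {
                return x;
            }
        }
        throw new IllegalArgumentException("concepts do not share a root");
    }

    /** Resnik (1995) similarity: IC of the lowest common subsumer. */
    static double resnik(String a, String b) {
        return intrinsicIc(lcs(a, b));
    }

    public static void main(String[] args) {
        System.out.printf("IC(animal)        = %.4f%n", intrinsicIc("animal"));
        System.out.printf("Resnik(dog, cat)  = %.4f%n", resnik("dog", "cat"));
        System.out.printf("Resnik(dog, car)  = %.4f%n", resnik("dog", "car"));
    }
}
```

Running `main` prints 0.3869 for both IC(animal) and Resnik(dog, cat), since the two concepts meet at "animal". It prints 0.0000 for Resnik(dog, car), since those concepts only meet at the root, whose IC is zero by construction.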
HESML V1R4 implements most of the ontology-based semantic similarity measures and WordNet-based IC models reported in the literature. It also supports the evaluation of pre-trained word embedding models stored in three file formats. In addition, it provides an XML-based input file format for specifying reproducible experiments on WordNet-based similarity, so that these experiments can be run without writing any code.
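Evaluating a pre-trained embedding on a word similarity benchmark usually works in two steps. First, each word pair is scored with the cosine similarity of its vectors. Then those scores are correlated with the human ratings, for example with Spearman's rank correlation. The minimal Java sketch below shows that pipeline under several assumptions. The three-dimensional vectors and ratings are made up for illustration and do not come from any benchmark. Loading of the actual embedding file formats is omitted. The class name is hypothetical and is not part of HESML.

```java
import java.util.Arrays;

/** Hypothetical sketch (not the HESML API): cosine scoring + Spearman correlation. */
public class EmbeddingEvalSketch {

    /** Cosine similarity of two equal-length vectors. */
    static double cosine(double[] u, double[] v) {
        double dot = 0, nu = 0, nv = 0;
        for (int i = 0; i < u.length; i++) {
            dot += u[i] * v[i];
            nu += u[i] * u[i];
            nv += v[i] * v[i];
        }
        return dot / (Math.sqrt(nu) * Math.sqrt(nv));
    }

    /** 1-based ranks; tied values receive the average of their positions. */
    static double[] ranks(double[] x) {
        Integer[] idx = new Integer[x.length];
        for (int i = 0; i < x.length; i++) idx[i] = i;
        Arrays.sort(idx, (a, b) -> Double.compare(x[a], x[b]));
        double[] r = new double[x.length];
        int i = 0;
        while (i < idx.length) {
            int j = i;
            while (j + 1 < idx.length && x[idx[j + 1]] == x[idx[i]]) j++;
            double avg = (i + j) / 2.0 + 1.0; // mean of 1-based positions i..j
            for (int k = i; k <= j; k++) r[idx[k]] = avg;
            i = j + 1;
        }
        return r;
    }

    /** Pearson correlation; NaN if either input is constant. */
    static double pearson(double[] x, double[] y) {
        double mx = Arrays.stream(x).average().orElse(0);
        double my = Arrays.stream(y).average().orElse(0);
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < x.length; i++) {
            sxy += (x[i] - mx) * (y[i] - my);
            sxx += (x[i] - mx) * (x[i] - mx);
            syy += (y[i] - my) * (y[i] - my);
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** Spearman correlation = Pearson correlation of the ranks. */
    static double spearman(double[] x, double[] y) {
        return pearson(ranks(x), ranks(y));
    }

    public static void main(String[] args) {
        // Made-up word-pair vectors and human ratings, for illustration only.
        double[][][] pairs = {
            {{1, 0, 0}, {0.9, 0.1, 0}},
            {{1, 0, 0}, {0.5, 0.5, 0}},
            {{1, 0, 0}, {0, 0, 1}}
        };
        double[] human = {9.0, 5.0, 1.0};
        double[] model = new double[pairs.length];
        for (int p = 0; p < pairs.length; p++) {
            model[p] = cosine(pairs[p][0], pairs[p][1]);
        }
        System.out.println("Spearman rho = " + spearman(model, human));
    }
}
```

With these toy inputs, the cosine scores rank the pairs in the same order as the ratings, so `main` prints `Spearman rho = 1.0`. Spearman's rho depends only on ranks, which is why it is a common choice when model scores and human ratings live on different scales.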
HESML V1R4 introduces the following novelties: (1) support for evaluating pre-trained word embeddings stored in three file formats, which cover most of the state-of-the-art models reported in the literature; (2) an implementation of the intrinsic IC model and the two new IC-based semantic similarity measures introduced by Cai et al. (2017); (3) an implementation of a fast approximation of the Wu & Palmer (1994) measure that is commonly used in the literature; (4) the integration of a very large set of word similarity benchmarks; and (5) the correction of an error in the implementation of the Leacock & Chodorow (1998) measure in previous HESML versions.
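The two path-based measures named in items (3) and (5) can be sketched on a toy taxonomy. This Java example implements only the textbook tree formulas. Wu & Palmer is sim = 2 * depth(LCS) / (depth(c1) + depth(c2)). Leacock & Chodorow is sim = -log(len(c1, c2) / (2 * D)), where len is the number of nodes on the shortest path and D is the maximum depth of the taxonomy. Depths and path lengths are counted in nodes, with the root at depth 1; this is one of several conventions in the literature. The example assumes single inheritance. It does not reproduce the fast Wu & Palmer approximation or the corrected HESML code, and the taxonomy and names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy sketch (not the HESML API): Wu & Palmer and Leacock & Chodorow on a tree. */
public class PathMeasuresSketch {
    // Assumed toy taxonomy: child -> parent, rooted at "entity" (single inheritance).
    static final Map<String, String> PARENT = new HashMap<>();
    static {
        PARENT.put("animal", "entity");
        PARENT.put("artifact", "entity");
        PARENT.put("dog", "animal");
        PARENT.put("cat", "animal");
        PARENT.put("car", "artifact");
    }

    /** The concept itself followed by its ancestors up to the root. */
    static List<String> ancestors(String c) {
        List<String> path = new ArrayList<>();
        for (String x = c; x != null; x = PARENT.get(x)) path.add(x);
        return path;
    }

    /** Depth counted in nodes: the root has depth 1. */
    static int depth(String c) {
        return ancestors(c).size();
    }

    /** Deepest common ancestor; valid only for trees. */
    static String lcs(String a, String b) {
        List<String> ancB = ancestors(b);
        for (String x : ancestors(a)) if (ancB.contains(x)) return x;
        throw new IllegalArgumentException("no common ancestor");
    }

    /** Maximum depth D of the taxonomy, in nodes. */
    static int maxDepth() {
        int d = 1; // the root alone has depth 1
        for (String c : PARENT.keySet()) d = Math.max(d, depth(c));
        return d;
    }

    /** Wu & Palmer: 2 * depth(LCS) / (depth(a) + depth(b)). */
    static double wuPalmer(String a, String b) {
        return 2.0 * depth(lcs(a, b)) / (depth(a) + depth(b));
    }

    /** Leacock & Chodorow: -log(len / 2D), len = nodes on the shortest path. */
    static double leacockChodorow(String a, String b) {
        int edges = depth(a) + depth(b) - 2 * depth(lcs(a, b));
        int nodes = edges + 1;
        return -Math.log(nodes / (2.0 * maxDepth()));
    }

    public static void main(String[] args) {
        System.out.printf("WuPalmer(dog, cat) = %.4f%n", wuPalmer("dog", "cat"));
        System.out.printf("WuPalmer(dog, car) = %.4f%n", wuPalmer("dog", "car"));
        System.out.printf("LC(dog, cat)       = %.4f%n", leacockChodorow("dog", "cat"));
        System.out.printf("LC(dog, car)       = %.4f%n", leacockChodorow("dog", "car"));
    }
}
```

Running `main` prints 0.6667 and 0.3333 for Wu & Palmer and 0.6931 and 0.1823 for Leacock & Chodorow. The dog/cat pair meets at depth 2, three nodes apart. The dog/car pair only meets at the root, five nodes apart.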
The HESML library is freely distributed for any non-commercial purpose under a CC BY-NC-SA 4.0 license, on the condition that the main HESML paper [1] is cited as attribution. On the other hand, the commercial use of the similarity measures introduced in [2], as well as of part of the intrinsic IC models introduced in [3] and [4], is protected by a patent application [5]. In addition, HESML users must comply with the licensing terms, described in [1], of other resources distributed with the library.
References:
[1] Lastra-Díaz, J. J., García-Serrano, A., Batet, M., Fernández, M., & Chirigati, F. (2017). HESML: a scalable ontology-based semantic similarity measures library with a set of reproducible experiments and a replication dataset. Information Systems, 66, 97–118.
[2] Lastra-Díaz, J. J., & García-Serrano, A. (2015). A novel family of IC-based similarity measures with a detailed experimental survey on WordNet. Engineering Applications of Artificial Intelligence, 46, 140–153.
[3] Lastra-Díaz, J. J., & García-Serrano, A. (2015). A new family of information content models with an experimental survey on WordNet. Knowledge-Based Systems, 89, 509–526.
[4] Lastra-Díaz, J. J., & García-Serrano, A. (2016). A refinement of the well-founded Information Content models with a very detailed experimental survey on WordNet. Universidad Nacional de Educación a Distancia (UNED).
[5] Lastra-Díaz, J. J., & García-Serrano, A. (2016). System and method for the indexing and retrieval of semantically annotated data using an ontology-based information retrieval model. US Patent Application US2016/0179945 A1.
Provided by:
doi.org



