HESML V1R3 Java software library of ontology-based semantic similarity measures and information content models

Mendeley Data · Updated 2017-10-03 · Indexed 2026-04-09

Download link:

https://data.mendeley.com/datasets/t87s78dg78/3

Resource description:

HESML V1R3 is the third release of the Half-Edge Semantic Measures Library (HESML) detailed in [1], a scalable and efficient Java software library of ontology-based semantic similarity measures and Information Content (IC) models based on WordNet. HESML V1R3 implements most of the ontology-based semantic similarity measures and IC models based on WordNet reported in the literature. It also provides an XML-based input file format for specifying reproducible experiments on WordNet-based similarity without any software coding.

The main features of HESML are as follows: (1) it is based on an efficient and linearly scalable representation for taxonomies called PosetHERep, introduced in [1]; (2) its performance scales linearly with the size of the taxonomy; and (3) it does not use any caching strategy of vertex sets.

HESML V1R3 introduces two minor novelties: the vertex ID type has been changed from Integer to Long in order to support a larger number of vertices, and the library includes five new similarity measures introduced by Hao et al. (2011), Liu et al. (2007), Pekar & Staab (2002) and Stojanovic et al. (2001).

The HESML library is freely distributed for any non-commercial purpose under a CC BY-NC-SA 4.0 license, subject to citing the main HESML paper [1] as an attribution requirement. On the other hand, the commercial use of the similarity measures introduced in [2], as well as part of the intrinsic IC models introduced in [3] and [4], is protected by a patent application [5]. In addition, any user of HESML must fulfill the other licensing terms described in [1] for the other resources distributed with the library, such as WordNet and a dataset of corpus-based IC models, among others.

References:

[1] Lastra-Díaz, J. J., García-Serrano, A., Batet, M., Fernández, M., & Chirigati, F. (2017). HESML: a scalable ontology-based semantic similarity measures library with a set of reproducible experiments and a replication dataset. Information Systems, 66, 97–118. http://dx.doi.org/10.1016/j.is.2017.02.002
[2] Lastra-Díaz, J. J., & García-Serrano, A. (2015). A novel family of IC-based similarity measures with a detailed experimental survey on WordNet. Engineering Applications of Artificial Intelligence Journal, 46, 140–153.
[3] Lastra-Díaz, J. J., & García-Serrano, A. (2015). A new family of information content models with an experimental survey on WordNet. Knowledge-Based Systems, 89, 509–526.
[4] Lastra-Díaz, J. J., & García-Serrano, A. (2016). A refinement of the well-founded Information Content models with a very detailed experimental survey on WordNet. Universidad Nacional de Educación a Distancia (UNED).
[5] Lastra Díaz, J. J., & García Serrano, A. (2016). System and method for the indexing and retrieval of semantically annotated data using an ontology-based information retrieval model. United States Patent and Trademark Office (USPTO) Application, US2016/0179945 A1.
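
To make the terms "intrinsic IC model" and "IC-based similarity measure" concrete, the following is a minimal, self-contained Java sketch. It does not use the HESML API, which this description does not show; the class `ToyTaxonomy` and all of its methods are hypothetical names chosen for illustration, and the plain adjacency maps are not the PosetHERep representation. As an assumption, the sketch uses the intrinsic IC formula of Seco et al., IC(c) = 1 - log(hypo(c) + 1) / log(N), and the Lin similarity, 2 * IC(MICA) / (IC(a) + IC(b)), as one example of the kind of measure the library covers. Vertex IDs are `long` values, echoing the Integer-to-Long change noted above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical toy taxonomy for illustration only; this is NOT the HESML API. */
final class ToyTaxonomy {
    // Adjacency maps keyed by long vertex IDs (a vertex may have several parents).
    private final Map<Long, List<Long>> parents = new HashMap<>();
    private final Map<Long, List<Long>> children = new HashMap<>();

    void addVertex(long id) {
        parents.putIfAbsent(id, new ArrayList<>());
        children.putIfAbsent(id, new ArrayList<>());
    }

    /** Adds an is-a edge from child to parent, creating both vertices if needed. */
    void addEdge(long child, long parent) {
        addVertex(child);
        addVertex(parent);
        parents.get(child).add(parent);
        children.get(parent).add(child);
    }

    /** Returns every vertex reachable from start (start included) through adjacency. */
    private Set<Long> closure(long start, Map<Long, List<Long>> adjacency) {
        Set<Long> seen = new HashSet<>();
        Deque<Long> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            long v = stack.pop();
            if (seen.add(v)) {
                for (long next : adjacency.get(v)) {
                    stack.push(next);
                }
            }
        }
        return seen;
    }

    /** Assumed intrinsic IC (Seco et al.); requires at least two vertices. */
    double intrinsicIC(long id) {
        int descendants = closure(id, children).size() - 1; // exclude the concept itself
        return 1.0 - Math.log(descendants + 1.0) / Math.log(parents.size());
    }

    /** Lin similarity using the most informative common ancestor (MICA). */
    double linSimilarity(long a, long b) {
        if (a == b) {
            return 1.0;
        }
        Set<Long> common = closure(a, parents);
        common.retainAll(closure(b, parents));
        double micaIC = 0.0;
        for (long c : common) {
            micaIC = Math.max(micaIC, intrinsicIC(c));
        }
        double denominator = intrinsicIC(a) + intrinsicIC(b);
        return denominator == 0.0 ? 0.0 : 2.0 * micaIC / denominator;
    }

    public static void main(String[] args) {
        // IDs above Integer.MAX_VALUE are representable because the keys are long.
        long root = 5_000_000_000L, animal = 5_000_000_001L, plant = 5_000_000_002L;
        long dog = 5_000_000_003L, cat = 5_000_000_004L;
        ToyTaxonomy t = new ToyTaxonomy();
        t.addEdge(animal, root);
        t.addEdge(plant, root);
        t.addEdge(dog, animal);
        t.addEdge(cat, animal);
        System.out.println("Lin(dog, cat)   = " + t.linSimilarity(dog, cat));
        System.out.println("Lin(dog, plant) = " + t.linSimilarity(dog, plant));
    }
}
```

In this toy taxonomy the root has an IC of zero, so two concepts whose only common ancestor is the root receive a Lin similarity of zero, while siblings under a more specific parent score higher. The closure-based traversal is the simplest thing that works on a toy graph and makes no claim about how HESML itself computes ancestor sets.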

Created:

2017-10-03
