Experimental data for computing semantic similarity between concepts using multiple inheritances in Wikipedia category graph

Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/hnmb43sj5s
Description:
In this data article, we provide experimental data to compute the semantic similarity between concepts (words) taken from the gold-standard word similarity benchmarks MC30 (English), RG65 (Spanish), and RG65 (French). The data relate to the multiple-inheritance-based semantic similarity methods proposed in M. J. Hussain et al. The dataset contains four folders: “Benchmarks_results_graphs”, “French_RG65”, “MC30”, and “Spanish_RG65”. The folder “Benchmarks_results_graphs” contains the Pearson correlation values of the experimental results on the English (MC30), French (RG65), and Spanish (RG65) benchmarks. The folders “French_RG65”, “MC30”, and “Spanish_RG65” contain all the pre-processed data files needed to run the Python-based program that computes the semantic similarity between French, English, and Spanish Wikipedia concepts according to our methods. For example, the folder “French_RG65” contains:
(1) the experiments on the RG65 (French) benchmark, in the sub-folder “French_RG65_results”;
(2) the data required to compute Information Content (IC) with respect to category hyponyms and category pages, in the sub-folder “predate_fr”;
(3) the disambiguated French Wikipedia concepts, in the file “disambiguated_benchmark.csv”;
(4) the page IDs of the French Wikipedia concepts, in the file “fr_RG65_pageid.csv”;
(5) the categories associated with each French Wikipedia page, in the file “fr_RG65_page_categories.txt”;
(6) the source code to compute the semantic similarity between French Wikipedia concepts using IC with respect to category hyponyms, in the file “RG_French_Sim_IC_hypos.txt”;
(7) the source code to compute the semantic similarity between French Wikipedia concepts using IC with respect to category pages, in the file “RG_French_Sim_IC_pages.txt”; and
(8) the source code to reproduce the data associated with Table 3, in the file “Table3_French.txt”.
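This page does not spell out the exact IC and similarity formulas used by the scripts in items (6) and (7). As a rough illustration only, the Python sketch below makes three assumptions. First, it uses a hyponym-count IC in the style of Seco et al. Second, it uses a page-coverage IC computed as the negative log of the share of pages under a category. Third, it uses a Resnik-style similarity that takes the maximum IC over all shared ancestors. Multiple inheritance matters in that third step, because a category with several parents has several upward paths to common ancestors. The toy graph, the page sets, and every function name are hypothetical. This is not the authors' code and not necessarily their formulas.

```python
import math
from typing import Callable, Dict, Set

# Hypothetical toy category graph (child -> parents). "Dogs" and "Cats" each
# have two parents, which is the multiple-inheritance case.
PARENTS: Dict[str, Set[str]] = {
    "Dogs": {"Mammals", "Pets"},
    "Cats": {"Mammals", "Pets"},
    "Mammals": {"Animals"},
    "Pets": {"Animals"},
    "Animals": set(),
}
# Hypothetical pages filed directly under each category.
PAGES: Dict[str, Set[str]] = {
    "Dogs": {"Beagle", "Poodle"},
    "Cats": {"Siamese"},
    "Mammals": {"Whale"},
    "Pets": {"Goldfish"},
    "Animals": set(),
}


def ancestors(cat: str) -> Set[str]:
    """`cat` plus every category reachable upward through any parent.

    The `seen` set also guards against cycles, which the real Wikipedia
    category graph can contain.
    """
    seen, stack = {cat}, [cat]
    while stack:
        for parent in PARENTS.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def descendants(cat: str) -> Set[str]:
    """`cat` plus every category that has `cat` as an ancestor."""
    return {c for c in PARENTS if cat in ancestors(c)}


def ic_hyponyms(cat: str) -> float:
    """Assumed hyponym-based IC (Seco et al. style): 1 - log(h + 1) / log(N)."""
    hypos = len(descendants(cat)) - 1
    return 1.0 - math.log(hypos + 1) / math.log(len(PARENTS))


def ic_pages(cat: str) -> float:
    """Assumed page-based IC: -log(pages under cat / all pages).

    Pages are collected as a set, so a page reached via two parents counts once.
    """
    under = set().union(*(PAGES[c] for c in descendants(cat)))
    total = set().union(*PAGES.values())
    return -math.log(len(under) / len(total))


def resnik(c1: str, c2: str, ic: Callable[[str], float]) -> float:
    """Resnik-style similarity: highest IC over all shared ancestors."""
    shared = ancestors(c1) & ancestors(c2)
    return max((ic(c) for c in shared), default=0.0)


print(round(resnik("Dogs", "Cats", ic_hyponyms), 4))  # → 0.3174
print(round(resnik("Dogs", "Cats", ic_pages), 4))  # → 0.2231
```

"Dogs" and "Cats" share "Mammals", "Pets", and "Animals" through two parents each. The score is taken from whichever shared ancestor is most informative. Plugging in the real graph would require parsing the dataset's files, whose exact format is not described here, so that step is left out.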
These data folders provide all the pre-processed data files needed to run the Python-based program, reproduce the experimental results of our semantic similarity methods, and support further analysis of the structure of different Wikipedia category graphs.
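The Pearson correlation values in “Benchmarks_results_graphs” compare a method's similarity scores against a benchmark's human ratings. Below is a minimal sketch of that evaluation step. It assumes a hypothetical CSV with columns `word1`, `word2`, `human`, and `system`. The column names, word placeholders, and numbers are invented for illustration and are not taken from the dataset's files.

```python
import csv
import io
import math
from typing import List, Tuple

# Hypothetical file layout and invented scores, for illustration only.
SAMPLE_CSV = """word1,word2,human,system
w1,w2,4.0,0.90
w3,w4,3.0,0.70
w5,w6,1.5,0.35
w7,w8,0.5,0.20
"""


def read_scores(text: str) -> Tuple[List[float], List[float]]:
    """Return (human ratings, system scores) from a CSV with the assumed columns."""
    rows = list(csv.DictReader(io.StringIO(text)))
    human = [float(r["human"]) for r in rows]
    system = [float(r["system"]) for r in rows]
    return human, system


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient r between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


human, system = read_scores(SAMPLE_CSV)
print(f"Pearson r = {pearson(human, system):.3f}")
```

If SciPy is available, `scipy.stats.pearsonr(x, y)` computes the same coefficient along with a p-value. The hand-written version only keeps the sketch dependency-free.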
Created: 2020-02-25