
MeSH2Wikidata: A set of tools for the interaction between MeSH keywords, OBO Foundry, and Wikidata for enriching biomedical knowledge

Source: DataCite Commons. Updated: 2024-10-29. Indexed: 2024-08-18.
Download link:
https://figshare.com/articles/dataset/MeSH2Wikidata_A_set_of_tools_for_the_interaction_between_MeSH_keywords_OBO_Foundry_and_Wikidata_for_enriching_biomedical_knowledge/24438184
Description:
The work consists of tools for the interaction between Wikidata and OBO Foundry, and source code for using the MeSH keywords of PubMed publications to enrich biomedical knowledge in Wikidata. This work is funded by the "Adapting Wikidata to support clinical practice using Data Science, Semantic Web and Machine Learning" project within the framework of the Wikimedia Foundation Research Fund.

To cite the work: Turki, H., Chebil, K., Dossou, B. F. P., Emezue, C. C., Owodunni, A. T., Hadj Taieb, M. A., & Ben Aouicha, M. (2024). A framework for integrating biomedical knowledge in Wikidata with open biomedical ontologies and MeSH keywords. Heliyon, 10(19), e38448. doi:10.1016/j.heliyon.2024.e38448.

Wikidata-OBO

tool1.py: A tool for verifying the semantic alignment between Wikidata and OBO ontologies.
frame.py: The layout of Tool 1.
tool2.py: A tool for extracting Wikidata relations between OBO ontology items.
frame2.py: The layout of Tool 2.
tool3.py: A tool for extracting multilingual language data for OBO ontology items from Wikidata.
frame4.py: The layout of Tool 3.

Wikidata-MeSH

correct_mesh2matrix_dataset.py: Source code for turning MeSH2Matrix into a smaller dataset, named MiniMeSH2Matrix, for biomedical relation classification based on the MeSH keywords of PubMed publications.
build_numpy_dataset.py: Source code for building the NumPy files for MiniMeSH2Matrix (relation type-based classification).
label_encoded.csv: A table for converting Wikidata Property IDs into MeSH2Matrix Class IDs.
new_encoding.csv: A table for converting Wikidata Property IDs into MiniMeSH2Matrix Class IDs.
super_classes_new_dataset_labels.npy: The NumPy file of the labels for the superclass-based classification.
new_dataset_labels.npy: The NumPy file of the labels for the relation type-based classification.
new_dataset_matrices.npy: The NumPy file of the MiniMeSH2Matrix matrices for biomedical relation classification.
first_level_new_data.json: The JSON file for converting relation types to superclasses.
build_super_classes.py: Source code for building the NumPy files for MiniMeSH2Matrix (superclass-based classification).
FC_MeSH_Model_57_New_Data.ipynb: A Jupyter Notebook for training a dense model to perform the relation type-based classification.
FC_MeSH_Model_57_New_Data_SuperClasses.ipynb: A Jupyter Notebook for training a dense model to perform the superclass-based classification.
new_data_best_model_1: A stored version of the best model for the relation type-based classification.
new_data_super_classes_best_model_1: A stored version of the best model for the superclass-based classification.
MiniMeSH2Matrix_SuperClasses_Confusion_Matrix.ipynb: A Jupyter Notebook for generating the confusion matrix for the superclass-based supervised classification.
MiniMeSH2Matrix_Supervised_Classification_Agreement.ipynb: A Jupyter Notebook for generating the matrix of agreement between the accurate predictions of the superclass-based classification and those of the relation type-based classification.
Adding_References_to_Wikidata.ipynb: A Jupyter Notebook for identifying the PubMed IDs of relevant references for unsupported Wikidata statements between MeSH terms.
MeSH_Statistics.xlsx: Statistical data about MeSH-based items and relations in Wikidata.
ref_for_unsupported_statements.csv: Relevant PubMed references retrieved for 1k unsupported Wikidata statements.
evaluate_pubmed_ref_assignment.ipynb: A Jupyter Notebook that generates statistics about reference assignment for a sample of 1k unsupported statements.
MeSH_Verification.xlsx: A list of inaccurate or duplicated MeSH IDs in Wikidata, as of August 8th, 2023.
WikiRelationsPMI.csv: A list of PMI values for the semantic relations between MeSH terms, as available in Wikidata.
WikiRelationsPMIDistribution.xlsx: Distribution of PMI values for all Wikidata relations and for specific Wikidata relation types.
WikiRelationsToVerify.xlsx: Wikidata relations needing attention because they involve Wikidata items with inaccurate MeSH IDs, cannot be found in PubMed, or have PMI values below the threshold of 2.
Mesh_part1.py: Python code that verifies the accuracy of the MeSH IDs of Wikidata items.
MeshWikiPart.py: Python code that computes the pointwise mutual information (PMI) values for Wikidata relations between MeSH keywords based on PubMed.
Demo.ipynb: A demo of the MeSH-based biomedical relation validation and classification in French.
Id_Term.json: A dict of Medical Subject Headings labels corresponding to MeSH Descriptor IDs.
dict_mesh.json: Number of occurrences of MeSH keywords in PubMed.
finalmatrix.xlsx: Matrix of PMI values between the 5k most common MeSH keywords.
finalmatrixrev.pkl: Pickle version of the PMI matrix.
pmi2.xlsx: List of significant PMI associations between the 5k most common MeSH keywords, reaching a threshold of 2.
Generate5kMatrix.py: Python code that generates the PMI matrix.
clean_pmi2.py: Python code that removes the relations already available in Wikidata from pmi.xlsx.
missing_rels.xlsx: The final list of significant PMI associations that do not exist in Wikidata.
item_category.json: A dict of MeSH tree categories corresponding to MeSH items.
item_categorization.py: Python code that generates a dict of MeSH tree categories corresponding to MeSH items.
classification.py: Python code for classifying PMI-generated semantic relations between the most common MeSH keywords.
results.xlsx: The output of the classification of the PMI-generated semantic relations between the most common MeSH keywords.
ClassificationStats.ipynb: A Jupyter Notebook for generating statistical data about the classification.
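
Several of the files above rely on pointwise mutual information (PMI) between pairs of MeSH keywords, with a threshold of 2 separating significant associations from those needing attention. The following is a minimal sketch of how such a score could be computed from publication counts. It is not the code in MeshWikiPart.py or Generate5kMatrix.py: the function names, the base-2 logarithm, the probability estimates (counts divided by the total number of publications), and the toy counts are all assumptions made for illustration.

```python
import math

# Threshold stated in the description; everything else here is illustrative.
PMI_THRESHOLD = 2


def pmi(count_x, count_y, count_xy, total_docs):
    """Pointwise mutual information (base 2, assumed) from document counts.

    count_x, count_y: publications tagged with each keyword
    count_xy: publications tagged with both keywords
    total_docs: total number of publications considered
    """
    if min(count_x, count_y, count_xy) <= 0 or total_docs <= 0:
        # PMI is undefined without co-occurrence; this sketch returns -inf.
        return float("-inf")
    p_x = count_x / total_docs
    p_y = count_y / total_docs
    p_xy = count_xy / total_docs
    return math.log2(p_xy / (p_x * p_y))


def is_significant(count_x, count_y, count_xy, total_docs,
                   threshold=PMI_THRESHOLD):
    """True when the pair reaches the PMI threshold."""
    return pmi(count_x, count_y, count_xy, total_docs) >= threshold


if __name__ == "__main__":
    # Toy counts, invented for this example (not taken from PubMed).
    score = pmi(count_x=1000, count_y=500, count_xy=100, total_docs=1_000_000)
    print(f"PMI = {score:.2f}, significant = {score >= PMI_THRESHOLD}")
```

Under these assumptions, two keywords that co-occur exactly as often as chance predicts score 0, and pairs scoring below 2 would fall into the "needing attention" group described for WikiRelationsToVerify.xlsx.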
Provider:
figshare
Created:
2023-10-30