Bio-ML: Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching
Indexed by NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/6510086
Resource description:
This version is used in the Bio-ML track of OAEI 2024; the only change compared to the OAEI 2023 version is the deletion of certain training subsumption mappings.
Overview
The purpose of these datasets is to support equivalence and subsumption ontology matching.
There are five ontology pairs extracted from MONDO and UMLS:
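| Source | Task | Category | #SrcCls | #TgtCls | #Ref (equiv) | #Ref (subs) |
|--------|------|----------|---------|---------|--------------|-------------|
| Mondo | OMIM-ORDO | Disease | 9,648 | 9,275 | 3,721 | 103 |
| Mondo | NCIT-DOID | Disease | 15,762 | 8,465 | 4,686 | 3,338 (-1) |
| UMLS | SNOMED-FMA | Body | 34,418 | 88,955 | 7,256 | 5,453 (-53) |
| UMLS | SNOMED-NCIT | Pharm | 29,500 | 22,136 | 5,803 | 4,224 (-1) |
| UMLS | SNOMED-NCIT | Neoplas | 22,971 | 20,247 | 3,804 | 213 |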
The "-" numbers reflect the changes due to the deletion of certain training subsumption mappings.
The main track is available at "bio-ml", where each pair is associated with a task folder containing the source and target ontologies, the reference equivalence mappings (in "refs_equiv"), and the reference subsumption mappings (in "refs_subs").
The special sub-track is available at "bio-llm", where each pair is associated with a task folder, containing the source and target ontologies, and the test candidate mappings.
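The main-track layout described above can be checked after download with a short script. The following is a minimal sketch, not part of the official tooling: it assumes the archive has been extracted so that a local "bio-ml" folder holds one sub-folder per task, and that "refs_equiv" and "refs_subs" appear as entries directly inside each task folder. The function name `summarize_tasks` is hypothetical, and the names and formats of the files inside those folders are not stated here, so the script only reports which entries exist.

```python
from pathlib import Path

# Entry names quoted in the dataset description; everything else is an assumption.
EXPECTED_ENTRIES = ("refs_equiv", "refs_subs")


def summarize_tasks(track_dir):
    """Map each task folder under track_dir to the expected entries it contains."""
    summary = {}
    for task in sorted(Path(track_dir).iterdir()):
        if not task.is_dir():
            continue  # skip loose files such as READMEs
        summary[task.name] = {
            name: (task / name).exists() for name in EXPECTED_ENTRIES
        }
    return summary


if __name__ == "__main__":
    # "bio-ml" is assumed to be the extracted main-track folder in the
    # current working directory; adjust the path to your local copy.
    root = Path("bio-ml")
    if root.is_dir():
        for task, present in summarize_tasks(root).items():
            print(task, present)
    else:
        print("bio-ml folder not found; extract the archive first")
```

A task folder reported with `False` for either entry may simply use a different layout than assumed here; the detailed documentation linked under Important Links is the authoritative reference.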
Citation
Bio-ML (Main Track)
```
@inproceedings{he2022machine,
title={Machine learning-friendly biomedical datasets for equivalence and subsumption ontology matching},
author={He, Yuan and Chen, Jiaoyan and Dong, Hang and Jim{\'e}nez-Ruiz, Ernesto and Hadian, Ali and Horrocks, Ian},
booktitle={International Semantic Web Conference},
pages={575--591},
year={2022},
organization={Springer}
}
```
Bio-LLM (Sub-track)
```
@article{he2023exploring,
title={Exploring large language models for ontology alignment},
author={He, Yuan and Chen, Jiaoyan and Dong, Hang and Horrocks, Ian},
journal={arXiv preprint arXiv:2309.07172},
year={2023}
}
```
Important Links
See detailed documentation at: https://krr-oxford.github.io/DeepOnto/bio-ml.
See the OAEI Bio-ML track at: https://www.cs.ox.ac.uk/isg/projects/ConCur/oaei/
See our resource paper for the original Bio-ML on arXiv or Springer (accepted at ISWC 2022 and nominated as a best resource paper candidate). See our poster paper for the Bio-LLM sub-track on arXiv (accepted at ISWC 2023 Posters & Demos).
Changelog
The only change in this version compared to the OAEI 2023 is the deletion of certain training subsumption mappings that can be directly exploited through deductive reasoning.
Created: 2024-07-28



