
OpenBioLink2020

Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://zenodo.org/record/3826918
Description:
The OpenBioLink2020 dataset is a challenging biomedical benchmark containing over 5 million positive and negative edges. To provide a more realistic edge prediction scenario, the test set excludes trivially predictable edges that are inverses of training edges, and it includes every edge type. For further information, see the GitHub repository.

The directed, high-quality variant of OpenBioLink2020 is the default dataset and should be used for benchmarking. To allow analyzing the effects of data quality and of the directionality of the evaluation graph, four variants are provided: directed and undirected, each with and without a quality cutoff. Each graph is also available in RDF N3 format (without train/validation/test splits) and in BEL.

OpenBioLink is a resource and evaluation framework for link prediction models on heterogeneous biomedical graph data. It contains benchmark datasets as well as tools for creating custom benchmarks and for training and evaluating models.
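To make the leakage criterion concrete, here is a minimal sketch of dropping test edges that are trivially inferable from training: a test edge is discarded if the same triple, or its reversed form (tail, relation, head), already occurs in the training set. This is an illustration under stated assumptions, not OpenBioLink's actual filtering code. The `Edge` type, the `filter_inverse_leakage` function, and the entity and relation names are all hypothetical, and the sketch treats every reversed triple as leakage, which is a simplification.

```python
from typing import Iterable, NamedTuple


class Edge(NamedTuple):
    # Hypothetical triple representation (not OpenBioLink's file schema).
    head: str
    relation: str
    tail: str


def filter_inverse_leakage(train: Iterable[Edge], test: Iterable[Edge]) -> list[Edge]:
    """Keep only test edges whose triple and reversed triple are both absent from training."""
    train_set = set(train)
    kept = []
    for edge in test:
        inverse = Edge(edge.tail, edge.relation, edge.head)
        # Drop exact duplicates of training edges and their trivially predictable inverses.
        if edge not in train_set and inverse not in train_set:
            kept.append(edge)
    return kept


# Hypothetical example data.
train = [Edge("GENE:1", "interacts_with", "GENE:2")]
test = [
    Edge("GENE:2", "interacts_with", "GENE:1"),  # inverse of a training edge: removed
    Edge("GENE:1", "associated_with", "DIS:9"),  # unseen edge: kept
]
print(filter_inverse_leakage(train, test))
# → [Edge(head='GENE:1', relation='associated_with', tail='DIS:9')]
```

Using a set for the training edges keeps each membership check constant-time on average, which matters at the scale of millions of edges.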
The OpenBioLink benchmark aims to meet the following criteria:
- Openly available
- Large-scale
- Wide coverage of current biomedical knowledge and entity types
- Standardized, balanced train-test split
- Open-source code for benchmark dataset generation
- Open-source, model-independent code for evaluation
- Integrating and differentiating multiple types of biological entities and relations (i.e., formalized as a heterogeneous graph)
- Minimized information leakage between train and test sets (e.g., no trivially inferable relations in the test set)
- Coverage of true negative relations, where available
- Differentiating high-quality data from noisy, low-quality data
- Separate benchmarks for directed and undirected graphs, so that a wide variety of link prediction methods can be evaluated
- Clearly defined release cycle with versioned benchmarks and a public leaderboard

Please note that the OpenBioLink benchmark files contain data derived from external resources. Licensing terms of these external resources are detailed here.
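The quality-cutoff and true-negative criteria can be pictured with a small sketch: each edge carries a quality tier and a positive/negative label, a cutoff keeps only edges at or above a chosen tier, and a count shows the resulting positive/negative balance. The three-tier `Quality` scale, the `LabeledEdge` fields, and the helper functions are assumptions made for illustration; they do not describe OpenBioLink's actual file format or quality scoring.

```python
from enum import IntEnum
from typing import NamedTuple


class Quality(IntEnum):
    # Hypothetical ordered quality tiers (an assumption, not OpenBioLink's schema).
    LOW = 0
    MEDIUM = 1
    HIGH = 2


class LabeledEdge(NamedTuple):
    head: str
    relation: str
    tail: str
    quality: Quality
    positive: bool  # True = positive edge, False = true negative edge


def apply_quality_cutoff(edges: list[LabeledEdge], cutoff: Quality) -> list[LabeledEdge]:
    """Keep only edges whose quality tier is at or above the cutoff."""
    return [e for e in edges if e.quality >= cutoff]


def count_labels(edges: list[LabeledEdge]) -> tuple[int, int]:
    """Return (number of positive edges, number of negative edges)."""
    positives = sum(1 for e in edges if e.positive)
    return positives, len(edges) - positives


# Hypothetical example data.
edges = [
    LabeledEdge("GENE:1", "associated_with", "DIS:9", Quality.HIGH, True),
    LabeledEdge("GENE:2", "associated_with", "DIS:9", Quality.LOW, True),
    LabeledEdge("GENE:3", "associated_with", "DIS:9", Quality.HIGH, False),
]
high = apply_quality_cutoff(edges, Quality.HIGH)
print(len(high), count_labels(high))  # → 2 (1, 1)
```

Modeling the tiers as an `IntEnum` lets the cutoff be a plain `>=` comparison; without a cutoff, all edges (including the low-quality positive) would be retained.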
Created:
2021-09-02