
Comparison of biomedical relationship extraction methods and models for knowledge graph creation (Gene-Disease relationships)

NIAID Data Ecosystem, indexed 2026-03-14
Download link:
https://zenodo.org/record/6466315
Description:
This dataset is used for classifying Gene-Disease relationship types from sentences. It consists of 3 files:

manually_annotated_set.xlsx - a set of 2000 manually annotated sentences with entities.
Unbalanced_dataset.xlsx - a set of 12000 sentences: the 2000 manually annotated sentences from the first set, plus sentences added by a rule-based method, keeping only those where the extraction had confidence 1.
Balanced_dataset_SUB_PRED.xlsx - a balanced dataset built from the 2000 manually annotated sentences, extended with sentences from the rule-based method with confidence 1 so that each relationship class had at least 1400 sentences (for biomarkers, we could obtain 1243 sentences with confidence 1 from the portion of the data that had been processed at the time of building the dataset).

Please cite: Milosevic, Nikola, and Wolfgang Thielemann. "Comparison of biomedical relationship extraction methods and models for knowledge graph creation." Journal of Web Semantics (2022): 100756. https://doi.org/10.1016/j.websem.2022.100756
Article preprint available at: https://arxiv.org/abs/2201.01647
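
The balancing step described for Balanced_dataset_SUB_PRED.xlsx can be illustrated with a minimal Python sketch. It is not the authors' code: the field names (`sentence`, `label`, `confidence`), the function name `build_balanced`, and the choice to stop topping up a class once it reaches the target are all assumptions made for illustration. The description only states that classes were filled with confidence-1 rule-based sentences to at least 1400 where enough such sentences existed.

```python
from collections import Counter


def build_balanced(manual, rule_based, target=1400):
    """Top up each relationship class with confidence-1 rule-based sentences.

    `manual` and `rule_based` are lists of dicts. The keys "sentence",
    "label" and "confidence" are hypothetical and are not taken from the
    actual spreadsheet columns.
    """
    balanced = list(manual)  # start from the manually annotated sentences
    counts = Counter(row["label"] for row in manual)
    seen = {row["sentence"] for row in manual}

    for row in rule_based:
        if row["confidence"] != 1.0:
            continue  # keep only extractions with confidence 1
        if row["sentence"] in seen:
            continue  # avoid adding the same sentence twice
        if counts[row["label"]] >= target:
            continue  # this class already reached the target (assumed stop rule)
        balanced.append(row)
        counts[row["label"]] += 1
        seen.add(row["sentence"])

    # Classes with too few confidence-1 candidates stay below the target,
    # as happened for biomarkers (1243 sentences) in the description.
    return balanced, counts
```

A class that runs out of confidence-1 candidates simply keeps whatever it has, which mirrors the biomarker case mentioned in the description.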
Created:
2022-09-21