
SDM-Genomic-Datasets

Source: figshare.com. Updated 2022-12-21. Indexed 2025-03-26.
Download link:
https://figshare.com/articles/dataset/SDM-Genomic-Datasets/14838342/1
Description:
These datasets were generated from the COSMIC mutation dataset in the COSMIC database (GRCh37, version 90) to evaluate available ontology-based data integration engines. They vary in number of records (10k, 100k, 1 million, and 10 million), number of attributes (2-15), and amount of duplication (25-75 percent duplicated records, with each duplicated value repeated 10 or 20 times). Details of how the datasets were generated can be found in the papers that use them for empirical evaluation: https://doi.org/10.1145/3340531.3412881 and 10.5281/zenodo.3993657. Example mapping rules for integrating these datasets are available at https://github.com/SDM-TIB/SDM-RDFizer-Experiments/tree/master/cikm2020/experiments/mappings
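
To check which duplication setting a given file corresponds to, the following minimal sketch may help. It assumes the files are CSV with a header row and treats a record as "duplicated" when its full row content occurs more than once; that reading of the duplication figures, the function name duplicate_stats, and the sample columns are assumptions for illustration, not taken from the dataset papers.

```python
from collections import Counter
import csv
import io


def duplicate_stats(rows):
    """Return (share of rows whose content occurs more than once,
    sorted list of distinct repetition counts among duplicated rows)."""
    counts = Counter(tuple(r) for r in rows)
    total = sum(counts.values())
    dup_rows = sum(c for c in counts.values() if c > 1)
    repeat_counts = sorted({c for c in counts.values() if c > 1})
    rate = dup_rows / total if total else 0.0
    return rate, repeat_counts


# Tiny made-up sample; the column names are hypothetical, not from the datasets.
sample = io.StringIO("gene,mutation\nA,m1\nA,m1\nB,m2\nC,m3\n")
reader = csv.reader(sample)
header = next(reader)  # skip the header row
rate, repeats = duplicate_stats(reader)
print(rate, repeats)
```

For a real file, replace the in-memory sample with an opened file handle. Note that the 10 million record files may need a streaming or chunked approach rather than a single in-memory Counter.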

Provider:
figshare.com