MS-BioGraphs
Source: arXiv. Updated: 2023-08-31. Indexed: 2024-06-21.
Download link:
https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs
Description:
MS-BioGraphs is a family of large-scale sequence similarity graph datasets created by Queen's University Belfast. The largest graph has up to 2.5 trillion edges, which its creators describe as the largest publicly available graph dataset at the time of release. The graphs are generated through an optimized high-performance computing process and are intended to support bioinformatics research such as sequence clustering and pseudogene function prediction. Building them involves multi-step optimizations of data structures and algorithms, together with parallel compression in the WebGraph framework. Besides serving as a benchmark for high-performance graph processing, MS-BioGraphs is applicable to biological research, including gene transfer prediction and protein sequence analysis.
Provider:
Queen's University Belfast
Created:
2023-08-31