Original protein sequence data of various bacteria for a novel machine learning-based method of bacterial shape identification
Source: DataCite Commons. Updated: 2025-09-01. Indexed: 2025-09-08.
Download link:
https://figshare.com/articles/dataset/Original_protein_sequence_data_of_various_bacteria_for_a_novel_machine_learning-based_method_of_bacterial_shape_identification/30022573/1
Description:
Accurately identifying genes responsible for specific functions is a cornerstone of biological research, but current methods are often limited to single-species analyses. Here, we present a novel method termed "Genomic and Phenotype-based machine learning for Gene Identification" (GPGI), which leverages large-scale cross-species genomic and phenotypic data for functional gene discovery. The uploaded dataset comprises the complete protein sequences encoded by all 3,750 bacterial species used in the early stages of this method. These sequences were obtained from NCBI RefSeq and have been deposited in accordance with the journal's requirements to facilitate access for other interested researchers.
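For readers who want to inspect the deposited sequences, a minimal sketch of loading protein records is shown below. It assumes the files are distributed in FASTA format, which is the usual format for NCBI RefSeq protein downloads but is not stated in the description above. The function name `parse_fasta` and the example records are hypothetical and are not taken from the dataset.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {record_id: sequence} mapping.

    Assumption: input is FASTA text (header lines start with '>').
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if current_id is not None:
                records[current_id] = "".join(chunks)
            # Use the first whitespace-delimited token of the header as the ID.
            header = line[1:].split()
            current_id = header[0] if header else ""
            chunks = []
        elif current_id is not None:
            chunks.append(line)  # sequence may wrap across several lines
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


# Hypothetical two-record example; the IDs and sequences are made up.
example = """>WP_000000001.1 hypothetical protein [Example species]
MKTAYIAKQR
QISFVKSHFS
>WP_000000002.1 hypothetical protein [Example species]
MSDNE
"""
proteins = parse_fasta(example.splitlines())
```

To read a real file, an open file handle can be passed in place of the example lines, since the function accepts any iterable of strings. If two records share the same ID, the later one overwrites the earlier one in this sketch.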
Provider:
figshare
Created:
2025-09-01



