adibvafa/CodonTransformer
Hugging Face · Updated 2024-09-15 · Indexed 2024-12-14
Download link:
https://hf-mirror.com/datasets/adibvafa/CodonTransformer
Summary:
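The CodonTransformer dataset is a comprehensive collection of 1,001,197 DNA and protein sequence pairs covering 164 organisms, including eukaryotes, bacteria, and archaea. The data are drawn from NCBI resources and passed strict quality control: every DNA sequence has a length divisible by 3, begins with a start codon, and ends with a single stop codon. The dataset suits research in comparative genomics, codon usage analysis, protein expression optimization, synthetic biology, and machine learning models for bioinformatics.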
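The three quality-control criteria above (length divisible by 3, a leading start codon, a single terminal stop codon) can be expressed as a simple per-sequence filter. The following is a minimal sketch, not the dataset authors' actual pipeline: the function name `passes_qc` is hypothetical, the stop codons are assumed to be those of the standard genetic code (TAA, TAG, TGA), and only ATG is accepted as a start codon unless other start codons (e.g. the GTG or TTG used by some bacteria) are passed in.

```python
# Hypothetical QC filter; assumes the standard genetic code's stop codons.
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})


def passes_qc(dna: str, start_codons: frozenset[str] = frozenset({"ATG"})) -> bool:
    """Return True if `dna` satisfies the three stated quality-control rules."""
    seq = dna.upper()
    # Rule 1: the sequence must consist of whole codons only.
    if len(seq) == 0 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    # Rule 2: the first codon must be an accepted start codon (ATG by default).
    if codons[0] not in start_codons:
        return False
    # Rule 3: exactly one in-frame stop codon, and it must be the last codon.
    stop_positions = [i for i, codon in enumerate(codons) if codon in STOP_CODONS]
    return stop_positions == [len(codons) - 1]
```

For example, `passes_qc("ATGGCTTAA")` is `True`, while `passes_qc("ATGTAAGCTTAA")` is `False` because it contains an internal in-frame stop codon.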
The CodonTransformer dataset is a comprehensive compilation of 1,001,197 DNA and protein sequence pairs, sourced from 164 organisms across Eukaryotes, Bacteria, and Archaea. This dataset provides a rich resource for various computational biology and bioinformatics applications such as studying gene sequences, codon usage, and protein expression across diverse species. The dataset contains DNA sequences, corresponding protein sequences, and gene and organism information. It is valuable for comparative genomics, codon usage analysis, protein expression optimization, synthetic biology and genetic engineering, and machine learning models in bioinformatics.
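As an illustration of the codon usage analysis mentioned above, relative codon frequencies can be computed from a set of in-frame DNA sequences, for instance all sequences belonging to one organism. This is a hedged sketch: the function name `codon_usage` is hypothetical, and it operates on plain DNA strings; how those strings are selected from the dataset (field names, filtering by organism) is not specified here and is left to the reader.

```python
from collections import Counter
from collections.abc import Iterable


def codon_usage(sequences: Iterable[str]) -> dict[str, float]:
    """Relative frequency of each codon across in-frame coding sequences."""
    counts: Counter[str] = Counter()
    for dna in sequences:
        seq = dna.upper()
        if len(seq) % 3 != 0:
            raise ValueError("sequence length must be a multiple of 3")
        # Count codons in reading frame 0, i.e. non-overlapping triplets.
        counts.update(seq[i:i + 3] for i in range(0, len(seq), 3))
    total = sum(counts.values())
    if total == 0:
        return {}
    # Normalize raw counts so the frequencies sum to 1.
    return {codon: n / total for codon, n in counts.items()}
```

Comparing these frequency tables across organisms is one simple way to see species-specific codon preferences; per-amino-acid normalization (e.g. relative synonymous codon usage) would be a natural refinement.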
Provided by:
adibvafa



