
Seed sequence dataset

Source: Figshare. Updated 2023-12-22; indexed 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Seed_sequence_dataset/24893910
Description:
This dataset contains the primary sequences for each of the 53 conserved protein families used as seeds for our iterative, explorative homolog search procedure.

The archive seed_dataset.zip contains, for each of the 53 families (labelled with a unique numerical ID):

XXX.faa : amino-acid sequences of each protein in the seed family
XXX.dic : taxonomic information on the host lineage of each protein
XXX.emapper.annotations : eggNOG functional annotation for the protein family sequences

The TSV file seed_dataset_info.tsv is a tab-separated, plain-text table with additional information on the protein family dataset. Data are arranged in columns as follows:

FamilyID : numerical ID for each seed protein family
#QuerySeqs : number of seed sequences in the family
#SearchIterations : number of search iterations performed in the environmental metagenome before no new homolog is found
#EnvSeqs : number of environmental homologs retrieved in total by the iterative search
#EnvSeqsPerIteration : environmental homologs retrieved at each step of the iterative search
Description : free-text description of the protein family
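
The files described above could be read along the following lines. This is a minimal sketch, not part of the dataset: it assumes the .faa files are standard FASTA, that seed_dataset_info.tsv begins with a header row carrying the column names listed in the description, and that files are UTF-8 encoded. The function names (read_fasta, read_family_info, load_seed_families) are hypothetical, and the internal folder layout of seed_dataset.zip is not documented here, so the sketch simply matches files by their .faa suffix.

```python
import csv
import io
import zipfile
from pathlib import Path


def read_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dict."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; keep the full header line without '>'.
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines may be wrapped, so collect and join them later.
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


def read_family_info(tsv_text):
    """Return the info table rows as dicts keyed by column name.

    Assumes the first line is a header row (FamilyID, #QuerySeqs, ...).
    Values are left as strings; the format of #EnvSeqsPerIteration is
    not specified in the description, so it is not parsed further.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return list(reader)


def load_seed_families(zip_path):
    """Map each family ID (the file stem, e.g. 'XXX') to its sequences."""
    families = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith(".faa"):
                text = archive.read(name).decode("utf-8")
                families[Path(name).stem] = read_fasta(text)
    return families
```

Given local copies of the two files, `load_seed_families("seed_dataset.zip")` would return one dictionary of sequences per family, and `read_family_info(Path("seed_dataset_info.tsv").read_text())` would return one row per family, which can then be joined on the FamilyID column.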
Created:
2023-12-22