
Seed sequence dataset

Source: DataCite Commons. Updated: 2025-04-01. Indexed: 2024-08-18.
Download link:
https://figshare.com/articles/dataset/Seed_sequence_dataset/24893910/1
Description:
This dataset contains the primary sequences for each of the 53 conserved protein families used as seeds for our iterative, explorative homolog search procedure.

The archive seed_dataset.zip contains, for each of the 53 families (labelled with a unique numerical ID):
XXX.faa : amino-acid sequences of each protein in the seed family
XXX.dic : taxonomic information on the host lineage of each protein
XXX.emapper.annotations : eggNOG functional annotation for the protein family sequences

The TSV file seed_dataset_info.tsv is a tab-separated, plain-text table with additional information on the protein family dataset. Data is arranged in columns as follows:
FamilyID : numerical ID for each seed protein family
#QuerySeqs : number of seed sequences in the family
#SearchIterations : number of search iterations performed in the environmental metagenome before no new homolog is found
#EnvSeqs : number of environmental homologs retrieved in total by the iterative search
#EnvSeqsPerIteration : environmental homologs retrieved at each step of the iterative search
Description : free-text description of the protein family
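The file layout described above could be read along the following lines. This is a minimal sketch, not the authors' tooling: it assumes that seed_dataset_info.tsv starts with a header row using the column names listed in the description, and that the XXX.faa files are standard FASTA. The function names read_family_info and read_fasta are hypothetical, and the sample rows and sequences are made up for illustration only.

```python
import csv
import io


def read_family_info(handle):
    """Return a list of dicts, one per row of a tab-separated table.

    Assumes the first line is a header row (an assumption about
    seed_dataset_info.tsv, not something the listing confirms).
    """
    return list(csv.DictReader(handle, delimiter="\t"))


def read_fasta(handle):
    """Return {header: sequence} from a FASTA-formatted text handle."""
    records = {}
    header = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequences may span several lines
    return {h: "".join(parts) for h, parts in records.items()}


# Made-up sample data standing in for the real files.
sample_tsv = io.StringIO(
    "FamilyID\t#QuerySeqs\t#SearchIterations\t#EnvSeqs\tDescription\n"
    "001\t12\t4\t350\texample family\n"
)
sample_faa = io.StringIO(">prot1\nMKTAYIAK\nQRQISFVK\n>prot2\nMSLLTEVE\n")

families = read_family_info(sample_tsv)
sequences = read_fasta(sample_faa)
print(families[0]["FamilyID"], int(families[0]["#QuerySeqs"]))
print({name: len(seq) for name, seq in sequences.items()})
```

With the real archive, the same functions would be called on handles returned by open() for the unzipped files. All values come back as strings, so numeric columns need explicit conversion. The format of the #EnvSeqsPerIteration column is not specified in the description, so it is left unparsed here.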
Provider:
figshare
Created:
2023-12-22