Supplementary Data 1
DataCite Commons | Updated 2021-11-10 | Indexed 2024-07-28
Download link:
https://figshare.com/articles/dataset/Supplementary_Data_1/16978561
Resource description:
Supplementary Data 1 contains information on genes found within Cld genomic neighborhoods, defined as genes within +/- 10 genes of Cld. These data, together with the protein sequences, should enable reproduction of the results reported in this paper. Key information includes the accessions and coordinates of genes, the annotation of each gene, the taxonomic assignment of genomes or scaffolds, the clustering of proteins into groups, and the clustering of neighborhoods into groups. Additional data on groups of proteins is found in Supplementary Data 2.

gene_id: Gene accession, NCBI or JGI IMG
scaffold_id: Scaffold/contig accession, NCBI or JGI IMG
genome_id: Genome accession, NCBI or JGI IMG
source: Source of genome, either NCBI, JGI genomes, JGI metagenomes, or unpublished collection
genome_name: Taxonomic name of genome
gene_product_name: Description of gene product
locus_tag: Locus tag
start_coord: Start coordinate of gene on contig/scaffold
end_coord: End coordinate of gene on contig/scaffold
strand: Forward or reverse strand
domain: Domain assignment of genome or metagenomic contig
phylum: Phylum assignment of genome or metagenomic contig
class: Class assignment of genome or metagenomic contig
order: Order assignment of genome or metagenomic contig
family: Family assignment of genome or metagenomic contig
genus: Genus assignment of genome or metagenomic contig
species: Species assignment of genome or metagenomic contig
strain: Strain assignment of genome or metagenomic contig
relative_start_coord: Start coord distance to closest Cld start coord
relative_end_coord: End coord distance to closest Cld start coord
relative_strand: Whether same or different strand to closest Cld (+ = same)
relative_position: Distance in number of genes to closest Cld
exported: SignalP prediction of whether exported (True) or not (False)
export_signal: SignalP prediction of export signal type
protein_length_aa: Length of translated protein sequence
heme_binding_sites: Number of CXXCH or CXXXCH motifs in protein sequence
subfamily: Assignment of gene to a cluster or “subfamily” of related proteins
subfamily_mmseqs_id: MMseqs output used to set subfamilies
multigenome_gene_id: If NCBI protein is identical across genomes
identical_protein_group: Assignment of any group of 100% identical proteins
identical_protein: Whether a protein is identical to another (True) or not (False)
neighborhood_group: Assignment of a contig/scaffold to a cluster or “group” with similar composition
neighborhood_tsne_dim1: t-SNE dimension 1 used to cluster neighborhood groups
neighborhood_tsne_dim2: t-SNE dimension 2 used to cluster neighborhood groups
imgm_scaffold_oid: ID to allow easier query of IMG metagenome scaffolds
imgm_gene_oid: ID to allow easier query of IMG metagenome genes
scaffold_length_bp: Length of IMG metagenome scaffolds
gc_content: GC content of IMG metagenome scaffolds
read_depth: Read depth of IMG metagenome scaffolds
lineage_percentage: Percent of IMG metagenome genes assigned to lineage for domain, etc.
ecosystem: IMG metagenome description
ecosystem_category: IMG metagenome description
ecosystem_type: IMG metagenome description
ecosystem_subtype: IMG metagenome description
specific_ecosystem: IMG metagenome description
assigned_environment_type: Revised metagenome description (e.g. fixing aquifer sediment assigned as soil)
assigned_environment_subtype: Revised metagenome description (e.g. fixing aquifer sediment assigned as soil)
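
The heme_binding_sites field is described as the number of CXXCH or CXXXCH motifs in a protein sequence. A minimal Python sketch of one way such a count could be computed follows. The function name count_heme_motifs is hypothetical, and the counting rule (each motif counted once per start position, so motifs that share residues are all counted) is an assumption, since the description does not say how overlapping motifs were handled.

```python
import re

# C, then any 2 or 3 residues, then C and H: covers both CXXCH and CXXXCH.
# The zero-width lookahead lets motifs that share residues each be counted
# (an assumption; the dataset description does not state the overlap rule).
HEME_MOTIF = re.compile(r"(?=C.{2,3}CH)")


def count_heme_motifs(protein_seq: str) -> int:
    """Count CXXCH and CXXXCH motif start positions in a protein sequence."""
    return len(HEME_MOTIF.findall(protein_seq.upper()))


if __name__ == "__main__":
    # Toy sequences for illustration only, not taken from the dataset.
    for seq in ("MKCAACHGG", "CAACHCAAACH", "MKLVAA"):
        print(seq, count_heme_motifs(seq))
```

A given start position cannot match both motif lengths at once, because the fifth residue would have to be both H and C, so counting by start position does not double-count a single site.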
Provider:
figshare
Created:
2021-11-10



