Additional file 1 of Conserved and lineage-specific hypothetical proteins may have played a central role in the rise and diversification of major archaeal groups

NIAID Data Ecosystem, indexed 2026-03-13

Download link:

https://figshare.com/articles/dataset/Additional_file_1_of_Conserved_and_lineage-specific_hypothetical_proteins_may_have_played_a_central_role_in_the_rise_and_diversification_of_major_archaeal_groups/20236083

Description:

Additional file 1:

Table S1. List of the 3,197 genomes used in this study. For each genome (column A), its NCBI accession, GGKBASE link, number of scaffolds, genome size and number of CDS are given in columns B, C, D, E and F, respectively. Genome source is in column G and dRep cluster in column H. Genome completeness and contamination, both based on single-copy genes, are given in columns I and J, respectively. Column K reports on the concatenated ribosomal proteins. The 1,179 representative genomes are indicated in column L. The phylum and superphylum (DPANN and non-DPANN) taxonomy of the representative genomes are provided in columns M and N. Taxonomy according to the source databases from which the genomes were retrieved is shown in column O.

Table S2. Taxonomic distribution of the 113 modules. Column A: module name. Column B: number of families. Column C: suggested taxonomic distribution. Column D: genomes used to define the taxonomic distribution (phylum, number of genomes).

Table S3. Annotation of the 10,866 families. Column A: module number. Column B: family accession. Column C: number of proteins in the family. Column D: median length of the proteins. Column E: ratio of proteins predicted to contain a signal peptide. Column F: median number of predicted transmembrane helices per protein. Column G: domain architecture reported by Pfam. Columns H, I, J, K, L: KEGG annotations. Column M: CAZy annotation. Column N: arCOG annotation. Column O: Unifam annotation. Columns Q to AF: ratio of genomes having the given family in the given archaeal phylum. Columns AG to CN: ratio of genomes having the given family in the given bacterial phylum.

Table S4. Annotation of the subfamilies (column C) based on hmmsearch against the PDB database (columns D and E) and on HMM-HMM prediction against the arCOGs of the eggNOG database (columns F, G, H and I).

Table S5. Genes neighboring the four genes encoding the subunits of the ammonia monooxygenase. The four genes downstream and upstream of each amoA, amoB, amoC and amoX gene (column H) were identified and annotated using the protein clustering (column E), the Pfam database (column G) and the KEGG database (column F).

Table S6. Genes neighboring the three genes encoding the subunits of the 5-oxoprolinase complex. The three genes downstream and upstream of each pxpA, pxpB and pxpC gene (column H) were identified and annotated using the protein clustering (column E), the Pfam database (column G) and the KEGG database (column F).

Table S7. Genes neighboring the three genes encoding the enzymes of the pathway of the C50 carotenoid bacterioruberin and the gene encoding a distant homolog of the catalytic subunit I of heme-copper oxygen reductase (fam02696). The four genes downstream and upstream of each LyeJ, CruF, CrtD and fam02696 gene (column H) were identified and annotated using the protein clustering (column E), the Pfam database (column G) and the KEGG database (column F).

Table S8. Genes neighboring the two genes encoding the integrin beta 4 and the TFIIH. The five genes downstream and upstream of each integrin and TFIIH gene (column H) were identified and annotated using the protein clustering (column E), the Pfam database (column G) and the KEGG database (column F).
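
The tables are described by spreadsheet column letters rather than header names, so anyone reading them programmatically may find it convenient to turn those letters into positions. The following is a minimal sketch, not part of the dataset: the helper names `column_index` and `column_span` are hypothetical, and it assumes the files follow the column layout stated in the description above. It uses the two Table S3 ranges (Q to AF for archaeal phyla, AG to CN for bacterial phyla) as an example.

```python
def column_index(letters: str) -> int:
    """Convert a spreadsheet column label (e.g. 'A', 'AF') to a 0-based index."""
    if not letters:
        raise ValueError("empty column label")
    index = 0
    for ch in letters.upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"invalid column label: {letters!r}")
        # Column labels are a bijective base-26 numbering: A=1 ... Z=26, AA=27.
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def column_span(first: str, last: str) -> range:
    """Return the 0-based positions covered by an inclusive column range."""
    start, stop = column_index(first), column_index(last)
    if stop < start:
        raise ValueError(f"range {first}:{last} is reversed")
    return range(start, stop + 1)


# Ranges taken from the Table S3 description (layout assumed, not verified
# against the spreadsheet itself).
archaeal_phylum_columns = column_span("Q", "AF")
bacterial_phylum_columns = column_span("AG", "CN")

print(len(archaeal_phylum_columns), "archaeal phylum columns")
print(len(bacterial_phylum_columns), "bacterial phylum columns")
```

The resulting ranges can be used to slice a row that has been read as a plain list of cell values with any spreadsheet reader.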

Created:

2022-07-05