Additional file 2 of About the dark corners in the gene function space of Escherichia coli remaining without illumination by scientific literature
Source: DataCite Commons. Updated: 2023-04-13. Indexed: 2024-08-18.
Download link:
https://springernature.figshare.com/articles/dataset/Additional_file_2_of_About_the_dark_corners_in_the_gene_function_space_of_Escherichia_coli_remaining_without_illumination_by_scientific_literature/22599922
Resource description:
Additional file 2:

Fig. S1. We show the total number of publications related to E. coli softcore genes (red line, relative to the left y-axis) and the total number of genes mentioned in the respective literature (blue line, relative to the right y-axis) from year 1939 up to year 2021. The blue dashed vertical lines mark the expansion period for the total number of genes from year 1965 to 2009. The total number of genes apparently plateaus after the year 2019. The red dashed vertical lines at years 1970 and 2007 indicate two periods of publication dynamics: 1970–2007 and 2007–2021. The ratio of the number of publications in each year to the total number of new genes identified in each year is shown in the inset.

Fig. S2. FPE plots for different FPE score ranges from year 1960 until 2021 for E. coli K-12 genes are shown separately for five different categories, i.e. (A) very understudied, (B) understudied, (C) moderately studied, (D) intensively studied and (E) very intensively studied. The y-axis is given in the same scale for visual comparison across different categories.

Fig. S3. We illustrate the number of new genes of E. coli K-12 achieving the FPE score ranges (T0, T1, T5, T10, T15, T20, T25, T30, T35, T40, T45, T50, T75, T100, T500) across the years in (A) phase 1 and (B) phase 2 periods. The linear regression line (number of new genes (y-axis) versus year (x-axis)) is shown. The magnitude of the slope is provided in Table 2.

Fig. S4. FPE plots for different FPE score ranges from year 1960 until 2021 for E. coli softcore genes are shown separately for five different categories, i.e. (A) very understudied, (B) understudied, (C) moderately studied, (D) intensively studied and (E) very intensively studied. The y-axis is given in the same scale for visual comparison across different categories.

Fig. S5. We illustrate the number of new genes of the E. coli softcore genome achieving the FPE score ranges (T0, T1, T5, T10, T15, T20, T25, T30, T35, T40, T45, T50, T75, T100, T500) across the years in (A) phase 1 and (B) phase 2 periods. The linear regression line (number of new genes (y-axis) versus year (x-axis)) is shown. The magnitude of the slope is provided in Additional file 1: Table S5.

Fig. S6. Prediction of the transmembrane (TM) region in the protein sequence yahV (GF_29643) in E. coli K-12 MG1655 using TMHMM 2.0. The TM region is predicted to cover positions 4-23 of the protein sequence.

Fig. S7. The upstream and downstream genes of yahV based on NCBI RefSeq. The betABIT operon is upstream of the yahV gene. betABIT is expressed only under aerobic conditions during osmotic stress, for the production of osmoprotectants. The pdeL gene, on the other hand, is downstream of yahV. The pdeL gene appears to be involved in the regulation of cell motility.

Fig. S8. Neighboring gene families (GFs) of GF_29643 (yahV; circled in red), focusing on genomes that carry GF_29643. Ten GFs upstream and ten GFs downstream of GF_29643 are extracted and investigated. Each GF is represented as a node, and two nodes are linked by an edge if they are next to each other. The thickness of the edge represents the weighted link between the two GFs. Clearly, GF_29643’s genomic position is conserved across the E. coli genomes that carry the yahV gene. Note that GF_8617 represents the betT gene and GF_25808 contains the pdeL gene.

Fig. S9. The predicted transmembrane beta-barrel (TMBB) structure of protein yddL (GF_4841) using BetAware-Deep. The predicted localization is outer membrane TMBB, with an overall TMBB probability of 0.93. There are four TM β-strand segments, as shown in the figure.

Fig. S10. We illustrate the GFs associated with GF_29643, GF_4841 and GF_8394. The associated GFs of these three GFs overlap strongly with each other and, therefore, can be related. Each node represents a GF, and an edge (connecting line) indicates a significant coincident association between nodes (P-value ≤ 1 × 10⁻²⁰). The size of a node is determined by its degree (the number of associated GFs). Node color follows a gradient from grey to red, which is likewise determined by the node’s degree. The three cluster-founding GFs are highlighted by red arrows. Please note that only 60 out of the 68 GFs found are present in E. coli K-12 MG1655.

Fig. S11. The number of overlapping associated GFs among three GFs, i.e., GF_29643, GF_8394 and GF_4841.

Fig. S12. Manual annotation of the GFs associated with GF_29643 (yahV), GF_4841 (yddL), and GF_8394 (paaE). There are four potential biological processes related to these three GFs, i.e. osmotic regulation, stress response, cell motility and energy metabolism. The corresponding genes are given for each biological process. Genes with unclear function are given as “Not Clear”.

Fig. S13. The protein expression of 11 genes extracted from Caglar’s proteomics data. Only 11 genes out of the 30 gene families, which are fully connected or significantly associated with each other, have protein expression data in Caglar’s proteomics data. Please note that the E. coli strain used in Caglar’s study is E. coli REL606, which belongs to phylogroup A (sequence type ST93). This is different from E. coli K-12 MG1655, which has sequence type ST10. The highlighted box (with a red dashed line) emphasizes the expression results from cultures under the NaCl_Stress condition.

Fig. S14. We visualize the gene expression of 19 genes extracted from the Metris et al. data in accordance with osmotic conditions. These 19 genes are from our set of 30 GFs, which are fully connected or significantly associated with each other. Please note that the E. coli strain used in Metris’ study is E. coli K-12 MG1655, which is the same as the E. coli strain in our analysis.
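The neighborhood graph described for Fig. S8 (GFs as nodes, an edge between GFs that sit next to each other, edge thickness as a weight) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function name `neighborhood_edges`, the toy gene orders, the placeholder identifiers `GF_A`, `GF_B` and `GF_C`, and in particular the choice to define the edge weight as the number of genomes in which two GFs are adjacent. The caption does not state how the weight was actually computed, and the sketch treats each genome as a linear list without wrapping around a circular chromosome.

```python
from collections import Counter

def neighborhood_edges(genomes, focus, flank=10):
    """Count adjacent GF pairs within `flank` positions of `focus`.

    `genomes` is a list of gene-family orders (one list per genome).
    The weight of an edge is assumed to be the number of genomes in
    which the two GFs are direct neighbors inside the window.
    """
    weights = Counter()
    for order in genomes:
        if focus not in order:
            continue  # keep only genomes that carry the focus GF
        i = order.index(focus)
        window = order[max(0, i - flank): i + flank + 1]
        # Link every pair of consecutive GFs in the window.
        for a, b in zip(window, window[1:]):
            weights[frozenset((a, b))] += 1
    return weights

# Hypothetical toy gene orders; not taken from the dataset.
genomes = [
    ["GF_A", "GF_8617", "GF_29643", "GF_25808", "GF_B"],
    ["GF_C", "GF_8617", "GF_29643", "GF_25808"],
    ["GF_A", "GF_B"],  # lacks GF_29643, so it is skipped
]
edges = neighborhood_edges(genomes, "GF_29643", flank=1)
for pair, weight in edges.items():
    print(sorted(pair), weight)
```

Under this assumed weighting, a conserved genomic position shows up as a few heavy edges around the focus GF rather than many light ones.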
Provider:
figshare
Created:
2023-04-13



