
Codon Usage of Secondary Metabolite Genes in Desert Soils and Marine Sponges

Mendeley Data. Updated 2024-01-31; indexed 2024-06-27
Download link:
https://figshare.com/articles/dataset/Codon_Usage_of_Secondary_Metabolite_Genes_in_Desert_Soils_and_Marine_Sponges/862969/4
Description:
This file set contains an initial analysis of codon usage in secondary metabolite genes from desert soils (Reddy et al. 2012), a marine sponge microbiome (Trindade-Silva et al. 2012), and desert soil from New Mexico (Owen et al. 2013).

Why Does This Exist: I wanted to know how microbial communities are similar and different in their use of secondary metabolite genes (genes that produce molecules that are sometimes useful to humans as antibiotics, among other things). My PhD work focuses on these genes across several environments in New Mexico (caves, cliff sides, and springs). I thought there might be a connection between environmental conditions and the types and usage of these genes. In this data set I look at single domains of larger gene clusters.

Methods: The Reddy et al. data set is available as SRA and FASTQ files under the identifier SRR342214. The marine sponge KS domains are available under the accession numbers JX012425:JX012657. The New Mexico desert soil data set is available from eSNaPD, http://esnapd2.rockefeller.edu/. The SRR342214 data set was mined with a custom Python script that pulled out all the barcoded KS domains and then removed domains shorter than 150 nt. All duplicate sequences were removed from the New Mexico data set. CodonW was used to calculate all codon bias indices and dinucleotide frequencies. The data were visualized in RStudio using ggplot2, gridExtra, and FactoMineR.

Questions Raised: Does this single domain reflect the gene cluster? Tentatively I say yes, based on some whole gene cluster analysis I did in CodonW. How much is random GC mutation, as opposed to other pressures, acting on the NP (natural product) genes in the community? A quick look at the plot of Nc (effective number of codons) versus GC3 shows most points falling away from the expected distribution. This suggests that something other than GC mutation is influencing the domain.
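To make the Nc versus GC3 comparison concrete, here is a minimal Python sketch. It is not the CodonW implementation. It assumes Wright's standard null-expectation curve, Nc = 2 + s + 29 / (s^2 + (1 - s)^2), where s is the GC fraction at third codon positions. The function names gc3 and expected_nc are hypothetical, the sequence is assumed to be in frame, and this simplified GC3 counts every codon, whereas CodonW's GC3s excludes some codons, so values may differ slightly.

```python
def gc3(seq: str) -> float:
    """Fraction of G or C at third codon positions (simplified GC3).

    Assumption: the sequence starts in frame. A trailing partial codon
    is ignored.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3      # length covered by whole codons
    thirds = seq[2:usable:3]              # every third base, starting at index 2
    if not thirds:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


def expected_nc(s: float) -> float:
    """Expected Nc when GC3 composition alone shapes codon usage.

    Assumption: Wright's curve, Nc = 2 + s + 29 / (s^2 + (1 - s)^2).
    """
    if not 0.0 <= s <= 1.0:
        raise ValueError("GC3 must be a fraction between 0 and 1")
    return 2.0 + s + 29.0 / (s ** 2 + (1.0 - s) ** 2)


if __name__ == "__main__":
    example = "ATGGCCAAA"                 # toy sequence, not from the data set
    s = gc3(example)
    print(f"GC3 = {s:.3f}, expected Nc = {expected_nc(s):.2f}")
```

Sequences whose CodonW Nc value sits well below expected_nc(gc3(seq)) are the ones that appear away from the curve in the plot.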
Are there quantifiable differences between and within communities in their codon usage of the KS domains? With this small data set we can see a difference between marine and desert soil communities. Will this hold up for my larger data set? Does this tell us something useful about how the communities share their genes internally and what pressures select for certain types of codon bias in the NP genes? How are codons used across the different domains (AD, KS, and PKSa) within and across bacterial communities?

References:

Peden J. CodonW. Oxford University. Available at http://bioweb.pasteur.fr/seqanal/interfaces/codonw.html

Reddy BV, Kallifidas D, Kim JH, Charlop-Powers Z, Feng Z, Brady SF. Natural product biosynthetic gene diversity in geographically distinct soil microbiomes. Appl Environ Microbiol. 2012 May;78(10):3744-52. doi: 10.1128/AEM.00102-12. Epub 2012 Mar 16.

Trindade-Silva AE, Rua CP, Andrade BG, Vicente AC, Silva GG, Berlinck RG, Thompson FL. Polyketide synthase gene diversity within the microbiome of the sponge Arenosclera brasiliensis, endemic to the Southern Atlantic Ocean. Appl Environ Microbiol. 2013 Mar;79(5):1598-605. doi: 10.1128/AEM.03354-12. Epub 2012 Dec 28.

Owen JG, Reddy BVB, Ternei MA, Charlop-Powers Z, Calle PY, Kim JH, Brady SF. Mapping gene clusters within arrayed metagenomic libraries to expand the structural diversity of biomedically relevant natural products. PNAS 2013; published ahead of print July 3, 2013.

R Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/

Wickham H. ggplot2: Elegant Graphics for Data Analysis. Springer New York, 2009.

Husson F, Josse J, Le S, Mazet J (2013). FactoMineR: Multivariate Exploratory Data Analysis and Data Mining with R. R package version 1.25. http://CRAN.R-project.org/package=FactoMineR
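The two filtering steps described under Methods (dropping KS domains shorter than 150 nt and removing duplicate sequences) can be sketched in plain Python as follows. This is not the original custom script, which is not included in this description, and barcode extraction is left out because the barcodes are not given here. The function names are hypothetical, and "duplicate" is assumed to mean an exact sequence match.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def drop_short(records: Iterable[Record], min_len: int = 150) -> List[Record]:
    """Keep only records whose sequence is at least min_len nucleotides."""
    return [(h, s) for h, s in records if len(s) >= min_len]


def drop_duplicates(records: Iterable[Record]) -> List[Record]:
    """Keep the first record for each distinct sequence (exact match)."""
    seen = set()
    unique = []
    for header, seq in records:
        if seq not in seen:
            seen.add(seq)
            unique.append((header, seq))
    return unique


if __name__ == "__main__":
    # Toy input, not real data from any of the three data sets.
    toy = [">a", "ACGT", ">b", "ACGT", ">c", "A" * 150]
    records = list(parse_fasta(toy))
    print(len(drop_short(records)), len(drop_duplicates(records)))
```

In the workflow described above, the length filter was applied to the SRR342214 data set and the duplicate removal to the New Mexico data set, so the two functions are kept separate here. The filtered sequences would then be written back to FASTA as input for CodonW.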

Created: 2024-01-31