DATA FROM: Discovery of Thermophilic Bacillales using Reduced-Representation Genotyping for Identification.

Name: DATA FROM: Discovery of Thermophilic Bacillales using Reduced-Representation Genotyping for Identification.
Creator: figshare
Published: 2022-02-02 09:52:14
License: No description available

Source: DataCite Commons. Updated: 2022-02-02. Indexed: 2024-07-28.

Download link:

https://figshare.com/articles/dataset/DATA_FROM_Discovery_of_Thermophilic_Bacillales_using_Reduced-Representation_Genotyping_for_Identification_/11930892/1

Resource description:

Abstract
This data set contains complexity-reduced genotyping short-read sequences (length 30-69 bp) from 99 bacterial isolates. The isolates were obtained from samples collected between 2015 and 2016 from domestic hot water systems in the Australian Capital Territory (ACT), commercial composts available in the ACT, and artesian water bores located along the Birdsville Track in South Australia. The isolates were identified as belonging to the genera Anoxybacillus, Geobacillus or Brevibacillus. The data are provided as filtered FASTA files. The research used two pairs of restriction enzymes: PstI with HpaII and PstI with MseI.

Keywords
bacterial identification; DArTseq; genotyping-by-sequencing; Great Artesian Basin; reduced-representation sequencing; thermophiles

Specifications table:
Subject area: Environmental Microbiology
Type of data: Short-read sequences
How data was acquired: Illumina HiSeq2500 sequencer
Data format: Filtered FASTA files
Experimental factors: Complexity-reduced genotyping by sequencing using three complexity reduction methods on bacterial isolates
Experimental features: Combinations of the enzymes PstI with MseI and PstI with HpaII were used.
Data source location: Australia

1. Data:
These short-read FASTA files are complexity-reduced genotyping data obtained from thermophilic bacterial isolates identified in samples from domestic hot water systems in the Australian Capital Territory (ACT), commercial compost available in the ACT, and artesian bores in the Great Artesian Basin in South Australia. Samples were inoculated onto LB agar plates and incubated at 62.2 °C. DNA was then extracted using a chloroform-isoamyl alcohol solution (24:1), washed with 70% ethanol and dissolved in 10 mM Tris-Cl, pH 8.5. This was followed by digestion with the selected pairs of restriction enzymes (PstI with MseI and PstI with HpaII), PCR amplification and sequencing on an Illumina HiSeq2500 sequencer.
Approximately 150,000 reads were obtained per sample.

2. Experimental design, materials and methods

2.1 Bacterial isolates
A total of 99 bacterial isolates were obtained from 27 different sampling sources. DNA was extracted from all bacterial isolates using the chloroform-isoamyl alcohol method described by Talamantes-Becerra et al., 2019.

2.2 Library preparation and sequencing
Library preparation followed the DArTseq™ (Canberra, Australia) DNA digestion method, which uses restriction enzymes. The pairs of enzymes used were PstI (5'-CTGCA|G-3') with MseI (5'-TTA|A-3') and PstI (5'-CTGCA|G-3') with HpaII (5'-CCG|G-3'). Clustering and sequencing followed the Illumina (San Diego, CA, US) protocols for the HiSeq Cluster Kit V4 recipe v9.0 and HiSeq SR Flow Cell v4, and sequencing was performed on a HiSeq 2500 for a total of 77 cycles. Approximately 150,000 reads were obtained per sample.

2.3 Production of data files
The raw data, obtained as FASTQ files, were processed using the DArTseq™ primary data processing pipeline for demultiplexing. Filtering used stringency values of a Phred score of 30 with a pass percentage of 75 for the barcode, and a minimum Phred score of 10 with a pass percentage of 50 for the whole read, as described by Georges et al., 2018. Barcode adapters were trimmed and fragments of less than 29 bp were removed. The final length of unique fragments was between 30 and 69 bp.

Acknowledgements:
We would like to thank Dr M.A. (Rien) Habermehl for sharing his expertise and advice during the selection of sampling sites and for indicating the best season for collecting samples in the Great Artesian Basin. We would like to thank D. Shrestha for collecting mud and water samples from the Birdsville and Stoney crossing artesian water bores and sending them back to us. We would like to thank the station managers from the Birdsville Track for allowing us to collect samples from their artesian water bores.
The author BTB wishes to acknowledge Consejo Nacional de Ciencia y Tecnología (CONACYT) for providing the scholarship “Becas CONACYT al extranjero 2015” to pursue postgraduate studies. Genome sequencing was provided by MicrobesNG (http://www.microbesng.uk), which is supported by the BBSRC (grant number BB/L024209/1). We thank Distinguished Prof. Arthur Georges and Dr Andrzej Kilian for their valuable contributions. We also thank Dr Michelle Gahan and Prof. Dennis McNevin for their suggestions on project development and methods, and for co-supervising the PhD project from which this work arises.
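
Illustrative sketch of the read filtering in section 2.3:
The DArTseq™ primary pipeline is not described here in enough detail to reproduce, so the Python below is only one plausible reading of the stated thresholds. It assumes a region "passes" when at least the given percentage of its bases reach the Phred threshold. The function names, the Phred+33 quality encoding, the `barcode_len` argument and the example read are all assumptions made for illustration, not part of the data set or of the DArTseq™ software. The default `min_len=29` follows the statement that fragments of less than 29 bp were removed; the reported final fragment range is 30 to 69 bp.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 is assumed)."""
    return [ord(ch) - offset for ch in quality]


def pass_percentage(scores, min_phred):
    """Return the percentage of bases whose Phred score is at least min_phred."""
    if not scores:
        return 0.0
    return 100.0 * sum(s >= min_phred for s in scores) / len(scores)


def filter_read(seq, quality, barcode_len, min_len=29):
    """Return the barcode-trimmed fragment, or None if the read is rejected.

    Thresholds follow section 2.3; how the real pipeline applies them
    is an assumption of this sketch.
    """
    if len(seq) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    scores = phred_scores(quality)
    # Barcode region: Phred >= 30 on at least 75% of the barcode bases.
    if pass_percentage(scores[:barcode_len], 30) < 75:
        return None
    # Whole read: Phred >= 10 on at least 50% of all bases.
    if pass_percentage(scores, 10) < 50:
        return None
    # Trim the barcode, then drop fragments shorter than min_len.
    fragment = seq[barcode_len:]
    if len(fragment) < min_len:
        return None
    return fragment


if __name__ == "__main__":
    # Hypothetical read: an 8-base barcode followed by a 35-base fragment.
    read = "ACGTACGT" + "G" * 35
    qual = "I" * len(read)  # 'I' encodes Phred 40 under Phred+33
    print(filter_read(read, qual, barcode_len=8))
```

Reads that survive this kind of filter would then be written out as FASTA records, which is the form in which the data set is distributed.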

Provider:

figshare

Created:

2020-08-25