
OTUs Table and fastq sequences from environmental DNA applied to trematode communities

Mendeley Data, updated 2024-04-13, indexed 2024-06-29
Download link:
https://datadryad.org/stash/dataset/doi:10.5061/dryad.f7m0cfxxz
Description:
A total of 50 individual metabarcoding libraries were prepared following the Illumina two-step PCR protocol. The underlying 25 samples comprise triplicates of each water-sediment interface sample collected (i.e., 3 x 5 = 15), four negative controls (one per sampling site) and six positive controls. The PCR for each of these 25 samples was duplicated (i.e., 25 x 2 = 50). The positive controls comprised two categories of mock communities. The first category consisted of equimolar pools of 28 DNA extracts (set at a final concentration of 3.5 × 10⁻³ ng/µL) from different trematode species held in internal collections. The second category consisted of equimolar pools of PCR products obtained independently from the same 28 trematode species.

For the OTU table, the resulting amplicon sequence dataset was processed using the Find Rapidly OTUs with Galaxy Solution (FROGS) pipeline implemented in Galaxy (Escudié et al., 2018), available from the Genotoul platform (Toulouse, France).
(i) The amplicon dataset was first pre-processed by filtering the sequences to keep only amplicons of 150 to 400 nucleotides.
(ii) The retained sequences were then clustered into operational taxonomic units (OTUs) using the swarm algorithm with denoising and an aggregation distance of three nucleotides (Mahé et al., 2014).
(iii) Chimeras were removed from the dataset using VSEARCH (Rognes et al., 2016).
(iv) Singletons and underrepresented clusters (i.e., clusters containing fewer than 0.1% of the total number of sequences) were removed.
Each OTU was then assigned to a species through a two-step BLAST affiliation procedure.
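Steps (i) and (iv) above are simple threshold filters, and a minimal Python sketch may make them concrete. This is not the FROGS implementation. The function names are hypothetical. The inclusive 150 to 400 bounds, and computing the 0.1% cut-off on the total before any cluster is removed, are assumptions about how the description should be read.

```python
def filter_by_length(sequences, min_len=150, max_len=400):
    """Keep sequences whose length lies in [min_len, max_len] (bounds assumed inclusive)."""
    return [s for s in sequences if min_len <= len(s) <= max_len]


def drop_rare_clusters(cluster_sizes, per_mille=1):
    """Remove singletons and clusters below per_mille/1000 of all sequences.

    cluster_sizes maps a cluster name to its number of sequences.
    The total is taken over the input clusters (an assumption).
    Integer arithmetic avoids floating-point edge cases at the threshold.
    """
    total = sum(cluster_sizes.values())
    return {
        name: n
        for name, n in cluster_sizes.items()
        if n > 1 and n * 1000 >= per_mille * total
    }


if __name__ == "__main__":
    reads = ["A" * 149, "A" * 150, "A" * 400, "A" * 401]
    print([len(s) for s in filter_by_length(reads)])

    sizes = {"otu1": 9000, "otu2": 990, "otu3": 9, "otu4": 1}
    print(drop_rare_clusters(sizes))
```

With 10,000 sequences in total, the 0.1% cut-off corresponds to 10 sequences, so a cluster of 9 sequences and a singleton are both discarded in this toy input.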
The first BLAST analysis was run with the standalone blastn program from the BLAST+ package against a custom trematode sequence database of 88 sequences. This database comprised the sequences obtained from the amplicons generated by the in silico ecoPCR (i.e., 50 species; see Section 2.1 of the article; Table S1), the sequences generated by in vitro Sanger sequencing (i.e., 26 of the 34 species sequenced; see Section 2.2; Table S3), and 12 sequences retrieved from the GenBank database (Table S4).

The second BLAST analysis was performed using the online MEGABLAST tool without restrictive parameters, in order to affiliate the OTUs that could not be assigned in the first BLAST analysis. OTUs were retained only if they showed a minimum BLAST coverage of 97% and a pairwise identity above 97% with the affiliated sequence. The remaining OTUs were considered “unassigned”.

Lastly, we considered a given OTU to be present in a subsample (i.e., one of the three replicates of a single environmental sample; see Section 2.4) if its number of sequences was >0.1% of the total number of sequences in each of the two libraries assigned to this subsample and if the OTU was present in both libraries (i.e., the two PCR replicates performed on the single subsample; see Section 2.4). This 0.1% threshold was determined to be the most stringent one that still retained the sequences needed to detect all 28 species from the control mock communities.
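The affiliation filter and the presence rule described above can be sketched as two small predicates. This is a minimal illustration with hypothetical function and parameter names, not the authors' code. It reads "minimal coverage of 97%" as coverage >= 97 and "identity above 97%" as identity > 97, and it assumes the per-library total is the total read count of that library.

```python
def passes_blast_filter(coverage_pct, identity_pct,
                        min_coverage=97.0, min_identity=97.0):
    """True if a BLAST hit meets the coverage and identity criteria.

    Coverage is treated as a minimum (>=), identity as a strict bound (>).
    """
    return coverage_pct >= min_coverage and identity_pct > min_identity


def otu_present_in_subsample(count_a, total_a, count_b, total_b, per_mille=1):
    """True if the OTU exceeds per_mille/1000 of the reads in BOTH PCR replicates.

    count_x is the OTU read count in library x, total_x the library's read total.
    Exceeding the threshold in a library implies the OTU is present in it.
    """
    above_a = count_a * 1000 > per_mille * total_a
    above_b = count_b * 1000 > per_mille * total_b
    return above_a and above_b


if __name__ == "__main__":
    print(passes_blast_filter(98.2, 99.1))
    print(otu_present_in_subsample(11, 10_000, 12, 10_000))
    print(otu_present_in_subsample(50, 10_000, 0, 10_000))
```

Requiring the threshold in both PCR replicates means an OTU that is abundant in one library but absent from the other is not counted for that subsample.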
Created:
2023-06-28