
These are all the scripts, FastQC reports, filtered datasets and metadata used for analysis. Please read the readme file for a detailed explanation of what each file contains.

Source: DataCite Commons. Updated 2023-05-17; indexed 2024-08-18.
Download link:
https://figshare.com/articles/dataset/These_are_all_the_scripts_FastQC_reports_filtered_datasets_and_metadata_used_for_analysis_Please_read_the_readme_file_for_a_detailed_explanation_of_what_each_file_contains_/22881482/1
Description:
fastqc_files.zip:
Zip archive containing the FastQC HTML report for each sequenced sample. The library names match those in GenBank BioProject PRJNA972185 (BioSamples SAMN35067136 to SAMN35067307).

*.qza files (Qiime2 output):
- clean-filtered-2-500-observed-taxonomy-silva.qza: taxonomy of all the features observed in the filtered dataset.
- clean-filtered-table-2-500-aligned-rep-seqs-silva.qza: aligned (not masked) representative sequences of all the features observed in the filtered dataset.
- clean-filtered-table-2-500-masked-aligned-rep-seqs-silva.qza: masked alignment of the representative sequences of all the features observed in the filtered dataset.
- clean-filtered-table-2-500-rep-seqs-silva.qza: representative sequences of all the features observed in the filtered dataset.
- clean-filtered-table-2-500-rooted-tree-silva.qza: rooted phylogenetic tree based on the taxonomy assigned to all the features observed in the filtered dataset.
- clean-filtered-table-2-500-silva.qza: the filtered feature table.
- clean-filtered-table-2-500-unrooted-tree-silva.qza: unrooted phylogenetic tree based on the taxonomy assigned to all the features observed in the filtered dataset.

Scripts:
Bioinformatic analyses were run on the New Zealand eScience Infrastructure (NeSI, Linux system) and on a local computer (all R analyses, Windows system). The scripts provided are the following:
- FastQCandQiime2.md, FastQCandQiime2.pdf and FastQCandQiime2.html: three file formats (Markdown, PDF and HTML) of the annotated script used on a Linux system (NeSI) to run the FastQC analyses and Qiime2, up to the generation of the filtered qza files used as input to all R analyses.
- Alpha_diversity_rarefied.R: annotated R script used to load the relevant qza files (filtered output from Qiime2), subset and rarefy the data, and estimate alpha diversity, comparing environment, parasites and snails, as well as the different parasite species alone and other combinations of samples.
- barPlots_heatmaps_vennD.R: annotated R script used to load the relevant qza files (filtered output from Qiime2), subset the data, and draw the bar plots of relative abundance, heatmaps and Venn diagrams (non-rarefied data).
- beta_diversity_rarefied.R: annotated R script used to load the relevant qza files (filtered output from Qiime2), subset and rarefy the data, and estimate beta diversity, comparing environment, parasites and snails, as well as the different parasite species alone and other combinations of samples.
- differential_abundance_tests.R: annotated R script used to load the relevant qza files (filtered output from Qiime2) and subset the data (all samples, only parasites, parasite-snail host pairs, or only snails) to run the tests of differential abundance (ALDEx2, corncob and metastat methods).
- indicspecies.R: annotated R script used to load the relevant qza files (filtered output from Qiime2), subset the data, and run the indicator taxa tests (indicspecies R package).
- phylogeneticVsMicrobiome_distances.R: annotated R script used to calculate the genetic distances between the four trematode species, calculate the microbiome distances between these four species (based on beta diversity metrics of rarefied data), and test for an association between genetic distance and microbiome distance (phylosymbiosis). It includes tests of normality (Shapiro-Wilk) and of correlation, as well as Mantel tests.

Other files:
- run_insight.csv: output statistics of the sequencing run.
- metadata_complete.csv: all the metadata used in the different scripts. Columns are described in detail in the readme file.
- partialCOI_trematodes_short.fasta: partial COI gene sequences of the four trematode species (downloaded from GenBank), used to estimate genetic distance between species.
- partialCOI_trematodes_short_aligned.phy: aligned sequences (partial COI gene) of the four trematode species (based on partialCOI_trematodes_short.fasta and aligned with mafft_command.txt).
- 28S_parasite_families.txt: partial 28S gene sequences (FASTA format, but with the extension .txt) for the four trematode families and an outgroup (downloaded from GenBank), used to estimate genetic distance between families.
- 28S_parasites_aligned2.phy: aligned sequences (partial 28S gene) of the four trematode species (based on 28S_parasite_families.txt and aligned with mafft_command.txt).
- mafft_command.txt: command used on Linux (NeSI system) to run MAFFT and align the COI and 28S sequences (partialCOI_trematodes_short.fasta and 28S_parasite_families.txt).
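For readers who want to peek inside one of the .qza files before running the R scripts, the following is a minimal Python sketch, not part of the dataset. It relies on the general Qiime2 artifact layout (a zip archive with a single top-level directory named after the artifact UUID, holding metadata.yaml and a data/ directory); the function name `list_qza_contents` is hypothetical, and the exact payload file names inside data/ depend on the artifact type.

```python
import os
import zipfile
from pathlib import PurePosixPath


def list_qza_contents(path):
    """Return (uuid, data_files) for a Qiime2 artifact.

    Assumes the usual layout: one top-level directory named after the
    artifact UUID, containing metadata.yaml and a data/ directory.
    """
    with zipfile.ZipFile(path) as archive:
        # Skip explicit directory entries; keep only files.
        names = [n for n in archive.namelist() if not n.endswith("/")]

    roots = {PurePosixPath(n).parts[0] for n in names}
    if len(roots) != 1:
        raise ValueError(f"expected one top-level directory, found {sorted(roots)}")
    uuid = roots.pop()

    # Keep only files under <uuid>/data/, reported relative to data/.
    data_files = sorted(
        str(PurePosixPath(*PurePosixPath(n).parts[2:]))
        for n in names
        if PurePosixPath(n).parts[1:2] == ("data",) and len(PurePosixPath(n).parts) > 2
    )
    return uuid, data_files


if __name__ == "__main__":
    # File name taken from the listing above; the path is an assumption.
    qza = "clean-filtered-table-2-500-silva.qza"
    if os.path.exists(qza):
        uuid, files = list_qza_contents(qza)
        print(uuid, files)
```

Because the archive is only read, this is a safe way to check which payload file an artifact carries without installing Qiime2.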

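The phylosymbiosis step described for phylogeneticVsMicrobiome_distances.R tests whether genetic distance and microbiome distance are associated, using Mantel tests among others. The sketch below illustrates the general idea of a permutation-based Mantel test in plain Python. It is not the authors' R code, the function names are hypothetical, the test is one-sided (positive association), and the matrices in the usage example are made-up toy values rather than data from this dataset.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all distances are equal")
    return cov / math.sqrt(var_x * var_y)


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """One-sided Mantel test: returns (r, p_value).

    Rows and columns of dist_a are relabelled together at random, and the
    p-value is the share of relabellings whose correlation with dist_b is
    at least as large as the observed one (with the usual +1 correction).
    """
    n = len(dist_a)
    rng = random.Random(seed)
    flat_b = upper_triangle(dist_b)
    observed = pearson(upper_triangle(dist_a), flat_b)

    hits = 0
    order = list(range(n))
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [[dist_a[order[i]][order[j]] for j in range(n)] for i in range(n)]
        # Small tolerance guards against floating-point ties.
        if pearson(upper_triangle(permuted), flat_b) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


if __name__ == "__main__":
    # Toy 4x4 matrices (made-up values) standing in for genetic and
    # microbiome distances between four species.
    genetic = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
    microbiome = [[0, 2, 4, 6], [2, 0, 2, 4], [4, 2, 0, 2], [6, 4, 2, 0]]
    r, p = mantel(genetic, microbiome)
    print(f"r = {r:.3f}, p = {p:.3f}")
```

With only four species there are just 24 possible relabellings, so the p-value from any such permutation test is necessarily coarse.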
Provider: figshare
Created: 2023-05-17