
These are all the scripts, FastQC reports, filtered datasets and metadata used for analysis. Please read the readme.docx or README.txt file for a detailed explanation of what each file contains.

Source: DataCite Commons. Updated 2025-06-01; indexed 2024-08-18.
Download link:
https://figshare.com/articles/dataset/These_are_all_the_scripts_FastQC_reports_filtered_datasets_and_metadata_used_for_analysis_Please_read_the_readme_file_for_a_detailed_explanation_of_what_each_file_contains_/22881482/4
Description:
**readme.docx** and **README.txt** contain the same information. They provide all the details needed to understand what each file in the repository is, including a description of every column of the metadata file.

**fastqc_files.zip**: zip folder containing the HTML FastQC report for each sequenced sample. The library names match those in NCBI BioProject PRJNA972185 (BioSamples SAMN35067136 to SAMN35067307).

**.qza files** (QIIME 2 output):
- clean-filtered-2-500-observed-taxonomy-silva.qza: taxonomy of all the features observed in the filtered dataset.
- clean-filtered-table-2-500-aligned-rep-seqs-silva.qza: alignment (not masked) of the representative sequences of all the features observed in the filtered dataset.
- clean-filtered-table-2-500-masked-aligned-rep-seqs-silva.qza: masked alignment of the representative sequences of all the features observed in the filtered dataset.
- clean-filtered-table-2-500-rep-seqs-silva.qza: representative sequences of all the features observed in the filtered dataset.
- clean-filtered-table-2-500-rooted-tree-silva.qza: rooted phylogenetic tree based on the taxonomy assigned to all the features observed in the filtered dataset.
- clean-filtered-table-2-500-silva.qza: the filtered feature table.
- clean-filtered-table-2-500-unrooted-tree-silva.qza: unrooted phylogenetic tree based on the taxonomy assigned to all the features observed in the filtered dataset.

**Scripts**
Bioinformatic analyses were run on the New Zealand eScience Infrastructure (NeSI, Linux) and on a local computer (all R analyses, Windows). The scripts provided are:
- FastQCandQiime2.md, FastQCandQiime2.pdf and FastQCandQiime2.html: the same annotated script in three formats (Markdown, PDF and HTML), used on NeSI (Linux) to run FastQC and to run QIIME 2 up to the generation of the filtered .qza files used as input to all R analyses.
- Alpha_diversity_rarefied.R: annotated R script that loads the relevant .qza files (filtered QIIME 2 output), subsets and rarefies the data, and estimates alpha diversity, comparing environment, parasite and snail samples, as well as only the different parasite species and other combinations of samples.
- barPlots_heatmaps_vennD.R: annotated R script that loads the relevant .qza files (filtered QIIME 2 output), subsets the data, and draws relative-abundance bar plots, heatmaps and Venn diagrams (non-rarefied data).
- beta_diversity_rarefied.R: annotated R script that loads the relevant .qza files (filtered QIIME 2 output), subsets and rarefies the data, and estimates beta diversity, comparing environment, parasite and snail samples, as well as only the different parasite species and other combinations of samples.
- differential_abundance_tests.R: annotated R script that loads the relevant .qza files (filtered QIIME 2 output), subsets the data (all samples, only parasites, parasite-snail host pairs, or only snails) and runs differential abundance tests (ALDEx2, corncob and metastat methods).
- indicspecies.R: annotated R script that loads the relevant .qza files (filtered QIIME 2 output), subsets the data and runs indicator taxa tests (indicspecies R package).
- phylogeneticVsMicrobiome_distances.R: annotated R script that calculates the genetic distances between the four trematode species, calculates the microbiome distances between these four species (based on beta diversity metrics of rarefied data), and tests for an association between genetic distance and microbiome distance (phylosymbiosis). It includes tests of normality (Shapiro-Wilk) and of correlation, as well as Mantel tests.

**Other files**
- run_insight.csv: output statistics of the sequencing run.
- OG7633_libQC.pdf: Bioanalyzer results.
- metadata_complete.csv: all the metadata used in the different scripts. Columns are described in detail in the readme file.
- partialCOI_trematodes_short.fasta: partial COI gene sequences of the four trematode species (downloaded from GenBank), used to estimate genetic distance between species.
- partialCOI_trematodes_short_aligned.phy: aligned sequences (partial COI gene) of the four trematode species (based on partialCOI_trematodes_short.fasta and aligned with the command in mafft_command.txt).
- 28S_parasite_families.txt: partial 28S gene sequences (in FASTA format, despite the .txt extension) for the four trematode families and an outgroup (downloaded from GenBank), used to estimate genetic distance between families.
- 28S_parasites_aligned2.phy: aligned sequences (partial 28S gene) of the four trematode species (based on 28S_parasite_families.txt and aligned with the command in mafft_command.txt).
- mafft_command.txt: command used on Linux (NeSI) to run MAFFT and align the COI and 28S sequences (partialCOI_trematodes_short.fasta and 28S_parasite_families.txt).
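The .qza files can be inspected without QIIME 2 installed, because a QIIME 2 artifact is a ZIP archive. The Python sketch below assumes the usual layout (a top-level directory named after the artifact UUID, holding `metadata.yaml` and a `data/` folder) and a flat `key: value` metadata file. The function name `describe_qza` is hypothetical and not part of QIIME 2; QIIME 2's own tools remain the authoritative way to read these files.

```python
import io
import zipfile


def describe_qza(source):
    """Summarise a QIIME 2 artifact (.qza), which is a ZIP archive.

    `source` may be a file path or a binary file-like object. Assumes the
    usual layout <uuid>/metadata.yaml plus payload files under <uuid>/data/.
    """
    with zipfile.ZipFile(source) as zf:
        names = zf.namelist()
        # metadata.yaml directly inside the top-level <uuid>/ directory
        # (provenance copies sit deeper and are skipped by the "/" count)
        meta_name = next(
            n for n in names if n.count("/") == 1 and n.endswith("/metadata.yaml")
        )
        root = meta_name.split("/")[0]
        # assumes flat "key: value" lines, so a minimal parser is enough here
        fields = {}
        for line in zf.read(meta_name).decode("utf-8").splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        prefix = root + "/data/"
        data_files = [
            n[len(prefix):] for n in names
            if n.startswith(prefix) and not n.endswith("/")
        ]
    return {
        "uuid": root,
        "type": fields.get("type"),
        "format": fields.get("format"),
        "data_files": data_files,
    }


if __name__ == "__main__":
    # Build a tiny fake artifact in memory (hypothetical UUID and contents).
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(
            "0000-demo/metadata.yaml",
            "uuid: 0000-demo\ntype: FeatureData[Sequence]\n"
            "format: DNASequencesDirectoryFormat\n",
        )
        zf.writestr("0000-demo/data/dna-sequences.fasta", ">f1\nACGT\n")
    buf.seek(0)
    print(describe_qza(buf))
```

Calling `describe_qza("clean-filtered-table-2-500-silva.qza")` on a downloaded copy would report whatever semantic type and payload files that artifact records.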
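Alpha_diversity_rarefied.R rarefies the data before estimating alpha diversity. As a minimal illustration of those two steps (not a port of the R script, whose rarefaction routine, depth and indices are not stated here), the Python sketch below subsamples one sample's counts to a fixed depth without replacement, then computes observed features and the Shannon index with the natural logarithm. The feature IDs and counts are hypothetical.

```python
import math
import random
from collections import Counter


def rarefy(counts, depth, seed=None):
    """Subsample a {feature: count} dict to `depth` reads, without replacement."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError(f"depth {depth} exceeds the sample's {total} reads")
    rng = random.Random(seed)
    # expand to one list entry per read, labelled by its feature ID
    reads = [feature for feature, n in counts.items() for _ in range(n)]
    return dict(Counter(rng.sample(reads, depth)))


def observed_features(counts):
    """Number of features with a non-zero count (richness)."""
    return sum(1 for n in counts.values() if n > 0)


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln(p_i)), natural logarithm."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


if __name__ == "__main__":
    # Hypothetical feature counts for a single sample.
    sample = {"ASV_1": 120, "ASV_2": 60, "ASV_3": 15, "ASV_4": 5}
    sub = rarefy(sample, depth=100, seed=42)
    print(sum(sub.values()), observed_features(sub), round(shannon(sub), 3))
```

Because rarefying is random, fixing `seed` makes a run reproducible, and different seeds can give slightly different alpha values. Expanding every read into a list is simple but memory-hungry for very deep samples.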
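phylogeneticVsMicrobiome_distances.R tests for phylosymbiosis by relating a genetic distance matrix to a microbiome distance matrix with Mantel tests. The Python sketch below illustrates the idea under stated assumptions: Bray-Curtis dissimilarity stands in for "beta diversity metrics" (the script may use other metrics), the Mantel statistic is the Pearson correlation of the upper-triangle entries, and the p-value comes from exhaustively permuting the rows and columns of one matrix. With four species there are only 4! = 24 permutations, so the smallest attainable one-sided p-value is 1/24 (about 0.042). All matrices and profiles are hypothetical toy values, not data from this repository.

```python
import itertools
import math


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two equal-length abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den


def distance_matrix(profiles, metric):
    """Square matrix of metric(profiles[i], profiles[j])."""
    n = len(profiles)
    return [[metric(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]


def upper_triangle(m, order=None):
    """Entries above the diagonal, after relabelling rows/columns by `order`."""
    n = len(m)
    idx = list(range(n)) if order is None else order
    return [m[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def mantel_exact(d1, d2):
    """Mantel r (Pearson) with an exact one-sided permutation p-value.

    Every relabelling of d2's rows and columns is tried (n! of them), so this
    is only practical for a handful of taxa, e.g. 24 permutations for n = 4.
    """
    x = upper_triangle(d1)
    r_obs = pearson(x, upper_triangle(d2))
    perms = list(itertools.permutations(range(len(d2))))
    # count permutations at least as extreme; the identity is always included
    hits = sum(
        1 for p in perms if pearson(x, upper_triangle(d2, p)) >= r_obs - 1e-12
    )
    return r_obs, hits / len(perms)


if __name__ == "__main__":
    # Hypothetical toy inputs: 4 taxa, genetic distances and 3-feature profiles.
    genetic = [
        [0.00, 0.10, 0.20, 0.25],
        [0.10, 0.00, 0.15, 0.22],
        [0.20, 0.15, 0.00, 0.12],
        [0.25, 0.22, 0.12, 0.00],
    ]
    profiles = [[40, 30, 30], [35, 40, 25], [10, 50, 40], [5, 45, 50]]
    microbiome = distance_matrix(profiles, bray_curtis)
    r, p = mantel_exact(genetic, microbiome)
    print(f"Mantel r = {r:.3f}, exact p = {p:.3f}")
```

With more taxa, enumerating all permutations becomes infeasible and a random sample of permutations would be used instead.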

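The COI and 28S alignments (.phy files) are the inputs for the genetic distances. As a hedged example of one simple distance, the Python sketch below parses a sequential, relaxed PHYLIP file (one line per taxon, name and sequence separated by whitespace; interleaved or strict 10-character-name files are not handled) and computes uncorrected p-distances, skipping sites with a gap, `?` or `N` in either sequence. The R script may use a substitution model instead, and the taxon names and sequences here are hypothetical.

```python
from itertools import combinations

# characters treated as missing: alignment gaps, unknowns, ambiguous N
MISSING = set("-?N")


def read_relaxed_phylip(text):
    """Parse a sequential relaxed PHYLIP alignment (one line per taxon).

    Interleaved PHYLIP is not supported by this sketch.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ntax, nchar = (int(v) for v in lines[0].split()[:2])
    seqs = {}
    for ln in lines[1:1 + ntax]:
        name, rest = ln.split(None, 1)
        seq = "".join(rest.split()).upper()  # drop spaces inside the sequence
        if len(seq) != nchar:
            raise ValueError(f"{name}: expected {nchar} sites, found {len(seq)}")
        seqs[name] = seq
    if len(seqs) != ntax:
        raise ValueError(f"expected {ntax} taxa, found {len(seqs)}")
    return seqs


def p_distance(a, b):
    """Uncorrected p-distance over sites that are not missing in either sequence."""
    compared = diffs = 0
    for x, y in zip(a, b):
        if x in MISSING or y in MISSING:
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def pairwise_p_distances(seqs):
    """p-distance for every unordered pair of taxa, keyed by (name_a, name_b)."""
    return {(a, b): p_distance(seqs[a], seqs[b]) for a, b in combinations(seqs, 2)}


if __name__ == "__main__":
    # Hypothetical 3-taxon alignment; real input would be e.g. the COI .phy file.
    demo = "3 10\ntaxon_A ACGTTGCAAT\ntaxon_B ACGTTGCTAT\ntaxon_C AC-TTGCAGN\n"
    for pair, d in pairwise_p_distances(read_relaxed_phylip(demo)).items():
        print(pair, round(d, 3))
```

Pairwise deletion (dropping missing sites per pair) keeps every pair comparable, at the cost of each distance being computed over a slightly different set of sites.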
Provider:
figshare
Created:
2023-08-03