Concatenated analyses_unpartitioned
Source: DataCite Commons. Updated: 2024-03-31. Indexed: 2024-08-18.
Download link:
https://figshare.com/articles/dataset/Concatenated_analyses_unpartitioned/24585954
Description:
The demultiplexed FASTQ data were cleaned and adapter-trimmed using Illumiprocessor v.2.0 (Faircloth, 2013), which is based on the package Trimmomatic (Bolger et al., 2014). Data were processed with a series of scripts from the PHYLUCE package v.1.7.1 (Faircloth, 2015). Trimmed reads were assembled into contigs using a wrapper script (phyluce_assembly_assemblo_trinity.py) and the program TRINITY (version trinityrnaseq_r20140717) (Grabherr et al., 2011). We used the PHYLUCE pipeline to identify and extract contigs containing UCE loci. Species-specific contig assemblies were aligned to a FASTA file of all enrichment baits using phyluce_assembly_match_contigs_to_probes.py (min_coverage=50, min_identity=80). A list of UCE loci shared across all taxa was generated using phyluce_assembly_get_match_counts.py. This list was then used to create a FASTA file for each UCE locus using phyluce_get_fastas_from_match_counts.py. All sequence data in these FASTA files were aligned using MAFFT (Katoh and Standley, 2013) through phyluce_seqcap_align.py (min. length = 100, no trim) and trimmed using a wrapper script (get_gblocks_trimmed_alignment_from_untrimmed.py) for Gblocks (Castresana, 2000) with the following settings: b1=0.5, b2=0.5, b3=12, b4=7. After trimming, multiple subsets were created by filtering UCE loci at different levels of taxon occupancy (70%, 80% and 90% taxon completeness) using phyluce_get_only_loci_with_min_taxa.py, and summary statistics were generated for all subsets using get_align_summary_data.py. The individual UCE locus alignments in each subset were then concatenated into a single NEXUS alignment file with the phyluce_align_format_nexus_files_for_raxml.py script for subsequent phylogenetic analyses. SPRUCEUP v2020.2.19 (Borowiec, 2019) was used to remove poorly aligned sequences or sequence fragments. The matrices were trimmed using the following cut-off values: 95%, 97%, 98% and 99%.
All analyses in this study are based on the 97% and 98% cut-off values, because the 95% cut-off was too stringent and the 99% cut-off did not trim outlier sequences sufficiently.
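The taxon-occupancy filtering step described above can be illustrated with a minimal Python sketch. This is not the PHYLUCE implementation: the function names, the dictionary input format, and the round-up rule for the minimum taxon count are all assumptions made for illustration. It only shows the idea of keeping a locus when enough taxa are represented in its alignment.

```python
from typing import Dict, Iterable, Set


def min_taxa_required(total_taxa: int, occupancy_percent: int) -> int:
    """Smallest taxon count that meets the occupancy threshold.

    Integer arithmetic avoids floating-point surprises. Rounding up is an
    assumption of this sketch, not a documented PHYLUCE behaviour.
    """
    return (total_taxa * occupancy_percent + 99) // 100


def filter_loci_by_occupancy(
    loci: Dict[str, Set[str]], total_taxa: int, occupancy_percent: int
) -> Dict[str, Set[str]]:
    """Keep only loci whose alignment contains enough taxa.

    `loci` maps a (hypothetical) locus name to the set of taxa present
    in that locus's alignment.
    """
    needed = min_taxa_required(total_taxa, occupancy_percent)
    return {name: taxa for name, taxa in loci.items() if len(taxa) >= needed}


def occupancy_subsets(
    loci: Dict[str, Set[str]],
    total_taxa: int,
    thresholds: Iterable[int] = (70, 80, 90),
) -> Dict[int, Dict[str, Set[str]]]:
    """Build one filtered subset per occupancy threshold (in percent)."""
    return {t: filter_loci_by_occupancy(loci, total_taxa, t) for t in thresholds}


if __name__ == "__main__":
    # Toy data: 10 taxa in total, four made-up loci with varying coverage.
    all_taxa = [f"taxon{i}" for i in range(10)]
    toy_loci = {
        "uce-1": set(all_taxa),        # 10 of 10 taxa
        "uce-2": set(all_taxa[:8]),    # 8 of 10 taxa
        "uce-3": set(all_taxa[:7]),    # 7 of 10 taxa
        "uce-4": set(all_taxa[:5]),    # 5 of 10 taxa
    }
    for threshold, subset in occupancy_subsets(toy_loci, len(all_taxa)).items():
        print(threshold, sorted(subset))
```

Stricter thresholds retain fewer loci, so the 90% subset is always contained in the 80% subset, which in turn is contained in the 70% subset.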
Provider:
figshare
Created:
2023-11-27



