Concatenated analyses_unpartitioned

Name: Concatenated analyses_unpartitioned
Creator: figshare
Published: 2024-03-31 00:27:26
License: No description available

Source: DataCite Commons. Updated: 2024-03-31. Indexed: 2024-08-18.

Download link:

https://figshare.com/articles/dataset/Concatenated_analyses_unpartitioned/24585954

Description:

The demultiplexed FASTQ data were cleaned and adapter-trimmed using Illumiprocessor v.2.0 (Faircloth, 2013), which is based on Trimmomatic (Bolger et al., 2014). Subsequent data processing used scripts from the PHYLUCE package v.1.7.1 (Faircloth, 2015). Trimmed reads were assembled into contigs using the wrapper script phyluce_assembly_assemblo_trinity.py and TRINITY (version trinityrnaseq_r20140717) (Grabherr et al., 2011).

We used the PHYLUCE pipeline to identify and extract contigs containing UCE loci. Species-specific contig assemblies were aligned to a FASTA file of all enrichment baits using phyluce_assembly_match_contigs_to_probes.py (min_coverage=50, min_identity=80). A list of UCE loci shared across all taxa was generated with phyluce_assembly_get_match_counts.py, and this list was then used to create a FASTA file for each UCE locus with phyluce_get_fastas_from_match_counts.py. All sequences in these FASTA files were aligned with MAFFT (Katoh and Standley, 2013) through phyluce_seqcap_align.py (minimum length = 100, no trimming) and then trimmed with Gblocks (Castresana, 2000) through the wrapper script get_gblocks_trimmed_alignment_from_untrimmed.py, using the settings b1=0.5, b2=0.5, b3=12, b4=7.

After trimming, we used phyluce_get_only_loci_with_min_taxa.py to create subsets of UCE loci filtered at three levels of taxon occupancy (70%, 80% and 90% taxon completeness), and generated summary statistics for all subsets with get_align_summary_data.py. The individual UCE alignments in each subset were then concatenated into a single NEXUS alignment file with phyluce_align_format_nexus_files_for_raxml.py for subsequent phylogenetic analyses. SPRUCEUP v2020.2.19 (Borowiec, 2019) was used to remove poorly aligned sequences or sequence fragments, and the matrices were trimmed at cut-off values of 95%, 97%, 98% and 99%. All analyses in this study are based on the 97% and 98% cut-off values, because the 95% cut-off was too stringent and the 99% cut-off did not trim outlier sequences sufficiently.
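The taxon-occupancy filtering and concatenation steps described above can be illustrated with a minimal Python sketch. It is not the PHYLUCE implementation: the function names `filter_by_occupancy` and `concatenate`, the toy taxa and loci, the rounding-up rule for the minimum taxon count, and the use of `?` for missing taxa are all assumptions made for illustration.

```python
def filter_by_occupancy(loci, n_taxa, min_percent):
    """Keep loci whose alignment contains at least min_percent of n_taxa.

    Rounding the threshold up is an assumption of this sketch;
    PHYLUCE may round differently.
    """
    # Integer ceiling of (min_percent / 100) * n_taxa, avoiding float error.
    min_count = (min_percent * n_taxa + 99) // 100
    return {name: aln for name, aln in loci.items() if len(aln) >= min_count}


def concatenate(loci, taxa, missing="?"):
    """Join per-locus alignments into one matrix, padding absent taxa."""
    pieces = {taxon: [] for taxon in taxa}
    boundaries = []  # (locus, first column, last column), 1-based
    start = 1
    for name in sorted(loci):
        aln = loci[name]
        length = len(next(iter(aln.values())))  # all rows share one length
        for taxon in taxa:
            pieces[taxon].append(aln.get(taxon, missing * length))
        boundaries.append((name, start, start + length - 1))
        start += length
    matrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return matrix, boundaries


# Hypothetical toy data: four taxa, three already-aligned UCE loci.
taxa = ["sp_A", "sp_B", "sp_C", "sp_D"]
loci = {
    "uce-1": {"sp_A": "ACGT", "sp_B": "ACGA", "sp_C": "ACGT", "sp_D": "A-GT"},
    "uce-2": {"sp_A": "GG", "sp_B": "GC", "sp_C": "GG"},
    "uce-3": {"sp_A": "TTT", "sp_B": "TTA"},
}

kept = filter_by_occupancy(loci, len(taxa), 70)  # 70% taxon completeness
matrix, boundaries = concatenate(kept, taxa)
```

With these toy inputs, only loci present in at least three of the four taxa survive the 70% filter. The locus boundaries are returned purely for illustration; an unpartitioned analysis would use only the concatenated matrix.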

Provider:

figshare

Created:

2023-11-27
