
Concatenated analyses_partitioned

DataCite Commons. Updated 2024-03-31; indexed 2024-08-18
Download link:
https://figshare.com/articles/dataset/Concatenated_analyses_partitioned/24585897
Description:
The demultiplexed FASTQ data were cleaned and trimmed of adapters using Illumiprocessor v.2.0 (Faircloth, 2013), which is based on the package Trimmomatic (Bolger et al., 2014). Data were processed with a series of scripts available in the PHYLUCE package v.1.7.1 (Faircloth, 2015). Trimmed reads were assembled into contigs using a wrapper script (phyluce_assembly_assemblo_trinity.py) and the program TRINITY (version trinityrnaseq_r20140717) (Grabherr et al., 2011).

We used the PHYLUCE pipeline to identify and extract contigs containing UCE loci. Species-specific contig assemblies were aligned to a FASTA file of all enrichment baits using phyluce_assembly_match_contigs_to_probes.py (min_coverage=50, min_identity=80). A list of UCE loci shared across all taxa was generated using phyluce_assembly_get_match_counts.py. This list was then used to create FASTA files for each UCE locus using phyluce_get_fastas_from_match_counts.py.

All sequence data in these FASTA files were aligned using MAFFT (Katoh and Standley, 2013) through phyluce_seqcap_align.py (min. length = 100, no trim) and trimmed using a wrapper script (get_gblocks_trimmed_alignment_from_untrimmed.py) for Gblocks (Castresana, 2000) with the following settings: b1=0.5, b2=0.5, b3=12, b4=7. After trimming, subsets of UCE loci filtered for different levels of taxon occupancy (70%, 80% and 90% taxon completeness) were created using phyluce_get_only_loci_with_min_taxa.py, and statistics across all subsets were generated using get_align_summary_data.py. The individual UCE alignments in each subset were then concatenated into a single nexus alignment file with the phyluce_align_format_nexus_files_for_raxml.py script for subsequent phylogenetic analyses.

SPRUCEUP v2020.2.19 (Borowiec, 2019) was used to remove poorly aligned sequences or sequence fragments. The matrices were trimmed using cut-off values of 95%, 97%, 98% and 99%. All analyses in this study are based on the 97% and 98% cut-off values, because a 95% cut-off was too stringent and a 99% cut-off did not trim outlier sequences sufficiently.
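
The taxon-occupancy filtering and concatenation steps described above can be illustrated with a minimal Python sketch. It is not the PHYLUCE implementation: the function names, the toy locus names (uce-1, uce-2, uce-3), the four-taxon set, the use of "?" to pad missing taxa, and the "at least N percent" threshold rule are all assumptions made for illustration.

```python
def filter_by_occupancy(loci, n_taxa, min_percent):
    """Keep loci sampled for at least min_percent of the n_taxa taxa.

    Integer arithmetic avoids floating-point rounding at the threshold.
    The "at least" rule is an assumption of this sketch.
    """
    return {
        name: aln
        for name, aln in loci.items()
        if len(aln) * 100 >= min_percent * n_taxa
    }


def concatenate(loci, taxa):
    """Join per-locus alignments into one matrix and record partitions.

    Taxa missing from a locus are padded with '?' (an assumption).
    Partitions are (locus, start, end), 1-based and inclusive.
    """
    parts = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for name in sorted(loci):
        aln = loci[name]
        length = len(next(iter(aln.values())))  # aligned length of this locus
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, "?" * length))
        partitions.append((name, start, start + length - 1))
        start += length
    matrix = {taxon: "".join(seqs) for taxon, seqs in parts.items()}
    return matrix, partitions


# Hypothetical toy data: locus -> {taxon: aligned sequence}.
taxa = ["A", "B", "C", "D"]
loci = {
    "uce-1": {"A": "ACGT", "B": "ACGA", "C": "ACGT", "D": "AC-T"},  # 4 of 4 taxa
    "uce-2": {"A": "GG", "B": "GC", "C": "GG"},                     # 3 of 4 taxa
    "uce-3": {"A": "TTT", "B": "TTA"},                              # 2 of 4 taxa
}

subset_70 = filter_by_occupancy(loci, len(taxa), 70)
subset_80 = filter_by_occupancy(loci, len(taxa), 80)
matrix_70, partitions_70 = concatenate(subset_70, taxa)

print(sorted(subset_70), sorted(subset_80))
print(partitions_70)
```

In this toy case the 70% subset keeps two loci and the 80% subset keeps one, which mirrors the trade-off between matrix size and missing data that the 70%, 80% and 90% subsets explore.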
Provider:
figshare
Created:
2023-11-27