Ocymyrmex_contigs
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://figshare.com/articles/dataset/Ocymyrmex_contigs/24585849
Description:
DNA was extracted from mounted museum specimens collected between 1980 and 2015, either destructively (using the whole specimen or two legs, depending on specimen size) or non-destructively (by soaking the whole specimen in proteinase-K, then rinsing in ethanol and remounting), using a DNeasy Blood and Tissue Kit (Qiagen, Valencia, CA, USA). The manufacturer's standard protocol was followed except for the final elution step, in which 130 µL of AE buffer was added to ensure that there was enough DNA for quality checks and UCE library preparation. We ran 10 µL of each DNA extract on a 2% agarose gel to assess DNA degradation. Extracted genomic DNA was quantified using a Qubit 2.0 fluorometer with a high sensitivity kit (Life Technologies). DNA concentrations ranged from 0.06 to 20.9 ng/µL (Appendix B: Table B1). All UCE laboratory work was conducted in the Laboratories of Analytical Biology (L.A.B.) facilities of the National Museum of Natural History, Smithsonian Institution (Washington, U.S.A.).
DNA was sheared to a target size of approximately 250–600 bp by sonication using a Qsonica Q800 sonicator (Qsonica LLC, Newton, CT, U.S.A.). Sheared DNA fragments were used for genomic library preparation following a modified version of the protocol described in Faircloth et al. (2015), using a Kapa Hyper Prep Library Kit (Kapa Biosystems, Wilmington, MA, U.S.A.) and a generic SPRI substitute (Fisher et al., 2011; “speedbeads” (Faircloth et al., 2015)) for bead-based clean-up steps. We ligated dual-indexing Illumina TruSeq-style adapters (iTru i5 and i7 primers) (Faircloth and Glenn, 2012) to 15 µL of DNA template during a PCR reaction consisting of 25 µL HiFi HotStart polymerase (Kapa Biosystems, Wilmington, MA, U.S.A.), 2.5 µL each of iTru i5 and i7 primers (5 nM each) and 5 µL ddH2O. The following thermal protocol was used: 98°C for 45 s; 13 cycles of 98°C for 15 s, 65°C for 30 s and 72°C for 60 s; and a final extension at 72°C for 5 min. PCR products were purified using 1.0X speedbeads and eluted with 23 µL of pH 8 elution buffer (EB; 10 mM Tris-Cl, pH 8.5; ddH2O). DNA concentration was measured using a Qubit 2.0 fluorometer. In addition, we ran 2 µL of each library product on an agarose gel. We pooled libraries at equimolar concentrations into enrichment pools, and pool concentrations were adjusted accordingly using a vacuum centrifuge.
Enrichment was performed using the ‘Hymenoptera-v2-ANT-SPECIFIC’ bait set, which includes 9446 unique baits targeting 2590 UCE loci specific to the order Hymenoptera (Branstetter et al., 2017). We followed the library enrichment procedures of the MYcroarray (now ArborBiosciences) MYBaits kit protocol v3 (Blumenstiel et al., 2010). We added 0.7 µL of 500 µM custom blocking oligos designed for the sequence tags. Enrichment incubation was performed at 65°C for 24 hours. After this step, all pools were bound to streptavidin beads (MyOne C1; Life Technologies) and the enriched pools were purified. We amplified the enriched libraries by PCR using the “with-bead” approach described in Faircloth et al. (2015). We combined 15 µL of enriched pool with 25 µL of HiFi HotStart Taq (Kapa Biosystems), 5 µL of Illumina TruSeq primer mix (5 nM each), and 5 µL of ddH2O, and ran the reaction at 98°C for 45 s; 18 cycles of 98°C for 15 s, 60°C for 30 s and 72°C for 60 s; and a final extension at 72°C for 5 min. We purified the resulting reactions using 1.0X speedbeads and rehydrated the enriched pools in 22 µL EB. We quantified 2 µL of each enriched pool using a Qubit fluorometer.
We quantified the DNA concentration of each library pool by qPCR using a SYBR® FAST qPCR kit (Kapa Biosystems) with a ViiA™ 7 Real-Time PCR System (Life Technologies). We used the measured concentrations to pool libraries at equimolar concentrations. This final pool was then size-selected to a fragment range of 250–800 bp using a BluePippin (SageScience, Beverly, MA, U.S.A.). The pooled libraries were sequenced on two partial lanes of 125 bp paired-end Illumina HiSeq 2500 runs at the University of Utah’s Huntsman Cancer Institute.
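The equimolar pooling step described above comes down to simple arithmetic, sketched here in Python. This is an illustrative sketch only, not the authors' procedure: the function names, the example pool names and concentrations, and the 10 µL maximum volume are all hypothetical, and the mass-to-molarity conversion assumes the common approximation of about 660 g/mol per base pair of double-stranded DNA.

```python
def ng_per_ul_to_nM(conc_ng_per_ul, mean_fragment_bp):
    """Convert a dsDNA mass concentration (ng/µL) to molarity (nM).

    Assumes an average mass of ~660 g/mol per base pair (an approximation).
    """
    if mean_fragment_bp <= 0:
        raise ValueError("mean_fragment_bp must be positive")
    return conc_ng_per_ul * 1e6 / (660.0 * mean_fragment_bp)


def equimolar_volumes(conc_nM, max_volume_ul):
    """Return the volume (µL) of each library that contributes equal moles.

    The most dilute library is used at max_volume_ul; every other library is
    scaled down so that all contribute the same amount (nM x µL = fmol).
    """
    if not conc_nM:
        raise ValueError("no libraries given")
    if any(c <= 0 for c in conc_nM.values()):
        raise ValueError("all concentrations must be positive")
    target_fmol = min(conc_nM.values()) * max_volume_ul
    return {name: target_fmol / c for name, c in conc_nM.items()}


# Hypothetical concentrations (nM) for three enriched pools.
example = {"pool_A": 10.0, "pool_B": 20.0, "pool_C": 40.0}
volumes = equimolar_volumes(example, max_volume_ul=10.0)
```

Anchoring the calculation on the most dilute library keeps every pipetted volume at or below the chosen maximum; other anchoring choices (for example a fixed target amount per library) would work equally well.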
The demultiplexed FASTQ data were cleaned and trimmed of adapters using Illumiprocessor v.2.0 (Faircloth, 2013), which is based on the package Trimmomatic (Bolger et al., 2014). Data were processed with a series of scripts available in the PHYLUCE package v.1.7.1 (Faircloth, 2015). Trimmed reads were assembled into contigs using a wrapper script (phyluce_assembly_assemblo_trinity.py) and the program TRINITY (version trinityrnaseq_r20140717) (Grabherr et al., 2011). We used the PHYLUCE pipeline to identify and extract contigs containing UCE loci. Species-specific contig assemblies were aligned to a FASTA file of all enrichment baits using phyluce_assembly_match_contigs_to_probes.py (min_coverage=50, min_identity=80). A list of UCE loci shared across all taxa was generated using phyluce_assembly_get_match_counts.py. This list was then used to create FASTA files for each UCE locus using phyluce_get_fastas_from_match_counts.py. All sequence data in these FASTA files were aligned using MAFFT (Katoh and Standley, 2013) through phyluce_seqcap_align.py (min. length = 100, no trim) and trimmed using a wrapper script (get_gblocks_trimmed_alignment_from_untrimmed.py) for Gblocks (Castresana, 2000) with the following settings: b1=0.5, b2=0.5, b3=12, b4=7. After trimming, we created multiple subsets by filtering UCE loci for different levels of taxon occupancy (70%, 80% and 90% taxon completeness) using phyluce_get_only_loci_with_min_taxa.py, and generated statistics across all subsets using get_align_summary_data.py. Individual alignments of UCE loci for each subset were then concatenated into one nexus alignment file with the phyluce_align_format_nexus_files_for_raxml.py script for subsequent phylogenetic analyses. SPRUCEUP v2020.2.19 (Borowiec, 2019) was used to remove poorly aligned sequences or sequence fragments. The matrices were trimmed based on the following cut-off values: 95%, 97%, 98% and 99%.
All analyses in this study are based on the 97% and 98% cut-off values, as the 95% cut-off was too stringent and the 99% cut-off did not trim outlier sequences sufficiently.
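The taxon-occupancy filtering described above (70%, 80% and 90% completeness) can be illustrated with a minimal Python sketch. It is not the PHYLUCE implementation, and PHYLUCE may round the threshold differently; the function name, the locus names and the taxon labels below are hypothetical and serve only to show the idea of keeping a locus when enough taxa are represented in its alignment.

```python
def filter_by_occupancy(locus_taxa, n_taxa, min_percent):
    """Return the sorted names of loci present in at least min_percent of taxa.

    locus_taxa maps a locus name to the set of taxa with sequence for it.
    The comparison uses integer arithmetic to avoid floating-point edge cases.
    """
    if n_taxa <= 0:
        raise ValueError("n_taxa must be positive")
    return sorted(
        locus
        for locus, taxa in locus_taxa.items()
        if len(taxa) * 100 >= min_percent * n_taxa
    )


# Hypothetical data set: 10 taxa and three UCE loci with different coverage.
all_taxa = [f"taxon_{i}" for i in range(10)]
loci = {
    "uce-1": set(all_taxa),      # present in 10 of 10 taxa
    "uce-2": set(all_taxa[:8]),  # present in 8 of 10 taxa
    "uce-3": set(all_taxa[:7]),  # present in 7 of 10 taxa
}

# One subset per occupancy level, mirroring the 70/80/90% matrices.
subsets = {p: filter_by_occupancy(loci, len(all_taxa), p) for p in (70, 80, 90)}
```

Stricter thresholds retain fewer loci but yield matrices with less missing data, which is the trade-off that building several subsets makes it possible to explore.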
Created:
2024-03-31



