Perianth evolution and implications for generic delimitation in the Eucalypts (Myrtaceae): DNA sequences, morphological data

Mendeley Data, updated 2024-04-13, indexed 2024-06-29

Download link:

https://datadryad.org/stash/dataset/doi:10.5061/dryad.7sqv9s4wq

Description:

This is an aligned dataset of sequences from 101 low-copy nuclear loci for 392 species-level taxa (eucalypts plus eleven outgroups), representing the phylogenetic diversity of Myrtaceae tribe Eucalypteae.

Sampling

Hereafter, the term "Eucalypts s.l." refers collectively to Angophora, Corymbia and Eucalyptus. Sampling represented all genera within Myrtaceae tribe Eucalypteae sensu Wilson et al. (2005) except Allosyncarpia, Eucalyptopsis and Stockwellia. Within the eucalypts, both subgenera of Corymbia (Parra-O et al. 2009) and five of the seven subgenera of Eucalyptus were sampled; the two exceptions, Acerosae (E. curtisii) and Alveolata (E. microcorys), are both monotypic. There were 392 species-level taxa, including eleven outgroups, which were selected based on whole-of-family phylogenies by Wilson et al. (2005) and Thornhill et al. (2012; 2015) and included species of Osbornia and Melaleuca (Melaleuceae), Backhousia (Backhousieae), Tristaniopsis (Kanieae), Syncarpia (Syncarpieae) and Arillastrum (Eucalypteae). The tree was rooted between Melaleuceae and the rest, based on earlier studies (Wilson et al. 2005; Thornhill et al. 2015).

The majority of samples were field-collected leaf tissue with vouchers lodged in the Australian National Herbarium (CANB), where the identifications were verified by co-author Slee. These were supplemented by leaf samples taken directly from CANB herbarium sheets, with permission. Samples from Currency Creek Arboretum were taken with permission from vouchered living trees (details in Thornhill et al. 2015). All taxa and accessions sampled are listed in Supplementary Table S1, and nomenclature follows Brooker (2000), as updated by Slee et al. (2020).

Target Capture and Sequencing

We used a target-capture approach aimed at identifying and sequencing up to 200 orthologous low-copy loci from the nuclear genome with potential to resolve species-level relationships across the large family Myrtaceae, as per Choi et al. (2019; Data from: Identifying genetic markers for a range of phylogenetic utility–from species to family level, Dryad, Dataset, https://doi.org/10.5061/dryad.p20km22). In plates of 48 samples, the pooled DNA library for each specimen was hybridised to the target probes using the SeqCap EZ Developer Library (NimbleGen, Madison, USA) following the manufacturer's instructions with minor modifications detailed in Choi et al. (2019). Hybridised samples were recovered and washed using the SeqCap Hybridisation and Wash Kit (NimbleGen, Mannheim, Germany) following the manufacturer's instructions. After indexing PCR and purification, the captured libraries were sequenced on the Illumina MiSeq (one pool of 48 samples) and HiSeq 2000 (all other pools) platforms, using a 100 bp paired-end read protocol, at the Bio-molecular Research Facilities at The Australian National University.

Data Handling and Mapping of Reads

The quality of the raw reads was investigated using FastQC (Andrews 2010) (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). BBDuk within BBTools was used to remove Illumina adapters and low-quality reads and sequences using standard parameters (trimq=30, minlen=40, ktrim=r, hdist=1, tpe tbo; http://jgi.gov/data-and-tools/bb-tools/). The cleaned reads were rechecked using FastQC. After read cleaning, the axe demultiplexer was used to sort reads by barcode using standard parameters (https://manpages.debian.org/testing/axe-demultiplexer/axe-demux.1.en.html). The reads were mapped against the E. grandis targets using bwa-mem (Li et al. 2009a). SAM files were converted to BAM, sorted and indexed using samtools v1.3.1 (Li et al. 2009b). Picard was used to remove duplicates (http://broadinstitute.github.io/picard/). Finally, Platypus was used to call variants with standard parameters (Rimmer et al. 2014).

Sequence Alignment and Editing

Sequences were imported into Geneious Prime ver. 2020.1.2 (Biomatters Ltd) for assembly, alignment and editing.
Initially, each locus was aligned separately across all samples using MAFFT ver. 1.4.0 (Biomatters Ltd). After trimming, alignments were adjusted by eye, which included deleting sites with >95% missing data. A Neighbour Joining tree was generated for each locus and inspected for anomalies, such as likely chimeric sequences indicated by long, often misplaced branches. Every locus was assessed for paralogy (multiple gene copies), as indicated by systematic sharing of polymorphisms among distantly related taxa, and such loci were excluded. Randomly scattered (unshared) polymorphic base calls were assumed to indicate allelic variation, and such loci were retained.

Ninety-nine of the 200 targeted genes were discarded, leaving 101 putatively single-copy genes. These were concatenated using Geneious. All samples with >60% of the concatenated sequence missing were culled, leaving 392 of the original 521 eucalypt and outgroup sequences in the final alignment, which totalled 129,354 base pairs, comprising 27,100 parsimony-informative sites, 14,807 singleton sites and 87,447 constant sites. The final set of 101 loci is listed in Supplementary Table S2, identified by their labels in the annotated Eucalyptus grandis genome (Myburg et al. 2014).

Phylogenetic Analysis

Phylogenies were first estimated from the concatenated sequences of all 101 nuclear loci, initially treated as a single partition, using maximum likelihood (ML) as implemented in RAxML ver. 8.2.12 (Stamatakis 2014) on the CIPRES Science Gateway (Miller et al. 2010) with a GTR+G model. Additionally, ML analyses were run using IQ-TREE ver. 1.6.10 (Nguyen et al. 2015), first with a single partition and then with the DNA divided into 101 partitions, each with its own model (Chernomor et al. 2016), estimated using ModelFinder (Kalyaanamoorthy et al. 2017). Node support was estimated using ultrafast bootstrap (UFB) with 1000 replicates (Minh et al. 2013; Hoang et al. 2018), as well as site (sCF) and gene (gCF) concordance factors (Minh et al. 2020).

Mapping of Perianth Traits

The IQ-TREE trees with branch lengths were imported to Mesquite ver. 3.61 (Maddison et al. 2019) for trait mapping and hypothesis testing. Relevant trait data from Euclid edition 4 (Slee et al. 2020) were also imported to Mesquite, and we defined morphological characters for testing hypotheses about perianth evolution.
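
The missing-data thresholds and site categories reported for the final alignment can be illustrated with a minimal Python sketch. It is not the authors' procedure (they worked in Geneious and adjusted alignments by eye). The function names, the toy alignment and the choice to treat "-", "?" and "N" as missing data are all assumptions made for illustration. Site categories follow the usual definitions, which may differ in detail from the software the authors used: a constant site has at most one observed state, a parsimony-informative site has at least two states each present in at least two taxa, and a singleton site is any other variable site.

```python
from collections import Counter

# Assumption: gaps, '?' and N (either case) all count as missing data.
MISSING = set("-?Nn")


def drop_sparse_sites(alignment, max_missing=0.95):
    """Remove columns whose fraction of missing characters exceeds max_missing."""
    names = list(alignment)
    length = len(alignment[names[0]])
    keep = [
        i for i in range(length)
        if sum(alignment[n][i] in MISSING for n in names) / len(names) <= max_missing
    ]
    return {n: "".join(alignment[n][i] for i in keep) for n in names}


def drop_sparse_samples(alignment, max_missing=0.60):
    """Remove samples whose fraction of missing characters exceeds max_missing."""
    return {
        name: seq for name, seq in alignment.items()
        if seq and sum(c in MISSING for c in seq) / len(seq) <= max_missing
    }


def classify_sites(alignment):
    """Count constant, singleton and parsimony-informative columns."""
    counts = {"constant": 0, "singleton": 0, "informative": 0}
    for column in zip(*alignment.values()):
        states = Counter(c.upper() for c in column if c not in MISSING)
        if len(states) <= 1:
            counts["constant"] += 1
        elif sum(1 for k in states.values() if k >= 2) >= 2:
            counts["informative"] += 1
        else:
            counts["singleton"] += 1
    return counts


# Hypothetical toy alignment (not taken from the dataset).
toy = {
    "taxon_A": "AACA-",
    "taxon_B": "AACT-",
    "taxon_C": "AGTA-",
    "taxon_D": "AGTA-",
    "taxon_E": "-----",
}

filtered = drop_sparse_samples(drop_sparse_sites(toy))
print(sorted(filtered))
print(classify_sites(filtered))
```

In the toy example the all-gap fifth column and the all-gap sample taxon_E are removed, leaving one constant, one singleton and two parsimony-informative sites. The same bookkeeping applies to the figures reported above: 27,100 + 14,807 + 87,447 = 129,354, so the three site categories account for the full alignment length.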

Created:

2023-06-28