Data for: PickMe: sample selection for species tree reconstruction using coalescent weighted quartets
Source: Mendeley Data. Updated: 2024-04-13. Indexed: 2024-06-27.
Download link:
https://datadryad.org/stash/dataset/doi:10.5061/dryad.3r2280ggv
Description:
We obtained targeted sequence data for 763 putatively single-copy nuclear loci for samples of 62 North American and two African outgroup species, Asclepias physocarpa and A. fornicata, using the target enrichment baits of Weitemier et al. (2014). Data for 32 of these samples and orthologs from the genome sequence of Asclepias syriaca (Weitemier et al., 2019) were included in the analyses of Boutte et al. (2019), and nuclear sequence data for the additional 30 samples were generated using the DNA sequencing and assembly methods described therein. Boutte et al. (2019) had excluded the 30 newly analyzed samples based on an ad hoc minimum gene recovery criterion of 600 genes (79%), with the goal of high gene occupancy for species tree analyses. For the analyses conducted here, we masked assembled sequences with Ns at sites with very low read depth (≤ 2 reads) and at heterozygous sites (i.e., intra-individual SNPs). For each gene, we aligned the masked sequences using MAFFT v. 7.245 with default parameters (Katoh and Standley, 2013), and then removed sequences covering less than 50% of the total alignment length, following Sayyari et al. (2017). We selected a subset of 703 genes for further analysis; Boutte et al. (2019) had identified these genes as producing the best-resolved milkweed phylogenies based on bootstrap support across the gene trees. For the complete data set of 62 species, we first estimated the 703 gene trees using neighbor joining on uncorrected distances (the proportion of observed differences in the aligned sequences) as implemented in the ape package (Paradis and Schliep, 2018) in R v. 3.5.1 (R Core Team, 2013). Using these estimated gene trees, we then identified the samples to be included in species tree analyses using PickMe. To determine whether the gene tree inference method affected the sample selection results, we also estimated the initial gene trees under the GTR+Gamma model in RAxML v. 8.2.12 (Stamatakis, 2014).
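The masking, length-filtering, and uncorrected-distance steps described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline (they used MAFFT and the ape package in R); the function names, the input layout (per-site read depths and a list of heterozygous positions), and the handling of Ns and gaps are all assumptions made for illustration.

```python
def mask_sequence(seq, depths, het_sites, max_masked_depth=2):
    """Replace a base with N where read depth is <= max_masked_depth
    or where the position is listed as heterozygous (0-based indices)."""
    het = set(het_sites)
    return "".join(
        "N" if depth <= max_masked_depth or i in het else base
        for i, (base, depth) in enumerate(zip(seq, depths))
    )


def keep_long_sequences(alignment, min_fraction=0.5):
    """Keep aligned sequences whose non-gap length is at least
    min_fraction of the alignment length.

    Assumption: masked Ns count as covered sites; only '-' is a gap.
    """
    if not alignment:
        return {}
    length = len(next(iter(alignment.values())))
    return {
        name: seq
        for name, seq in alignment.items()
        if sum(c != "-" for c in seq) >= min_fraction * length
    }


def p_distance(a, b):
    """Uncorrected distance: proportion of differing sites among sites
    where both sequences have an unambiguous base (A, C, G or T).

    Assumption: sites with N or a gap in either sequence are skipped.
    """
    pairs = [
        (x, y)
        for x, y in zip(a.upper(), b.upper())
        if x in "ACGT" and y in "ACGT"
    ]
    if not pairs:
        return float("nan")
    return sum(x != y for x, y in pairs) / len(pairs)
```

A pairwise matrix of such distances is the kind of input a neighbor-joining implementation would take; how ambiguous sites are treated affects the distances, so this choice should be checked against the tool actually used.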
For the set of samples identified as reliable by PickMe, we realigned the sequences and then removed small alignments (< 100 bp), following Boutte et al. (2019). We then used IQ-Tree v. 1.5.4 (Nguyen et al., 2014; Chernomor et al., 2016) to select the best model of molecular evolution for the retained alignments and inferred the gene tree for each locus using the same parameters as Boutte et al. (2019). Using ASTRAL-II v. 4.10.12 (Mirarab and Warnow, 2015) with default parameters, we inferred a species tree and calculated local posterior probability support (Sayyari and Mirarab, 2016). We calculated gene concordance factors using the method of Minh et al. (2020), implemented in IQ-Tree v. 2.1.2 (Nguyen et al., 2014; Chernomor et al., 2016).
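As a small illustration of the alignment-length filter mentioned above, the following Python sketch drops loci whose realigned alignment is shorter than 100 bp. The data layout (a dictionary mapping each locus to a dictionary of aligned sequences) and the function name are hypothetical, and alignment length is assumed to mean the number of alignment columns.

```python
def drop_short_alignments(alignments, min_length=100):
    """Keep loci whose alignment has at least min_length columns.

    alignments: dict mapping locus name -> dict of sample name -> aligned
    sequence (all sequences of a locus are assumed to have equal length).
    """
    kept = {}
    for locus, alignment in alignments.items():
        if not alignment:
            continue  # a locus with no sequences has nothing to analyse
        n_columns = len(next(iter(alignment.values())))
        if n_columns >= min_length:
            kept[locus] = alignment
    return kept
```

The retained loci would then go on to model selection and gene tree inference.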
Created:
2023-06-28



