
ClinePlotR: Visualizing genomic clines and detecting outliers in R

Indexed by NIAID Data Ecosystem, 2026-03-11
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.b2rbnzsc8
Resource description:
Patterns of multi-locus differentiation (i.e., genomic clines) often extend broadly across hybrid zones, and quantifying them can help diagnose how species boundaries are shaped by adaptive processes, both intrinsic and extrinsic. In this sense, the transition of individual loci across admixed individuals can be contrasted against the genome-wide trend, in turn allowing clinal theory to be applied across a much wider array of biodiversity. However, computational tools to interpret and visualize genomic clines are limited. Here, we introduce the ClinePlotR R package for visualizing genomic clines and detecting outlier loci using output generated by two popular software packages, bgc and Introgress. ClinePlotR bundles both input generation (i.e., filtering datasets and creating specialized file formats) and output processing (e.g., MCMC thinning and burn-in) with functions that directly facilitate interpretation and hypothesis testing. Tools are also provided for post-hoc analyses that interface with external packages such as ENMeval and RIdeogram. Our package increases the reproducibility and accessibility of genomic cline methods, thus allowing an expanded user base and promoting these methods as tools for addressing diverse evolutionary questions in both model and non-model organisms.

Methods

Transcriptomic alignment
Terrapene carolina carolina and Terrapene mexicana triunguis ddRADseq reads were mapped to the Terrapene mexicana triunguis reference transcriptome (GenBank Accession ID: GCA_002925995.2). Only mapped reads were retained, yielding the "transcriptome alignment."

Scaffold alignment
Terrapene carolina carolina and Terrapene mexicana triunguis ddRADseq reads were mapped to the Terrapene mexicana triunguis reference genome (GenBank Accession ID: GCA_002925995.2), which in this case consisted of unplaced scaffolds. This alignment is therefore dubbed the "scaffold alignment."
Each alignment was performed separately in ipyrad v0.7.30, with read mapping done by minimap2. Both alignments were also required to have >50% minimum coverage per site and a minimum coverage depth of 20X.

Running BGC
Using vcftools, the VCF file for each alignment was filtered to retain only bi-allelic SNPs, thinned to one SNP per ddRAD locus, and subjected to a minimum minor allele frequency (MAF) filter of 5.0%. vcf2bgc.py was then run to convert the ipyrad-style VCF files to BGC format, with the genotype uncertainty model incorporating read counts. BGC (Bayesian Genomic Clines) was run on each alignment separately, with five independent runs per alignment. The first 1.8 million MCMC iterations (transcriptome alignment) or 1.0 million iterations (scaffold alignment) were discarded as burn-in, followed by 200,000 post-burn-in generations per run. Samples were thinned to one every 50 iterations. The genotype uncertainty model was used with a sequencing error rate of 0.001-0.002 (determined using ipyrad).

ClinePlotR bgcPlotter
For each alignment, the five independent runs were aggregated to yield 20,000 total MCMC samples (200,000 post-burn-in generations thinned to one in 50 gives 4,000 samples per run, times five runs), with LnL and parameter traces confirming appropriate mixing and convergence. ClinePlotR was then used to identify significant outlier loci using two established criteria (Gompert and Buerkle, 2011, 2012). A Phi plot and an alpha-by-beta contour plot were made for each alignment. For the chromosome plot, the transcriptome and scaffold alignments were mapped to the closely related, chromosome-level Trachemys scripta reference genome using minimap2 and PAFScaff, because the Terrapene reference genome is only scaffold-level. This allowed us to convert scaffold coordinates to chromosome coordinates. The chromosome plot (ideogram) was made by joining GFF annotation information with the transcriptIds from the ddRAD dataset.
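The three vcftools filters described above (bi-allelic SNPs only, one SNP per ddRAD locus, MAF of at least 5%) can be sketched on an in-memory SNP table. This is an illustration only, not ClinePlotR or vcftools code: the `Snp` record, its field names, the genotype encoding (ALT allele counts 0/1/2, `None` for missing), and the rule of keeping the first SNP seen on each locus are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Snp:
    locus: str                        # ddRAD locus ID (hypothetical field)
    pos: int                          # position within the locus
    alleles: Tuple[str, ...]          # REF followed by ALT allele(s)
    genotypes: List[Optional[int]]    # ALT allele count per individual, None = missing


def minor_allele_frequency(snp: Snp) -> float:
    """MAF computed from called diploid genotypes only."""
    called = [g for g in snp.genotypes if g is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))
    return min(p, 1 - p)


def filter_snps(snps: List[Snp], min_maf: float = 0.05) -> List[Snp]:
    # 1. Keep bi-allelic SNPs only (exactly one REF and one ALT allele).
    biallelic = [s for s in snps if len(s.alleles) == 2]
    # 2. Thin to one SNP per ddRAD locus (here: the first SNP seen on each locus).
    first_per_locus = {}
    for s in biallelic:
        first_per_locus.setdefault(s.locus, s)
    # 3. Apply the minor allele frequency threshold to the retained SNPs.
    return [s for s in first_per_locus.values() if minor_allele_frequency(s) >= min_maf]
```

Because thinning runs before the MAF filter (the order given in the text), a locus whose first SNP fails the MAF threshold contributes no SNP at all.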
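The burn-in, thinning, and pooling arithmetic above (200,000 post-burn-in generations, one sample per 50, five runs, 20,000 pooled samples) can be checked with a short sketch. Here a `range` stands in for a stored chain so no real samples are needed; whether burn-in is removed inside bgc or afterwards in ClinePlotR is not shown here, and the function names are hypothetical.

```python
def thin_chain(chain, burnin, thin):
    """Discard the first `burnin` iterations, then keep every `thin`-th one."""
    return chain[burnin::thin]


def pool_runs(chains, burnin, thin):
    """Concatenate the thinned post-burn-in samples of independent runs."""
    pooled = []
    for chain in chains:
        pooled.extend(thin_chain(chain, burnin, thin))
    return pooled


# Transcriptome alignment: 1.8 M burn-in + 200,000 post-burn-in iterations per run.
runs = [range(1_800_000 + 200_000) for _ in range(5)]
per_run = len(thin_chain(runs[0], 1_800_000, 50))   # 200,000 / 50
total = len(pool_runs(runs, 1_800_000, 50))          # 5 runs pooled
print(per_run, total)
```

The same numbers hold for the scaffold alignment, since only the burn-in length (1.0 million) differs.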
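As we understand the two Gompert and Buerkle criteria cited above, a locus shows excess ancestry when the 95% credible interval of its cline parameter (alpha or beta) excludes zero, and it is an outlier relative to the genome-wide distribution when its posterior median falls outside the central quantiles of that distribution. The sketch below applies both checks to pooled posterior samples of one parameter. It assumes the genome-wide distribution is Normal with mean 0 and a supplied standard deviation (`genome_sd`, a hypothetical input); it is not ClinePlotR's implementation.

```python
from statistics import NormalDist, median


def equal_tailed_interval(samples, level=0.95):
    """Equal-tailed credible interval, interpolating linearly between order statistics."""
    xs = sorted(samples)
    tail = (1 - level) / 2

    def quantile(p):
        idx = p * (len(xs) - 1)
        lo = int(idx)
        hi = min(lo + 1, len(xs) - 1)
        return xs[lo] + (xs[hi] - xs[lo]) * (idx - lo)

    return quantile(tail), quantile(1 - tail)


def classify_locus(samples, genome_sd, level=0.95):
    """Return (excess_ancestry, genome_wide_outlier) for one cline parameter."""
    lo, hi = equal_tailed_interval(samples, level)
    excess_ancestry = lo > 0 or hi < 0                     # interval excludes zero
    cutoff = NormalDist(0.0, genome_sd).inv_cdf(1 - (1 - level) / 2)
    genome_wide_outlier = abs(median(samples)) > cutoff    # median in an outer tail
    return excess_ancestry, genome_wide_outlier
```

The two flags are independent: a locus can have an interval that overlaps zero yet still sit in the tail of the genome-wide distribution, or the reverse.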
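A Phi plot draws, for each locus, the cline value phi against the hybrid index h. The sketch below assumes the parameterization given by Gompert and Buerkle (2011), phi = h + 2(h - h^2)(alpha + beta(2h - 1)), where alpha shifts the cline center and beta changes its steepness; clipping phi to [0, 1] is an assumption of this sketch, and the function names are hypothetical.

```python
def phi(h, alpha, beta):
    """Cline value for a locus at hybrid index h (Gompert & Buerkle 2011 form)."""
    value = h + 2 * (h - h * h) * (alpha + beta * (2 * h - 1))
    return min(1.0, max(0.0, value))   # keep the result a valid probability


def cline_curve(alpha, beta, steps=20):
    """(h, phi) pairs on an even grid of hybrid index values from 0 to 1."""
    return [(i / steps, phi(i / steps, alpha, beta)) for i in range(steps + 1)]


neutral = cline_curve(0.0, 0.0)   # alpha = beta = 0 gives the diagonal phi = h
steep = cline_curve(0.0, 1.0)     # positive beta gives a steeper-than-neutral cline
```

With alpha = beta = 0 every locus follows the genome-wide trend (phi = h), which is the diagonal reference line outlier clines are compared against.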
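Converting scaffold coordinates to chromosome coordinates amounts to projecting a position through an alignment block. The sketch below reads one minimap2 PAF record (0-based, end-exclusive coordinates; query start, query end, strand, target name, target start, and target end in columns 3, 4, 5, 6, 8, and 9) and maps a scaffold position onto the chromosome. It assumes a gap-free block, which real alignments rarely are, and is a simplified illustration rather than PAFScaff's method; the function names are hypothetical.

```python
def parse_paf_line(line):
    """Keep the first nine (mandatory) PAF columns as a dict."""
    f = line.rstrip("\n").split("\t")
    return {
        "qname": f[0], "qlen": int(f[1]), "qstart": int(f[2]), "qend": int(f[3]),
        "strand": f[4], "tname": f[5], "tlen": int(f[6]),
        "tstart": int(f[7]), "tend": int(f[8]),
    }


def lift_position(pos, rec):
    """Map a 0-based scaffold position to (chromosome, 0-based position).

    Returns None if the position lies outside the aligned block. Assumes the
    block has no insertions or deletions.
    """
    if not (rec["qstart"] <= pos < rec["qend"]):
        return None
    offset = pos - rec["qstart"]
    if rec["strand"] == "+":
        return rec["tname"], rec["tstart"] + offset
    # On the minus strand the query start aligns to the target end.
    return rec["tname"], rec["tend"] - 1 - offset
```

A production version would walk the CIGAR string (the `cg:Z` tag minimap2 emits with `-c`) to account for indels inside the block.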
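Joining GFF annotation with the transcript IDs can be sketched as a lookup on GFF3 column 9. Which feature type and attribute ClinePlotR actually matches on is not stated here; matching `mRNA` features on their `ID` attribute is an assumption, as are the function names.

```python
from urllib.parse import unquote


def parse_gff_attributes(field):
    """Parse a GFF3 column-9 string such as 'ID=rna-1;Name=abc' into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = unquote(value)   # GFF3 percent-encodes reserved characters
    return attrs


def join_transcripts(gff_lines, transcript_ids, feature="mRNA", key="ID"):
    """Attach GFF3 coordinates and attributes to each requested transcript ID."""
    wanted = set(transcript_ids)
    joined = {}
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature:
            continue
        attrs = parse_gff_attributes(cols[8])
        tid = attrs.get(key)
        if tid in wanted:
            joined[tid] = {
                "seqid": cols[0],
                "start": int(cols[3]),   # GFF3 coordinates are 1-based, inclusive
                "end": int(cols[4]),
                "strand": cols[6],
                **attrs,
            }
    return joined
```

Transcript IDs with no matching feature are simply absent from the result, so loci that cannot be placed drop out of the ideogram rather than raising an error.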
These annotated transcriptIds, together with the scaffoldIds (now carrying chromosome coordinates), were input into our plot_outlier_ideogram() function, which uses RIdeogram to make the chromosome plot. The plot is a karyotype-style chromosome plot in which alpha and beta values are drawn as heatmap bands, showing the distribution of outliers across the genome.

Introgress functions
The 19 BioClim raster layers, plus mean annual solar radiation and mean annual wind speed, were obtained from https://worldclim.org. A layer from the National Land Cover Database (2011) was also obtained. These layers were input into our prepare_rasters() function, which crops them to the sampling extent (plus a small buffer). ENMeval was then run to identify the most informative raster features. The raster values at each sampling locality were extracted using our extractPointValues() function. Introgress was then run using our wrapper function, runIntrogress(). A locus was retained only if its allele frequency differential (delta) between the parental populations was at least 0.8. The estimated genomic clines and hybrid indices were then input into the clinesXenvironment() function to plot genomic clines and hybrid indices against latitude, longitude, and the environmental variables (raster values).
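The cropping and point-extraction steps described for prepare_rasters() and extractPointValues() can be sketched on a plain north-up grid. This is not the package's code: the grid is a list of rows with a known top-left corner and square cells, the 0.5-degree buffer is an assumed value (the text says only "a small buffer"), and extraction returns the value of the cell containing each point.

```python
import math


def sampling_extent(points, buffer=0.5):
    """Bounding box (xmin, xmax, ymin, ymax) of (lon, lat) points, padded by `buffer`."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs) - buffer, max(xs) + buffer, min(ys) - buffer, max(ys) + buffer


def crop(grid, x0, y0, res, extent):
    """Crop a north-up grid with top-left corner (x0, y0) to cells touching `extent`.

    Returns the cropped grid and its new top-left corner.
    """
    xmin, xmax, ymin, ymax = extent
    nrows, ncols = len(grid), len(grid[0])
    c0 = max(0, math.floor((xmin - x0) / res))
    c1 = min(ncols, math.ceil((xmax - x0) / res))
    r0 = max(0, math.floor((y0 - ymax) / res))
    r1 = min(nrows, math.ceil((y0 - ymin) / res))
    cropped = [row[c0:c1] for row in grid[r0:r1]]
    return cropped, (x0 + c0 * res, y0 - r0 * res)


def value_at(grid, x0, y0, res, x, y):
    """Value of the cell containing (x, y), or None if the point is off the grid."""
    col = math.floor((x - x0) / res)
    row = math.floor((y0 - y) / res)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None
```

Cropping first keeps every later step (model fitting, extraction) confined to the study area; a point extracted from the cropped grid returns the same value as from the full grid once the new corner is used.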
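The delta filter applied before running Introgress keeps only loci whose allele frequencies differ strongly between the two parental populations. The sketch below computes delta as the absolute difference in ALT allele frequency and keeps loci with delta of at least 0.8; the genotype encoding (ALT allele counts per diploid individual, `None` for missing), the input layout, and treating the threshold as inclusive are assumptions.

```python
def allele_frequency(genotypes):
    """ALT allele frequency from diploid ALT counts (0/1/2); None if nothing is called."""
    called = [g for g in genotypes if g is not None]
    return sum(called) / (2 * len(called)) if called else None


def delta_filter(loci, min_delta=0.8):
    """Names of loci whose parental allele frequency differential is at least `min_delta`.

    `loci` is an iterable of (name, parent1_genotypes, parent2_genotypes).
    """
    kept = []
    for name, parent1, parent2 in loci:
        p1 = allele_frequency(parent1)
        p2 = allele_frequency(parent2)
        if p1 is None or p2 is None:
            continue   # cannot compute delta without both parental samples
        if abs(p1 - p2) >= min_delta:
            kept.append(name)
    return kept


loci = [
    ("A", [2, 2, 2, 2], [0, 0, 0, 0]),   # fixed difference, delta = 1.0
    ("B", [2, 2, 1, 2], [0, 1, 0, 0]),   # 0.875 - 0.125, delta = 0.75
    ("C", [2, 2, 2, 2], [0, 0, 1, 0]),   # 1.0 - 0.125, delta = 0.875
]
```

Here `delta_filter(loci)` keeps loci A and C; B is dropped because its delta of 0.75 falls below 0.8.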
Created: 2020-08-17