Multispecies pangenomes reveal a pervasive influence of population size on structural variation

NIAID Data Ecosystem, indexed 2026-05-10

Download link:

http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.8pk0p2p01

Resource description:

Structural variants (SVs) are widespread in vertebrate genomes, yet their evolutionary dynamics remain poorly understood. Using 45 long-read de novo genome assemblies and pangenome tools, we analyze SVs within three closely related species of North American jays (Aphelocoma, scrub-jays) displaying a 60-fold range in effective population size. We find rapid evolution of genome architecture, including ~100 Mb variation in genome size driven by dynamic satellite landscapes with unexpectedly long (>10 kb) repeat units, and widespread variation in gene content that influences gene expression. SVs exhibit slightly deleterious dynamics modulated by variant length and population size, with strong evidence of adaptive fixation only in large populations. Our results demonstrate how population size shapes the distribution of SVs and the importance of pangenomes for characterizing genomic diversity.

Methods

Forty-four genomes from three species of North American scrub jays (Aphelocoma insularis, A. woodhouseii and A. coerulescens) and one outgroup (Yucatán Jay, Cyanocorax yucatanicus) were sequenced using PacBio HiFi technology. The sequence reads were assembled into primary assemblies and two haplotype assemblies using hifiasm (Cheng et al. 2021). We used several pangenome tools, including the Pangenome Graph Builder (PGGB; Garrison et al. 2024) and minigraph (Li et al. 2020), to detect and characterize structural variants, including inversions, within and between species. We used RepeatModeler2 and RepeatMasker to annotate repetitive elements (Smit et al. 2015, Flynn et al. 2020). We conducted demographic analysis with PSMC (Li and Durbin 2011), bpp (Rannala and Yang 2017) and other programs. We used Panacus to estimate growth curves for the pangenome graphs (Parmigiani et al. 2024), and fastDFE (Sendrowski and Bataillon 2024) and anavar (Barton and Zeng 2018) to estimate the distribution of selection coefficients. We used Pangene to estimate pangene graphs within and between species (Li et al. 2024).

Barton HJ, Zeng K. 2018. New Methods for Inferring the Distribution of Fitness Effects for INDELs and SNPs. Mol Biol Evol 35:1536-1546.
Cheng H, Concepcion GT, Feng X, Zhang H, Li H. 2021. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods 18:170-175.
Flynn JM, Hubley R, Goubert C, Rosen J, Clark AG, Feschotte C, Smit AF. 2020. RepeatModeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci U S A 117:9451-9457.
Garrison E, Guarracino A, Heumos S, Villani F, Bao Z, Tattini L, Hagmann J, Vorbrugg S, Marco-Sola S, Kubica C, et al. 2024. Building pangenome graphs. Nat Methods 21:2008-2012.
Li H, Durbin R. 2011. Inference of human population history from individual whole-genome sequences. Nature 475:493-496.
Li H, Feng X, Chu C. 2020. The design and construction of reference pangenome graphs with minigraph. Genome Biol 21:265.
Li H, Marin M, Farhat MR. 2024. Exploring gene content with pangene graphs. Bioinformatics 40.
Parmigiani L, Garrison E, Stoye J, Marschall T, Doerr D. 2024. Panacus: fast and exact pangenome growth and core size estimation. Bioinformatics 40.
Rannala B, Yang Z. 2017. Efficient Bayesian species tree inference under the multispecies coalescent. Syst Biol 66:823-842.
Sendrowski J, Bataillon T. 2024. fastDFE: Fast and Flexible Inference of the Distribution of Fitness Effects. Mol Biol Evol 41:msae070.
Smit AF, Hubley R, Green P. 2015. RepeatMasker Open-4.0. <http://www.repeatmasker.org>.
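
As an illustration of the pangenome growth curves mentioned in the Methods, the sketch below computes the expected number of distinct graph items (for example nodes or base pairs) covered by a random subset of k haplotypes. It is a minimal toy calculation under stated assumptions, not the Panacus implementation: the function name `pangenome_growth` and the example counts are hypothetical, and the input is assumed to be a simple list giving, for each item, how many of the n haplotypes carry it.

```python
from math import comb

def pangenome_growth(counts, n):
    """Expected number of distinct items seen in a random subset of k haplotypes.

    counts[i] is the number of haplotypes (out of n) that carry item i.
    Returns a list whose entry k-1 is the expectation for subsets of size k.
    """
    curve = []
    for k in range(1, n + 1):
        total = comb(n, k)  # number of possible k-subsets of haplotypes
        # An item carried by c haplotypes is missed only if all k chosen
        # haplotypes come from the n - c that lack it.
        expected = sum(1 - comb(n - c, k) / total for c in counts)
        curve.append(expected)
    return curve

# Hypothetical toy data: 4 haplotypes, four items carried by 4, 4, 2 and 1 of them.
counts = [4, 4, 2, 1]
curve = pangenome_growth(counts, n=4)
print([round(x, 2) for x in curve])
```

A curve that flattens quickly suggests most variation is shared among haplotypes, whereas a curve that keeps rising suggests that additional genomes continue to contribute new sequence.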

Created:

2025-12-29
