Multispecies pangenomes reveal a pervasive influence of population size on structural variation
Source: DataONE. Updated 2025-11-19; indexed 2025-11-29.
Download link:
https://search.dataone.org/view/sha256:d43c786b1421f6bfc9f75c7c32ffc02ab2133e644e6b3f7c3001ba49deb2b102
Resource description:
Structural variants (SVs) are widespread in vertebrate genomes, yet their evolutionary dynamics remain poorly understood. Using 45 long-read de novo genome assemblies and pangenome tools, we analyze SVs within three closely related species of North American jays (Aphelocoma, scrub-jays) displaying a 60-fold range in effective population size. We find rapid evolution of genome architecture, including ~100 Mb variation in genome size driven by dynamic satellite landscapes with unexpectedly long (> 10 kb) repeat units and widespread variation in gene content, influencing gene expression. SVs exhibit slightly deleterious dynamics modulated by variant length and population size, with strong evidence of adaptive fixation only in large populations. Our results demonstrate how population size shapes the distribution of SVs and the importance of pangenomes to characterizing genomic diversity.

Forty-four genomes from three species of North American scrub jays (Aphelocoma insularis, A. woodhouseii and A. coerulescens) and one outgroup (Yucatán Jay, Cyanocorax yucatanicus) were sequenced using PacBio HiFi technology. The sequence reads were assembled into primary assemblies and two haplotype assemblies using hifiasm (Cheng et al. 2021). We used various pangenome tools, including the Pangenome Graph Builder (PGGB; Garrison et al. 2024) and minigraph (Li et al. 2020), to detect and characterize structural variants, including inversions, within and between species. We used RepeatModeler2 and RepeatMasker to annotate repetitive elements (Smit et al. 2015, Flynn et al. 2020). We conducted demographic analysis with PSMC (Li et al. 2011), bpp (Rannala et al. 2017) and other programs. We used Panacus to estimate growth curves for the pangenome graphs (Parmigiani et al. 2024), and fastDFE (Sendrowski et al. 2024) and anavar (Barton et al. 2018) to estimate the distribution o...

# Data from: Multispecies pangenomes reveal pervasive influence of population size on evolution of structural variants
[https://doi.org/10.5061/dryad.8pk0p2p01](https://doi.org/10.5061/dryad.8pk0p2p01)
## Description of the data and file structure
### Files and variables
**File: RepeatMasker_analysis.tar.gz**
**Description:** This archive contains two files related to the analysis of RepeatMasker outputs:
* **all_haps_repmask_nornd_cat_CS_CY.bed.gz**
**Description:** This file contains a streamlined version of the output of RepeatMasker for each haplotype in the data set, including outgroups. The file is in bed format. The [RepeatMasker outfile](https://www.repeatmasker.org/webrepeatmaskerhelp.html) was converted to bed format by the [rmsk2bed command of bedops](https://bedops.readthedocs.io/en/latest/content/reference/file-management/conversion/rmsk2bed.html).
The file contains 6 columns: Reference contig of haplotype; start coordinate of repeat; end coordinate of repeat; t...

**Changes after Jul 21, 2025:**
Cleaned up several redundant files and added several files pertaining to base composition, satellite DNA analysis, RepeatMasker analysis, and the pangene analysis.
**Changes after Aug 7, 2025:**
Updated the sj_annotations.tar.gz files, which now include a GTF file with the gene names for easier integration with the pangene analysis, as well as the correct RepeatMasker annotation BED file of the AW reference. Also updated the FASTA library used in the RepeatMasker analysis to AW_365336_combined_repeats_v2.fasta.gz, which includes the satellites found by Satellite Repeat Finder.
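
As a rough illustration of how the RepeatMasker BED file described above might be summarized, the following Python sketch totals the annotated repeat length per contig. It relies only on the first three columns (contig, start, end) and assumes the standard BED convention of 0-based, half-open coordinates, so each interval spans `end - start` bases. The function names, the example contig names, and the placeholder values in columns 4 to 6 are hypothetical and not taken from the data set. Overlapping annotations are not merged, so totals can overcount nested repeats.

```python
import gzip
import io
from collections import defaultdict


def repeat_bp_per_contig(handle):
    """Sum interval lengths per contig from tab-separated BED lines.

    Only columns 1-3 (contig, start, end) are read; any further
    columns are ignored. Overlapping intervals are NOT merged.
    """
    totals = defaultdict(int)
    for line in handle:
        line = line.rstrip("\n")
        # Skip blank lines and optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        contig, start, end = fields[0], int(fields[1]), int(fields[2])
        totals[contig] += end - start  # half-open interval length
    return dict(totals)


def repeat_bp_from_gzip(path):
    """Convenience wrapper for a gzip-compressed BED file on disk."""
    with gzip.open(path, "rt") as handle:
        return repeat_bp_per_contig(handle)


# Hypothetical 6-column records; columns 4-6 are placeholders only.
EXAMPLE_BED = (
    "contig_1\t0\t100\tx\ty\tz\n"
    "contig_1\t200\t250\tx\ty\tz\n"
    "contig_2\t1000\t1300\tx\ty\tz\n"
)

if __name__ == "__main__":
    for contig, bp in sorted(repeat_bp_per_contig(io.StringIO(EXAMPLE_BED)).items()):
        print(contig, bp)
```

To run this on the real archive, one would extract it and pass the path of all_haps_repmask_nornd_cat_CS_CY.bed.gz to `repeat_bp_from_gzip`.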
Created:
2025-11-20



