Whole genome sequences of 23 species from the Drosophila montium species group (Diptera: Drosophilidae): a resource for testing evolutionary hypotheses
Source: DataCite Commons. Updated: 2026-03-11. Indexed: 2025-04-10.
Download link:
https://datadryad.org/dataset/doi:10.6078/D1CH5R
Description:
Large groups of species with well-defined phylogenies are excellent
systems for testing evolutionary hypotheses. In this paper, we describe
the creation of a comparative genomic resource consisting of 23 genomes
from the species-rich Drosophila montium species group, 22 of which are
presented here for the first time. The montium group is uniquely
positioned for comparative studies. Within the montium clade, evolutionary
distances are such that large numbers of sequences can be accurately
aligned while also recovering strong signals of divergence; and the
distance between the montium group and D. melanogaster is short enough
that orthologous sequences can be readily identified. All genomes were
assembled from a single, small-insert library using MaSuRCA, before going
through an extensive post-assembly pipeline. Estimated genome sizes within
the montium group range from 155 Mb to 223 Mb (mean=196 Mb). The absence
of long-distance information during the assembly process resulted in
fragmented assemblies, with the scaffold NG50s varying widely based on
repeat content and sample heterozygosity (min=18 kb, max=390 kb, mean=74
kb). The total scaffold length for most assemblies is also shorter than
the estimated genome size, typically by 5-15 %. However, subsequent
analysis showed that our assemblies are highly complete. Despite large
differences in contiguity, all assemblies contain at least 96 % of known
single-copy Dipteran genes (BUSCOs, n=2,799). Similarly, by aligning our
assemblies to the D. melanogaster genome and remapping coordinates for a
large set of transcriptional enhancers (n=3,457), we showed that each
montium assembly contains orthologs for at least 91 % of D. melanogaster
enhancers. Importantly, the genic and enhancer contents of our assemblies
are comparable to those of far more contiguous Drosophila assemblies. The
alignment of our own D. serrata assembly to a previously published PacBio
D. serrata assembly also showed that our longest scaffolds (up to 1 Mb)
are free of large-scale misassemblies. Our genome assemblies are a
valuable resource that can be used to further resolve the montium group
phylogeny; study the evolution of protein-coding genes and cis-regulatory
sequences; and determine the genetic basis of ecological and behavioral
adaptations.
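
The scaffold NG50 values quoted above are measured against the estimated genome size rather than the total assembly length, which matters here because most assemblies are shorter than their estimated genome size. The following is a minimal sketch of that calculation, not the authors' pipeline: the function name `ng50` and the toy scaffold lengths are hypothetical and chosen only for illustration.

```python
def ng50(scaffold_lengths, genome_size):
    """Return the NG50 of an assembly.

    NG50 is the length L of the shortest scaffold in the smallest set of
    longest scaffolds that together cover at least half of the estimated
    genome size. Returns None if the whole assembly covers less than half.
    """
    half = genome_size / 2
    running_total = 0
    # Walk scaffolds from longest to shortest, accumulating their lengths.
    for length in sorted(scaffold_lengths, reverse=True):
        running_total += length
        if running_total >= half:
            return length
    return None


# Toy example (hypothetical values, not taken from the dataset).
lengths = [400_000, 300_000, 150_000, 100_000, 50_000]
estimated_genome_size = 1_600_000

print("NG50:", ng50(lengths, estimated_genome_size))
# Passing the assembly's own total length gives the ordinary N50 instead.
print("N50: ", ng50(lengths, sum(lengths)))
```

In this toy case the assembly is shorter than the estimated genome, so the NG50 is lower than the N50. This is why NG50 is the more conservative statistic when comparing assemblies of differing completeness.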
Provider:
Dryad
Created:
2019-10-14



