Progressive Cactus alignment of 298 drosophilid species
Source: DataCite Commons | Updated: 2025-06-01 | Indexed: 2025-06-15
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.x0k6djhrd
Description:
Long-read sequencing is driving rapid progress in genome assembly across
all major groups of life, including species of the family Drosophilidae, a
longtime model system for genetics, genomics, and evolution. Whole-genome
sequence alignments link evolution at the nucleotide level across species
and are a critical but computationally intensive step for downstream
genomic analyses. Progressive Cactus is a reference-free, whole-genome
alignment tool designed to scale to alignments of thousands of species. In
the study associated with this dataset, we conducted Oxford Nanopore
long-read sequencing of both inbred lines and single wild flies obtained
either directly from the field or from ethanol-preserved specimens in
museum collections. We selected a set of 298 suitably high-quality
drosophilid genomes from this study, from publicly available genomes
that we assembled previously, and from genomes assembled in other studies.
Repeats were identified and soft-masked in each genome with RepeatModeler2
and RepeatMasker. A guide tree was constructed from 1,000 single-copy
orthologs annotated by BUSCO v5 in all genomes. Individual gene trees were
inferred with IQ-TREE 2, and a species tree was estimated from the gene trees
with ASTRAL-MP. The tree was scaled by the substitution rate at 4-fold
degenerate sites and provided to Progressive Cactus as the guide tree for
the alignment. Detailed methods are provided in the study. The alignment
is released as an open resource and as a tool for studying evolution at
the scale of an entire insect family.
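
The description mentions scaling the guide tree by the substitution rate at 4-fold degenerate sites. As a rough illustration of what such sites are, the sketch below finds third-codon positions where any of the four bases encodes the same amino acid under the standard genetic code. It is not the pipeline used in the study: the function names are hypothetical, the input is assumed to be an in-frame coding sequence, and rate estimation itself is not shown.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def fourfold_prefixes():
    """Return the two-base codon prefixes whose third position is 4-fold degenerate."""
    prefixes = set()
    for pair in product(BASES, repeat=2):
        prefix = "".join(pair)
        encoded = {CODON_TABLE[prefix + base] for base in BASES}
        if len(encoded) == 1:  # all four third bases give the same amino acid
            prefixes.add(prefix)
    return prefixes


def fourfold_site_indices(cds):
    """Return 0-based indices of 4-fold degenerate sites in an in-frame CDS.

    Hypothetical helper: assumes `cds` starts at codon position 1 and
    ignores any trailing partial codon or codons with ambiguous bases.
    """
    cds = cds.upper()
    prefixes = fourfold_prefixes()
    sites = []
    for start in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[start:start + 3]
        if codon[:2] in prefixes and codon[2] in BASES:
            sites.append(start + 2)
    return sites


if __name__ == "__main__":
    # Codons: ATG GCT TTT CCG TAA
    print(fourfold_site_indices("ATGGCTTTTCCGTAA"))
```

Because substitutions at these positions do not change the encoded protein, they are commonly treated as approximately neutral, which is why they are a frequent choice for calibrating branch lengths.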
Provider:
Dryad
Created:
2023-12-01



