Progressive Cactus alignment of 298 drosophilid species

Source: DataCite Commons. Updated 2025-06-01; indexed 2025-06-15.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.x0k6djhrd
Description:
Long-read sequencing is driving rapid progress in genome assembly across all major groups of life, including species of the family Drosophilidae, a longtime model system for genetics, genomics, and evolution. Whole-genome sequence alignments link evolution at the nucleotide level across species and are a critical but computationally intensive step for downstream genomic analyses. Progressive Cactus is a reference-free, whole-genome alignment tool designed to scale to alignments of thousands of species. In the study associated with this dataset, we conducted Oxford Nanopore long-read sequencing of both inbred lines and single wild flies obtained either directly from the field or from ethanol-preserved specimens in museum collections. We selected a set of 298 suitably high-quality drosophilid genomes from this study, from publicly available genomes assembled previously by us, and genomes assembled by other studies. Repeats were identified and soft-masked in each genome with RepeatModeler2 and RepeatMasker. A guide tree was constructed from 1,000 single-copy orthologs annotated by BUSCO v5 in all genomes. Individual gene trees were inferred with IQTREE2 and a species tree was estimated from the gene trees with ASTRAL-MP. The tree was scaled by the substitution rate at 4-fold degenerate sites and provided to Progressive Cactus as the guide tree for the alignment. Detailed methods are provided in the study. The alignment is released as an open resource and as a tool for studying evolution at the scale of an entire insect family.
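
The description mentions scaling the guide tree by the substitution rate at 4-fold degenerate sites. As an illustration of what such sites are, the following minimal Python sketch finds them in an in-frame coding sequence under the standard genetic code. It is not the authors' pipeline: the function names `fourfold_prefixes` and `fourfold_site_indices` are hypothetical, and the use of the standard code (NCBI translation table 1) is an assumption made for this example.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def fourfold_prefixes():
    """Return the two-base codon prefixes whose third position is 4-fold
    degenerate, i.e. all four third-base choices encode the same amino acid."""
    prefixes = set()
    for first, second in product(BASES, repeat=2):
        encoded = {CODON_TABLE[first + second + third] for third in BASES}
        if len(encoded) == 1:
            prefixes.add(first + second)
    return prefixes


def fourfold_site_indices(cds):
    """Return 0-based indices of 4-fold degenerate third positions in an
    in-frame coding sequence. Soft-masked (lowercase) bases are treated
    like uppercase; codons with ambiguous bases are skipped."""
    prefixes = fourfold_prefixes()
    seq = cds.upper()
    sites = []
    # Walk complete codons only; a trailing partial codon is ignored.
    for start in range(0, len(seq) - len(seq) % 3, 3):
        if seq[start:start + 2] in prefixes:
            sites.append(start + 2)
    return sites


if __name__ == "__main__":
    # Codons: ATG (Met), GCT (Ala), TTT (Phe), CTG (Leu)
    print(fourfold_site_indices("ATGGCTTTTCTG"))
```

Substitution rates at these positions are often used as an approximately neutral yardstick, because any base change there leaves the encoded amino acid unchanged.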
Provider:
Dryad
Created:
2023-12-01