Data from: Stitch or cluster? A comparison of alternative phylogenomic dataset assembly strategies for blenny fish
Source: DataCite Commons. Updated: 2026-04-27. Indexed: 2026-04-25.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.gb5mkkx4g
Description:
Phylogenomics has revolutionized the way we infer evolutionary
relationships. Several bioinformatic pipelines have been developed for
assembling phylogenomic datasets, in which orthology inference is a key
step. Here, we compared two alternative strategies for assembling
phylogenomic datasets: sequence clustering (OrthoFinder) and a new
similarity-based approach that enriches a predefined set of loci
(Patchwork). We downloaded publicly available genomic data for the fish
family Blenniidae, which comprises a heterogeneous set of source data
(genome skimming, transcriptomes, genomes) obtained by various sequencing
technologies (Illumina short reads, long Nanopore reads, 454
pyrosequencing). These data are characterized by diverse levels of
sequencing depth, read length, and per-base accuracy, representing a
typical scenario of data reuse for phylogenomic purposes. All data types,
regardless of accuracy and sequencing depth, were suitable for the
phylogenetic placement of species, even at estimated sequencing depths of
~1.6x, although 454 data produced extremely long branches. For assembling our
phylogenomic datasets, Patchwork outperformed OrthoFinder because it
generated fewer but taxonomically more complete multiple sequence
alignments. Our study is the first to test with genome-scale data the
evolutionary relationships among combtooth blenny fish, which were previously
studied with multi-locus datasets. We also explore two alternative
approaches for combining marker-rich phylogenomic data with taxonomically
broad multi-locus markers obtained by Sanger sequencing. The results indicate that
simple data concatenation does not necessarily outperform phylogenomic
constraints on multi-locus datasets. This dataset contains all files
relevant to our study, including multiple sequence alignments and
phylogenomic results.
Provider:
Dryad
Created:
2026-04-24



