Data from: Some limitations of public sequence data for phylogenetic inference (in plants)
Source: DataCite Commons | Updated: 2025-06-01 | Indexed: 2025-06-15
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.450qq
Description:
The GenBank database contains essentially all of the nucleotide sequence
data generated for published molecular systematic studies, but for the
majority of taxa these data remain sparse. GenBank has value for
phylogenetic methods that leverage data mining and rapidly improving
computational methods, but the limits imposed by the sparse structure of
the data are not well understood. Here we present a tree representing
13,093 land plant genera—an estimated 80% of extant plant diversity—to
illustrate the potential of public sequence data for broad phylogenetic
inference in plants, and we explore the limits to inference imposed by the
structure of these data using theoretical foundations from phylogenetic
data decisiveness. We find that despite very high levels of missing data
(over 96%), the present data retain the potential to inform over 86.3% of
all possible phylogenetic relationships. Most of these relationships,
however, are informed by small amounts of data—approximately half are
informed by fewer than four loci, and more than 99% are informed by fewer
than fifteen. We also apply an information theoretic measure of branch
support to assess the strength of phylogenetic signal in the data,
revealing many poorly supported branches concentrated near the tips of the
tree, where data are sparse and the limiting effects of this sparseness
are stronger. We argue that limits to phylogenetic inference and signal
imposed by low data coverage may pose significant challenges for
comprehensive phylogenetic inference at the species level. Computational
requirements provide additional limits for large reconstructions, but
these may be overcome by methodological advances, whereas insufficient
data coverage can only be remedied by additional sampling effort. We
conclude that public databases have exceptional value for modern
systematics and evolutionary biology, and that a continued emphasis on
expanding taxonomic and genomic coverage will play a critical role in
developing these resources to their full potential.
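
The two coverage quantities discussed above (the share of missing data and the share of relationships informed by at least one locus) can be illustrated on a toy taxon-by-locus matrix. This is a minimal sketch under stated assumptions, not the authors' pipeline: it treats a "relationship" as a four-taxon subset (quartet) and counts a quartet as informed by a locus when that locus samples all four taxa. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def missing_fraction(coverage, taxa):
    """Fraction of empty cells in the taxon-by-locus matrix."""
    total_cells = len(taxa) * len(coverage)
    filled = sum(len(sampled & taxa) for sampled in coverage.values())
    return 1 - filled / total_cells


def quartet_locus_counts(coverage, taxa):
    """For every four-taxon subset, count the loci that sample all four taxa.

    Assumption: a quartet is 'informed' by a locus only if the locus
    includes every taxon in the quartet.
    """
    counts = {}
    for quartet in combinations(sorted(taxa), 4):
        members = set(quartet)
        counts[quartet] = sum(
            1 for sampled in coverage.values() if members <= sampled
        )
    return counts


def informed_fraction(counts):
    """Fraction of quartets covered by at least one locus."""
    return sum(1 for c in counts.values() if c > 0) / len(counts)


# Hypothetical toy data: which taxa have a sequence for each locus.
taxa = {"A", "B", "C", "D", "E"}
coverage = {
    "locus1": {"A", "B", "C", "D"},
    "locus2": {"B", "C", "D", "E"},
    "locus3": {"A", "B"},
}

counts = quartet_locus_counts(coverage, taxa)
print(f"missing data: {missing_fraction(coverage, taxa):.1%}")
print(f"quartets informed by >= 1 locus: {informed_fraction(counts):.1%}")
```

Enumerating every quartet is only feasible for small examples; at the scale of thousands of genera the number of quartets is far too large to list exhaustively, so a sampling-based or analytical approach would be needed.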
Provider:
Dryad
Created:
2014-05-16



