Data from: Graphs in phylogenetic comparative analysis: Anscombe’s quartet revisited

Source: DataONE. Updated: 2018-07-16. Indexed: 2024-06-08.

Download link:

https://search.dataone.org/view/null

Description:

1. In 1973 the statistician Francis Anscombe used a clever set of bivariate datasets (now known as Anscombe’s quartet) to illustrate the importance of graphing data as a component of statistical analyses. In his example, each of the four datasets yielded identical regression coefficients and model fits, and yet when visualized revealed strikingly different patterns of covariation between x and y.
2. Phylogenetic comparative methods are statistical methods too, yet visualizing the data and phylogeny in a sensible way that would permit us to detect unexpected patterns or unanticipated deviations from model assumptions is not a routine component of phylogenetic comparative analyses.
3. Here, we use a quartet of phylogenetic datasets to illustrate that the same estimated parameters and model fits can be obtained from data that were generated using markedly different procedures – including pure Brownian motion evolution and randomly selected data uncorrelated with the tree. Just as in the case of Anscombe’s quartet, when graphed the differences between the four datasets are quickly revealed.
4. The intent of this article is to help build the general case that phylogenetic comparative methods are statistical methods and consequently that graphing or visualization should invariably be included as an essential step in our standard data analytical pipelines.
5. Phylogenies are complex data structures and thus visualizing data on trees in a meaningful and useful way is a challenging endeavor. We recommend that the development of graphical methods for simultaneously visualizing data and tree should continue to be an important goal in phylogenetic comparative biology.
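
The claim in point 1 can be checked directly. The sketch below is an illustration only: it uses the classic Anscombe (1973) quartet values, not the phylogenetic datasets archived in this record, and the helper name `fit_line` is a hypothetical name chosen for this example. It fits an ordinary least-squares line to each of the four datasets in plain Python.

```python
# Classic Anscombe (1973) quartet. Datasets I-III share the same x values.
# NOTE: this is NOT the phylogenetic quartet from this record; it only
# illustrates the original statistical example the description refers to.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]


def fit_line(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept).

    `fit_line` is a hypothetical helper written for this illustration.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


quartet = {"I": (x123, y1), "II": (x123, y2), "III": (x123, y3), "IV": (x4, y4)}
fits = {name: fit_line(x, y) for name, (x, y) in quartet.items()}

for name, (slope, intercept) in fits.items():
    print(f"Dataset {name}: y = {intercept:.2f} + {slope:.2f} x")
```

To two decimal places the four fitted lines agree, which is why the summary statistics alone cannot distinguish the datasets. Plotting y against x for each dataset is what exposes the differences.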


Created:

2018-07-16
