A new Lower Permian ray-finned fish (Actinopterygii) from South Dakota and the use of tree space to find rogue taxa in phylogenetic analysis of morphological data

Indexed by NIAID Data Ecosystem on 2026-05-02

Download link:

http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.2bvq83bwf

Description:

The divergence of extant lineages from the “palaeoniscoids”, a grade of Paleozoic and early Mesozoic Era species, remains unresolved in analyses of morphological data despite more than four decades of phylogenetic research. We describe a new ray-finned fish, Tenupiscis dakotaensis gen. et sp. nov., from the Lower Permian (Kungurian) of South Dakota to strengthen the phylogenetic framework of Mississippian–Triassic actinopterygians. Our initial parsimony and Bayesian phylogenetic analyses were unable to resolve the relationships of Mississippian–Triassic “palaeoniscoids”. We analyzed the topological variation among the trees sampled in each phylogenetic search (tree space) to determine whether uncertainty was concentrated in a small subset of species with highly uncertain phylogenetic relationships relative to other terminal taxa (rogue taxa) or distributed evenly among early actinopterygians. The relationships of fourteen species were unresolved in the parsimony strict consensus due to a single rogue taxon (“Kalops monophyrum”). Parsimony and Bayesian analyses with the rogue pruned or recoded find the initially unresolved Mississippian–Triassic “palaeoniscoids” (including Tenupiscis) branching from the actinopterygian stem or from the base of Pan-Neopterygii. Our work therefore supports the emerging consensus that Paleozoic Era ray-finned fishes include clades of stem actinopterygians and the earliest members of the actinopterygian crown group. We also demonstrate that tree space methods can effectively identify and mitigate rogue taxon effects in phylogenetic analysis of morphological data from new fossil taxa.

Methods: The input data, in the form of phylogenetic trees, were generated in a series of analyses (see "Parsimony analyses" and "Bayesian analyses" below). The matrices that we ran in the phylogenetics programs to produce the trees are also provided (see "Matrix construction" below).
We used R to summarize the results of the Bayesian searches as majority-rule consensus trees, to visualize the variation in the samples of phylogenetic trees ("Tree space visualization"), and to determine whether any rogue taxa were present in our dataset ("Rogue taxon search").

Summary of experimental efforts underlying this dataset: This document describes a series of phylogenetic analyses of ray-finned fishes (actinopterygians) based on a morphological dataset derived from Stack & Gottfried (2022). A full description of the methods is provided in the draft manuscript and supplementary information documents; here we summarize how the data were generated and give the information needed to replicate the analyses.

Matrix construction: We coded the new taxon for 222 discrete morphological characters using the matrix of Stack & Gottfried (2022), which incorporates coding changes from Argyriou et al. (2018) and Coates & Tietjen (2019), adds the early Permian actinopterygian Concentrilepis minnekahtaensis, and reduces the taxon list to focus on actinopterygian interrelationships. The full list of changes is available in the supplementary material of Stack & Gottfried (2022). The full matrix used in our initial analyses contains 10341 scorings for 75 taxa; this matrix and the analyses using it are labeled "Initial" below. We also used a matrix in which the scorings for the terminal taxon "Kalops monophyrum" (a nomen nudum, meaning a name that is not linked to a described taxon) were removed, referred to below as "Removed". The third and final version of the matrix replaces "Kalops monophyrum" with Kalops monophrys (Poplin & Lund, 2002) and is labeled "Recoded". All parsimony and Bayesian analyses followed the identical steps described below; the only difference between analyses was the matrix used. We used the open-access Morphobank (O'Leary & Kaufman, 2011) to annotate and output all matrix files.
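The three matrix versions differ only in how one terminal taxon is treated, which can be sketched as follows. This is a minimal illustration and not the authors' workflow (the matrices were edited in Morphobank): the taxon rows and character states are made-up placeholders, the missing-data symbols are an assumption, and `make_removed`, `make_recoded`, and `scored_cells` are hypothetical helper names.

```python
MISSING = {"?", "-"}  # assumed symbols for missing / inapplicable cells


def make_removed(matrix, taxon):
    """Return a copy of the matrix without the given terminal taxon."""
    return {name: row for name, row in matrix.items() if name != taxon}


def make_recoded(matrix, old_taxon, new_taxon, new_row):
    """Return a copy in which old_taxon is replaced by a re-scored new_taxon."""
    width = len(next(iter(matrix.values())))
    if len(new_row) != width:
        raise ValueError("re-scored row must cover every character")
    recoded = make_removed(matrix, old_taxon)
    recoded[new_taxon] = new_row
    return recoded


def scored_cells(matrix):
    """Count cells that carry an observed state (not missing or inapplicable)."""
    return sum(state not in MISSING for row in matrix.values() for state in row)


# Placeholder data: three taxa, five characters (NOT real scorings).
initial = {
    "Taxon_A": "01?10",
    "Taxon_B": "0110?",
    "Kalops_monophyrum": "1??0-",
}
removed = make_removed(initial, "Kalops_monophyrum")
recoded = make_recoded(initial, "Kalops_monophyrum", "Kalops_monophrys", "10?01")
```

If the 10341 scorings reported above are read as non-missing cells in a 75 × 222 grid (an assumption about how they were counted), the Initial matrix would be roughly 62% complete (10341 / 16650).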
Parsimony analyses: All parsimony analyses were conducted in TNT 1.5 (Goloboff et al., 2008; Goloboff & Catalano, 2016). Each began with a New Technology search that combined the Sectorial Search, Ratchet, Tree Fusing, and Drift algorithms and continued until the optimal tree length had been found 500 times (random seed = 1; Goloboff et al., 2008). We then conducted a traditional search with Tree Bisection and Reconnection on the topologies returned from the New Technology search. Our strict consensus tree summarizes the agreement among the most parsimonious trees from the traditional search (Nixon & Carpenter, 1996). We mapped unambiguous synapomorphies from the most parsimonious trees onto the strict consensus for each analysis in TNT. We also calculated Bremer support values in TNT by conducting Tree Bisection and Reconnection on the most parsimonious trees and allowing the analysis to retain all trees 1–6 steps longer than the optimal length. We calculated the consistency index (CI; Kluge & Farris, 1969) and retention index (RI; Farris, 1989) of the strict consensus in TNT with the stats.run command.

Bayesian analyses: All Bayesian phylogenetic analyses were conducted in MrBayes 3.2.7a using two independent Metropolis-coupled Markov chain Monte Carlo analyses with the MkV model for discrete morphological data (Ronquist et al., 2012; Lewis, 2001). Each Metropolis-coupled Markov chain Monte Carlo analysis had four independent Markov chains that ran for an initial 500,000 generations, with burn-in set to 25% and sampling every 100 generations. We ran 4.5 million generations before reaching an average standard deviation of split frequencies of 0.008282, with the minimum Effective Sample Size (ESS; Ripley, 1987) exceeding 6000 and the Potential Scale Reduction Factor (PSRF; Gelman & Rubin, 1992) values equaling 1.0.
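The burn-in and PSRF figures above can be illustrated with a short sketch. It uses the basic Gelman & Rubin (1992) form of the PSRF for equal-length chains; the exact computation inside MrBayes may differ in detail, and the function names and toy chains here are hypothetical.

```python
from statistics import mean, variance


def burnin_generation(total_generations, burnin_fraction):
    """Last generation discarded as burn-in."""
    return int(total_generations * burnin_fraction)


def retained_samples(total_generations, sample_every, burnin_fraction):
    """Approximate number of samples kept per run after burn-in."""
    start = burnin_generation(total_generations, burnin_fraction)
    return (total_generations - start) // sample_every


def psrf(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    if len(chains) < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    within = mean(variance(c) for c in chains)            # W
    between_over_n = variance([mean(c) for c in chains])  # B / n
    if within == 0:
        raise ValueError("within-chain variance is zero")
    pooled = (n - 1) / n * within + between_over_n
    return (pooled / within) ** 0.5


# Settings quoted in the text: 4.5 million generations, 25% burn-in,
# one sample every 100 generations.
first_kept = burnin_generation(4_500_000, 0.25)
kept = retained_samples(4_500_000, 100, 0.25)

# Toy chains: two runs stuck in different regions give a PSRF far above 1.
stuck = psrf([[0, 1, 0, 1], [10, 11, 10, 11]])
```

A PSRF close to 1.0 indicates that the variance between runs is small relative to the variance within runs, which is the pattern reported for these analyses.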
We also used the “plot” command in MrBayes to examine the trend in sampled log-likelihood values and confirm that they were randomly distributed between generations 1,125,000 and 4,500,000, indicating that the chains had converged on a stable region of the posterior distribution. We generated majority-rule consensus trees in MrBayes with the “sumt Burninfrac=0.5” command. We imported the NEXUS-formatted consensus tree into R (R Core Team, 2021) to generate a more flexible annotated consensus tree with the ape (Paradis & Schliep, 2019), phytools (Revell, 2012), and phylotate (Beer & Beer, 2019) packages, which we opened in FigTree v1.4.4 (Rambaut, 2018).

Tree space visualization: We applied a series of “tree space” techniques for visualizing variation in phylogenetic searches (Smith, 2022; Wright & Lloyd, 2020) to examine the variation in our tree searches and determine the source of any low resolution in our consensus topologies. We conducted three parallel studies of tree space in R (R Core Team, 2021) with the cluster (Maechler et al., 2022), TreeTools (Smith, 2019), TreeDist (Smith, 2020), vioplot (Adler & Kelly, 2022), ape (Paradis & Schliep, 2019), and protoclust (Bien & Tibshirani, 2022) packages. Our analyses are inspired by vignettes by Martin R. Smith (https://github.com/ms609/TreeDist/blob/HEAD/vignettes/treespace.Rmd; https://ms609.github.io/TreeDist/dev/articles/compare-treesets.html). We examined the most parsimonious trees and 1000 randomly sampled Bayesian trees separately, and also analyzed the Bayesian and most parsimonious trees together. A full script and the files needed to recreate these analyses in R are provided in the Supplementary Data. We calculated the distance between trees with the clustering information distance metric, which Smith (2020) demonstrated to be the most consistent measure of tree dissimilarity among available metrics.
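Every tree space analysis starts from a matrix of pairwise distances between trees. The sketch below shows how such a matrix is assembled, but it swaps in the much simpler Robinson-Foulds distance (a count of bipartitions found in only one of two trees) as a stand-in for the clustering information distance that the analyses actually used through TreeDist. The nested-tuple tree representation and all function names are assumptions made for illustration.

```python
def splits(tree):
    """Non-trivial bipartitions of a tree written as nested tuples of leaf names."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        found.add(clade)
        return clade

    all_taxa = visit(tree)
    anchor = min(all_taxa)  # fixed reference leaf used to orient each split
    oriented = set()
    for clade in found:
        if 2 <= len(clade) <= len(all_taxa) - 2:
            # Store the side of the split that does not contain the anchor.
            oriented.add(all_taxa - clade if anchor in clade else clade)
    return oriented


def rf_distance(tree_a, tree_b):
    """Robinson-Foulds distance: number of splits present in only one tree."""
    return len(splits(tree_a) ^ splits(tree_b))


def distance_matrix(trees):
    """Symmetric matrix of pairwise distances between all trees."""
    return [[rf_distance(a, b) for b in trees] for a in trees]


# Three toy five-taxon topologies (placeholders, not trees from the dataset).
t1 = (("A", "B"), ("C", ("D", "E")))
t2 = (("A", "C"), ("B", ("D", "E")))
t3 = (("A", "B"), ("D", ("C", "E")))
distances = distance_matrix([t1, t2, t3])
```

The resulting matrix plays the same role as the clustering information distance matrix in the steps that follow: it is the input to ordination, clustering, and dispersal calculations.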
See Smith (2020) for detailed comparisons and rigorous testing of measures of tree distance. We performed a principal coordinates analysis (PCoA, or metric multidimensional scaling; Gower, 1966) of each tree sample to create a twelve-dimensional mapping of the distances between the topologies. We calculated the product of the trustworthiness and continuity (TxC; Kaski et al., 2003; Venna & Kaski, 2001) of mappings in 1–12 dimensions to determine how many dimensions were needed to reliably represent the distances between the topologies in each tree sample. Trustworthiness measures the degree to which points that are nearby in the mapping are also nearby in the original distance matrix (Kaski et al., 2003), whereas continuity measures the degree to which points that are nearby in the original matrix remain nearby in the mapping (Smith, 2022; Venna & Kaski, 2001). We mapped each tree space with the number of dimensions needed to meet or surpass a TxC of 0.9, following the recommendation of Smith (2022). We searched for clustering in each tree distance matrix with the Partitioning Around Medoids (PAM; Kaufman & Rousseeuw, 1990; with algorithmic improvements from Schubert & Rousseeuw, 2021) and hierarchical clustering with minimax linkage (Hierarchical; Ao et al., 2005; Bien & Tibshirani, 2011) algorithms. We calculated the silhouette coefficient (Kaufman & Rousseeuw, 1990) to evaluate the reliability of the clusterings of 2–12 clusters identified by each algorithm. The silhouette coefficient is a dimensionless measure of the degree to which objects in a cluster are close to other objects in their cluster relative to objects in the closest neighboring cluster (Kaufman & Rousseeuw, 1990). We further evaluated potential clustering by calculating and visualizing the dispersal of each tree sample, which is the distance between each tree and the respective median tree, to better understand the geometry of their respective tree spaces (Smith, 2022).
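The silhouette coefficient and the dispersal about a median item can both be computed directly from a distance matrix. The following is a minimal sketch under stated assumptions: it uses points on a line as a toy substitute for trees, the function names are hypothetical, and it is not the code from the supplementary script, which relies on the R packages named above.

```python
def mean_silhouette(dist, labels):
    """Average silhouette width for a clustering of a distance matrix."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    widths = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the item's own cluster
        a = sum(dist[i][j] for j in own) / len(own)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / len(widths)


def median_index(dist):
    """Index of the item with the smallest summed (hence mean) distance to the rest."""
    return min(range(len(dist)), key=lambda i: sum(dist[i]))


def dispersal(dist):
    """Distance from every item to the median item."""
    centre = median_index(dist)
    return [dist[centre][i] for i in range(len(dist))]


# Toy data: five points on a line standing in for five trees.
positions = [0, 1, 2, 10, 11]
toy = [[abs(a - b) for b in positions] for a in positions]
good = mean_silhouette(toy, [0, 0, 0, 1, 1])
bad = mean_silhouette(toy, [0, 1, 0, 1, 0])
spread = dispersal(toy)
```

Values near 1 indicate compact, well separated clusters, while values near or below 0 indicate that the proposed clusters do not reflect real structure in the distances.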
The median tree has the shortest average distance to every other tree in the set (Smith, 2022). Examining the spread of the tree samples about their median allowed us to verify the landscapes shown in the initial tree space analyses. We visualized dispersal between and within the most parsimonious trees and the Bayesian tree sample using violin plots (Adler & Kelly, 2022) and density plots, based on a vignette by Tom Kelly (https://cran.r-project.org/web/packages/vioplot/vignettes/violin_area.html).

Rogue taxon search: In this context, rogue taxa are species with a highly uncertain phylogenetic position relative to other species in the same analysis (Smith, 2021). We aimed to determine whether any unstable taxa acted as rogues in our analysis by conducting a rogue taxon search with the QuickRogue function of the R (R Core Team, 2021) package Rogue (Smith, 2021) on the most parsimonious trees and on 1000 trees randomly sampled from the first run of the Bayesian analysis (accounting for a burn-in of 50%). We chose the QuickRogue function because it can identify rogues as reliably as alternative heuristics in Rogue and RogueNaRok (Aberer et al., 2013) with the benefit of lower computation time (Smith, 2021). The Rogue output shows the splitwise phylogenetic information content (the sum of the information content of the bipartitions of a topology; Smith, 2021) of the baseline majority-rule consensus of the tree sample and the rawImprovement, which shows the change in phylogenetic information content resulting from the removal of each rogue taxon. We compared the rawImprovement scores of the rogues to determine how much damage each caused relative to the others. We conducted follow-up maximum parsimony searches in TNT 1.5 with the sole rogue taxon (“Kalops monophyrum”) identified in the initial maximum parsimony analysis removed from the matrix.
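The effect of a single rogue on a consensus tree can be shown with a deliberately naive sketch. It does not reproduce QuickRogue, which scores consensus trees by their information content; instead it simply counts how many clades the strict consensus gains when each taxon is pruned in turn. The rooted nested-tuple trees, the taxon "X", and the function names are all hypothetical.

```python
def clades(tree):
    """Clades (leaf sets of internal nodes) of a rooted nested-tuple tree, root excluded."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        found.add(clade)
        return clade

    found.discard(visit(tree))
    return found


def prune(tree, taxon):
    """Remove one leaf and collapse any node left with a single child."""
    if isinstance(tree, str):
        return None if tree == taxon else tree
    kept = [p for p in (prune(child, taxon) for child in tree) if p is not None]
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else tuple(kept)


def strict_consensus_clades(trees):
    """Clades shared by every tree, i.e. those shown in a strict consensus."""
    return set.intersection(*(clades(t) for t in trees))


def rogue_gains(trees, taxa):
    """Clades gained in the strict consensus when each taxon is pruned."""
    baseline = len(strict_consensus_clades(trees))
    return {
        taxon: len(strict_consensus_clades([prune(t, taxon) for t in trees])) - baseline
        for taxon in taxa
    }


# Toy sample: the backbone is identical, but "X" sits in a different place each time.
sample = [
    ((("A", "X"), "B"), ("C", ("D", "E"))),
    (("A", "B"), ("C", (("D", "X"), "E"))),
    (("A", "B"), (("C", "X"), ("D", "E"))),
]
gains = rogue_gains(sample, ["A", "B", "C", "D", "E", "X"])
```

In this toy sample the strict consensus is completely unresolved until "X" is pruned, which mirrors, in miniature, how one unstable terminal can collapse the relationships of many otherwise stable taxa.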
“Kalops monophyrum” is not one of the two described species of Kalops (Poplin & Lund, 2002) and is therefore a nomen nudum. Given the rogue behavior of this taxon, we opted to remove “Kalops monophyrum” and re-score Kalops based on personal examination of the type specimen of Kalops monophrys (CM 27372) and the original description (Poplin & Lund, 2002). The rationale for each character coding change is provided in Part C of the Supplementary Information. We conducted an additional maximum parsimony analysis with “Kalops monophyrum” pruned from the matrix, along with a maximum parsimony analysis and a Bayesian search with Kalops monophrys substituted for “Kalops monophyrum”. These searches used the same phylogenetic search and tree space methods as the initial analyses.

R (R Core Team, 2021) is needed to run the code in Stacketal_SupplementCode. We recommend using Morphobank (Morphobank.org; O'Leary & Kaufman, 2011) to open and read the phylogenetic matrix files, although they can also be opened as text files. Re-running the phylogenetic analyses would require TNT 1.5 (parsimony; Goloboff et al., 2008; Goloboff & Catalano, 2016) and MrBayes 3.2.7a (Ronquist et al., 2012).

References:
Aberer A.J., Krompass D., Stamatakis A. 2013. Pruning rogue taxa improves phylogenetic accuracy: An efficient algorithm and webservice. Systematic Biology. 62(1):162–166.
Adler D., Kelly S.T. 2022. vioplot: violin plot. R package version 0.4.0. https://github.com/TomKellyGenetics/vioplot
Argyriou T., Giles S., Friedman M., Romano C., Kogan I., Sánchez-Villagra M.R. 2018. Internal cranial anatomy of Early Triassic species of †Saurichthys (Actinopterygii: †Saurichthyiformes): Implications for the phylogenetic placement of †saurichthyiforms. BMC Evolutionary Biology. 18:1–41.
Bien J., Tibshirani R. 2011. Hierarchical clustering with prototypes via minimax linkage. Journal of the American Statistical Association. 106:1075–1084.
Bien J., Tibshirani R. 2022. protoclust: Hierarchical clustering with prototypes. https://cran.r-project.org/web/packages/protoclust/index.html
Coates M.I., Tietjen K. 2019. ‘This strange little palaeoniscid’: A new early actinopterygian genus, and commentary on pectoral fin conditions and function. Earth and Environmental Science Transactions of The Royal Society of Edinburgh. 109(1–2):15–31.
Farris J.S. 1989. The retention index and the rescaled consistency index. Cladistics. 5(4):417–419.
Gelman A., Rubin D.B. 1992. Inference from iterative simulation using multiple sequences. Statistical Science. 7(4):457–472.
Goloboff P.A., Farris J.S., Nixon K.C. 2008. TNT, a free program for phylogenetic analysis. Cladistics. 24(5):774–786.
Goloboff P.A., Catalano S.A. 2016. TNT version 1.5, including a full implementation of phylogenetic morphometrics. Cladistics. 32(3):221–238.
Gower J.C. 1966. Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika. 53(3–4):325–338.
Kaski S., Nikkilä J., Oja M., Venna J., Törönen P., Castrén E. 2003. Trustworthiness and metrics in visualizing similarity of gene expression. BMC Bioinformatics. 4:1–13.
Kaufman L., Rousseeuw P.J. 1990. Partitioning around medoids (program PAM). Finding groups in data: An introduction to cluster analysis. Hoboken, New Jersey: John Wiley & Sons, Ltd.
Kluge A.G., Farris J.S. 1969. Quantitative phyletics and the evolution of anurans. Systematic Biology. 18(1):1–32.
Lewis P.O. 2001. A likelihood approach to estimating phylogeny from discrete morphological character data. Systematic Biology. 50(6):913–925.
Maechler M., Rousseeuw P., Struyf A., Hubert M., Hornik K. 2022. cluster: Cluster analysis basics and extensions. R package version 2.1.3. https://CRAN.R-project.org/package=cluster
Nixon K.C., Carpenter J.M. 1996. On consensus, collapsibility, and clade concordance. Cladistics. 12(4):305–321.
O’Leary M.A., Kaufman S. 2011. Morphobank: Phylophenomics in the “cloud”. Cladistics. 27(5):529–537.
Paradis E., Schliep K. 2019. ape 5.0: An environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics. 35:526–528.
Poplin C.M., Lund R. 2002. Two Carboniferous fine-eyed palaeoniscoids (Pisces, Actinopterygii) from Bear Gulch (USA). Journal of Paleontology. 76:1014–1028.
Rambaut A. 2018. FigTree tree figure drawing tool version 1.4.4. https://github.com/rambaut/figtree/releases
Ripley B.D. 2009. Stochastic simulation. John Wiley & Sons.
Ronquist F., Teslenko M., Van Der Mark P., Ayres D.L., Darling A., Höhna S., Larget B., Liu L., Suchard M.A., Huelsenbeck J.P. 2012. MrBayes 3.2: Efficient Bayesian phylogenetic inference and model choice across a large model space. Systematic Biology. 61(3):539–542.
Schubert E., Rousseeuw P.J. 2021. Fast and eager k-medoids clustering: O(k) runtime improvement of the PAM, CLARA, and CLARANS algorithms. Information Systems. 101:101804.
Smith M.R. 2019. TreeTools: Create, modify and analyse phylogenetic trees. Comprehensive R Archive Network. doi:10.5281/zenodo.3522725.
Smith M.R. 2020. Information theoretic generalized Robinson–Foulds metrics for comparing phylogenetic trees. Bioinformatics. 36(20):5007–5013.
Smith M.R. 2020. TreeDist: Distances between phylogenetic trees. R package version 2.4.0. doi:10.5281/zenodo.3528124.
Smith M.R. 2021. Using information theory to detect rogue taxa and improve consensus trees. Systematic Biology. 0:1–7.
Smith M.R. 2022. Robust analysis of phylogenetic tree space. Systematic Biology. 0(syab099):1–16.
Stack J., Gottfried M.D. 2022. A new, exceptionally well-preserved Permian actinopterygian fish from the Minnekahta Limestone of South Dakota, USA. Journal of Systematic Palaeontology. 19:1271–1302.
Venna J., Kaski S. 2001. Neighborhood preservation in nonlinear projection methods: An experimental study. In: Dorffner G., Bischof H., Hornik K., editors. Artificial neural networks – ICANN 2001. Lecture notes in computer science. Berlin: Springer. p. 485–492.
Wright A.M., Lloyd G.T. 2020. Bayesian analyses in phylogenetic palaeontology: interpreting the posterior sample. Palaeontology. 1–10.

Created:

2025-07-14