A phylogenomic backbone for Acoelomorpha inferred from transcriptomic data
Indexed from NIAID Data Ecosystem on 2026-05-02
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.nvx0k6f0j
Description:
Xenacoelomorpha are mostly microscopic, morphologically simple worms that lack many structures typical of other bilaterians. Xenacoelomorphs, which comprise three main groups (Acoela, Nemertodermatida, and Xenoturbella), have been proposed to be an early-diverging bilaterian lineage, sister to protostomes and deuterostomes, but other phylogenomic analyses have recovered this clade nested within the deuterostomes, as sister to Ambulacraria. The position of Xenacoelomorpha within the metazoan tree has understandably attracted a lot of attention, overshadowing the study of phylogenetic relationships within the group. Given that Xenoturbella includes only six species whose relationships are well understood, we decided to focus on the more speciose Acoelomorpha (Acoela + Nemertodermatida). Here, we have sequenced 29 transcriptomes, doubling the number of sequenced species, to infer a backbone tree for Acoelomorpha based on genomic data. The recovered topology is mostly congruent with previous studies. The most important difference is the recovery of Paratomella as the first offshoot within Acoela, which dramatically changes the reconstruction of the ancestral acoel. In addition, we have detected incongruence between the gene trees and the species tree, likely linked to incomplete lineage sorting, and some signal of introgression between the families Dakuidae and Mecynostomidae, which hampers inferring the correct placement of this family and, particularly, of the genus Notocelis. We have also used this dataset to infer, for the first time, diversification times within Acoelomorpha, which coincide with known bilaterian diversification and extinction events. Given the importance of morphological data in acoelomorph phylogenetics, we tested several partitions and models. Although morphological data failed to recover a robust phylogeny, phylogenetic placement has proven to be a suitable alternative when a reference phylogeny is available.
Methods
Molecular phylogenomics
A total of 29 transcriptomes were generated from individuals collected between 2007 and 2020, preserved in either RNAlater or RNA Shield, and stored long-term at -20 °C. Total RNA was extracted using the Zymo Microprep Quick-RNA kit (Zymo Research) and amplified with the SMARTer Universal Low Input RNA Kit (Takara Bio). The quality of the extractions was checked with the Bioanalyzer High Sensitivity DNA Analysis, and the samples were sent to either SciLifeLab or Macrogen for sequencing on an Illumina HiSeq X platform. Three cleaning and assembly strategies were devised to maximise assembly completeness. First, following standard practice, raw reads were cleaned with Trimmomatic and assembled with Trinity 2.9.1 with default parameters. Second, reads were assembled with the TransPi pipeline (version 1.1.0) using three k-mer lengths (21, 31, and 41). Third, raw reads were quality-filtered before the Trinity assembly in a three-step process: sequencing errors were corrected with Rcorrector 1.0.4, sequencing adapters were removed with Trimmomatic (as implemented in Trinity 2.9.1), and the reads were quality-filtered with Prinseq 0.20.4, trimming nucleotides with a PHRED score under 30 from both ends and filtering out reads with a mean quality under 20, an entropy under 50, or a length under 40 base pairs. Redundant contigs were removed with EvidentialGene v2019.05.14, and cross-contaminants were filtered with CroCo 1.1, measuring contig expression with Kallisto 0.46.2. All transcriptomes were assembled following the three pipelines, and the best assembly was selected based on its completeness score, measured with BUSCO 3.0.2 and the Metazoa_odb9 database. Finally, coding regions with a minimum length of 300 amino acids were extracted with TransDecoder 5.3.0, and duplicates were collapsed (minimum identity 95%, minimum overlap 40 amino acids) with the Dedupe program from BBMap 38.92.
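The read-level thresholds of the third cleaning strategy can be illustrated with a minimal Python sketch. It is not Prinseq's implementation: the function names are hypothetical, the end trimming is assumed to work base by base, and the entropy filter is left out because its exact definition is not given here.

```python
def trim_ends(seq, quals, min_q=30):
    """Trim bases with a PHRED score under min_q from both ends of a read."""
    start, end = 0, len(quals)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


def clean_read(seq, quals, min_q=30, min_mean_q=20, min_len=40):
    """Return the trimmed (seq, quals) pair, or None if the read is rejected.

    Assumption: the length and mean-quality filters are applied after
    trimming. The entropy filter mentioned in the text is not modelled.
    """
    seq, quals = trim_ends(seq, quals, min_q)
    if len(seq) < min_len:
        return None
    if sum(quals) / len(quals) < min_mean_q:
        return None
    return seq, quals
```

A read with a few low-quality bases at each end is kept in trimmed form, whereas a read that ends up shorter than 40 bases, or whose remaining bases average below 20, is discarded.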
The extracted proteins were assigned to orthogroups with OrthoFinder 2.4.1 and screened for paralogs with PhyloPyPruner 1.2.3 with the following settings: pruning algorithm “Largest Subtree”, keep orthogroups with at least five taxa, trim branches longer than five times the standard deviation of all branch lengths, collapse nodes with nodal support under 60, and, in species-specific duplications, keep the sequences with the shortest pairwise distance to their sister taxa. For the seven species represented by two transcriptomes, only the specimen with the highest number of orthologs was kept. Non-homologous stretches within the sequences were identified and masked with Prequal 1.02, and all sequences shorter than 250 unmasked amino acids were removed. All remaining orthogroups with more than five species were aligned with MAFFT 7.475 using the L-INS-i algorithm. Ambiguously aligned positions, sequences shorter than 66% of the total alignment length, and sites with more than 80% missing data were filtered with BMGE 1.12. Alignments that did not meet the assumptions of stationarity and homogeneity were identified with IQ-TREE 2.1.3 and removed. The resulting dataset included 2774 genes. To reduce systematic errors, this dataset was further filtered by occupancy, substitution rate, level of saturation, compositional heterogeneity, and average patristic distance. ASTRAL, IQ-TREE (partitioned and site-specific C20 and C60 models), and PhyloBayes were used for phylogenomic inference. MCMCtree was used to infer divergence times.
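The two occupancy thresholds applied to the alignments (sites with more than 80% missing data, sequences shorter than 66% of the alignment length) can be sketched in plain Python. This is an illustration of the thresholds only, not of BMGE itself: the function names are hypothetical, gaps, `?` and `X` are assumed to count as missing, and the order in which the two filters are applied is also an assumption.

```python
MISSING = set("-?X")  # assumption: gaps, unknown and masked residues count as missing


def drop_gappy_sites(aln, max_missing=0.80):
    """Remove alignment columns in which more than max_missing of the sequences are missing."""
    if not aln:
        return {}
    names = list(aln)
    length = len(aln[names[0]])
    keep = [
        i for i in range(length)
        if sum(aln[name][i] in MISSING for name in names) / len(names) <= max_missing
    ]
    return {name: "".join(aln[name][i] for i in keep) for name in names}


def drop_short_sequences(aln, min_fraction=0.66):
    """Remove sequences whose non-missing residues cover less than min_fraction of the alignment."""
    if not aln:
        return {}
    length = len(next(iter(aln.values())))
    return {
        name: seq for name, seq in aln.items()
        if sum(c not in MISSING for c in seq) >= min_fraction * length
    }
```

Here `aln` is a dictionary mapping a sequence name to its aligned sequence, with all sequences of equal length. Calling `drop_short_sequences(drop_gappy_sites(aln))` applies the site filter first and the sequence filter second.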
Morphological phylogenetics
A morphological matrix including all species and up to 44 characters was prepared based on descriptions from the literature and photographs of the specimens analysed. Several partition schemes were tested to maximise the phylogenetic signal of the data. The stepping-stone algorithm implemented in MrBayes 3.2.7 was used to calculate the marginal likelihood of each scheme under the standard discrete model, applying the ascertainment bias correction and varying the model parameters: fixed or variable rates among partitions (APRV), fixed or variable rates among characters (ACRV), and linked or unlinked branch lengths, testing nine models per partition scheme. Finally, the best overall partition scheme and the best model configuration per scheme were identified with Bayes factors. MrBayes was used to infer a phylogenetic tree for each partition scheme, applying the best-fit model configuration. Each analysis consisted of two independent runs with four Markov chains each for 50 million generations, sampling every 10,000 generations and discarding the first 25% as burn-in. Chain convergence was assessed by checking for adequate mixing in the log-likelihood plot, that all ESS values were above 200, and that the Potential Scale Reduction Factor approached 1.
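The sampling arithmetic and the Bayes factor comparison described above can be made concrete with a short Python sketch. The function names are hypothetical, and any marginal log-likelihoods passed to them are placeholders for the values reported by the stepping-stone analyses; the sample count also ignores the initial state that MrBayes records at generation 0.

```python
def retained_samples(generations, sample_every, burnin_fraction):
    """Number of samples kept per run after discarding the burn-in."""
    total = generations // sample_every
    discarded = int(total * burnin_fraction)
    return total - discarded


def two_ln_bayes_factor(marginal_lnl_a, marginal_lnl_b):
    """2 ln(BF) in favour of model A over model B, from marginal log-likelihoods."""
    return 2.0 * (marginal_lnl_a - marginal_lnl_b)


def best_model(marginal_lnl):
    """Name of the model with the highest marginal log-likelihood."""
    return max(marginal_lnl, key=marginal_lnl.get)


# With the settings given in the text: 5000 samples per run, 1250 discarded.
kept_per_run = retained_samples(50_000_000, 10_000, 0.25)  # 3750
```

With two independent runs, this leaves 7500 post-burn-in samples per analysis. Ranking models by marginal log-likelihood is equivalent to picking the model favoured by every pairwise Bayes factor.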
Additionally, the ability of these characters to place a set of species in a given tree was tested using the phylogenetic placement algorithm implemented in RAxML 8.2.12. First, morphological characters were weighted in RAxML using the IQ-TREE topology as a guide tree with four gamma categories and applying the Lewis ascertainment bias correction. Then, a morphological matrix with 84 acoel species was downloaded from Jondelius et al. (2011, Systematic Biology) and used to place the species in the reference tree applying the inferred character weights.
Created:
2024-10-22



