Homospermidine synthase evolution and the origin(s) of pyrrolizidine alkaloids in Apocynaceae

Indexed in NIAID Data Ecosystem: 2026-05-10

Download link:

http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.1c59zw3z2

Resource description:

Premise: Enzymes encoded by paralogous genes that produce identical specialized metabolites in distantly related plant lineages are strong evidence of parallel phenotypic evolution. Inferring phenotypic homology for metabolites produced by orthologous genes is less straightforward, because orthologs may be recruited in parallel into novel pathways. Prior research on pyrrolizidine alkaloids (PAs), which are specialized metabolites of Apocynaceae, reconstructed the evolution of homospermidine synthase (HSS), an enzyme of PA biosynthesis. That work inferred a single origin of PAs because the HSS enzymes of all known PA-producing Apocynaceae species are orthologous and descend from an ancestral enzyme with the motif of an optimized HSS (VXXXD).
Methods: We increased taxon sampling, tested the effect of the amino acid motif on HSS function, revisited motif evolution, and tested for selection in order to infer the evolution of HSS function and its correlation with phenotype.
Results: Some evidence supports a single origin of PAs: an IXXXD HSS-like gene, similar in function to VXXXD HSS, evolved in the shared ancestor of all PA-producing species, and loss of HSS function occurred multiple times via pseudogenization and perhaps via evolution of an IXXXN motif. Other evidence indicates multiple origins: the VXXXD motif, which is highly correlated with the PA phenotype, evolved two or four times independently; the ancestral IXXXD gene was not under positive selection, whereas some VXXXD genes were; and substitutions at sites experiencing positive selection occurred on multiple branches of the HSS-like gene tree.
Conclusions: The complexity of the genotype-function-phenotype map confounds the inference of PA homology from HSS-like gene evolution in Apocynaceae.
Methods
Taxon sampling for DHS/HSS sequencing
One hundred seventy (170) accessions of 159 Apocynaceae species were sampled, including 10 known PA-producing species (Appendix S1: Table S2).
The outgroup was Gelsemium sempervirens (Gelsemiaceae) (Antonelli et al., 2021). Previously published Apocynaceae DHS and HSS sequences from 25 species were also included (Livshultz et al., 2018).
Calotropis genome query
DHS- and HSS-like genes were extracted from the Calotropis gigantea genome (Hoopes et al., 2018) in Geneious Prime v.2020.0.3 (https://www.geneious.com) through tblastx searches, using the exon sequences of Parsonsia alboflavescens DHS and HSS (MG817648.1, MG817649.1) as queries.
DNA extraction, library preparation, targeted enrichment, and sequencing
DNA extraction, library preparation, targeted enrichment, and paired-end sequencing were previously described by Straub et al. (2020). Probes were designed from the Apocynaceae DHS and HSS sequences published by Livshultz et al. (2018). They were manually trimmed so that they extend no more than 200 bp beyond the 5′ end of the first exon and the 3′ end of the last exon. A total of 2707 probes targeted DHS and HSS (Straub et al., 2020). A few contaminated libraries (i.e., libraries containing DNA from more than one sample) were identified by mapping each sample's reads to its own assembled plastome. Samples showing evidence of two divergent plastomes, or of divergent sequences across multiple single-copy nuclear loci, were considered contaminated and excluded (unpublished data).
Contig assembly
Raw paired-end reads were trimmed with Trimmomatic default settings (SLIDINGWINDOW:10:20, MINLEN:40) (Bolger et al., 2014). We used the first stage of the MyBaits pipeline [BLASTN (Cameron and Williams, 2007), SPAdes (Bankevich et al., 2012)] with default options (SPAdes: k 21,33,55,77, cov-cutoff off, phred-offset 33) (Moore et al., 2018). In this stage, the trimmed reads of each sample were binned as DHS/HSS by BLASTN against reference transcriptomic DHS-like sequences (Appendix S1: Table S2), and the binned reads were then assembled with SPAdes.
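The read-trimming settings above can be illustrated with a simplified sketch of SLIDINGWINDOW:10:20 followed by MINLEN:40. Each read is scanned from the 5′ end in 10-base windows and cut where the mean Phred quality of a window first drops below 20. Reads shorter than 40 bases after trimming are then discarded. This is an approximation for illustration only, not Trimmomatic's source code, and Trimmomatic's own handling of the failing window may differ in detail. The function name is hypothetical, and qualities are decoded with the Phred+33 offset noted above for SPAdes.

```python
def sliding_window_trim(seq, qual, window=10, required_q=20.0,
                        min_len=40, phred_offset=33):
    """Simplified SLIDINGWINDOW:10:20 plus MINLEN:40 trimming of a single read.

    Returns the trimmed (seq, qual) pair, or None if the read is discarded.
    """
    # Decode Phred+33 quality characters into integer scores.
    scores = [ord(ch) - phred_offset for ch in qual]
    cut = len(seq)  # keep the whole read unless a low-quality window is found
    # Scan windows from the 5' end; no window is tested if the read is shorter than `window`.
    for start in range(len(scores) - window + 1):
        mean_q = sum(scores[start:start + window]) / window
        if mean_q < required_q:
            # Simplification: cut at the start of the first failing window.
            cut = start
            break
    trimmed_seq, trimmed_qual = seq[:cut], qual[:cut]
    # MINLEN: drop reads that are too short after trimming.
    if len(trimmed_seq) < min_len:
        return None
    return trimmed_seq, trimmed_qual


# Hypothetical 60-base read: 45 bases at Q40 ('I') followed by 15 bases at Q2 ('#').
read_seq = "ACGT" * 15
read_qual = "I" * 45 + "#" * 15
result = sliding_window_trim(read_seq, read_qual)
print(len(result[0]))  # 41: the first window with mean Q < 20 starts at index 41
```

A window fails here once it contains six or more Q2 bases. That is why the cut falls a few bases before the quality drop rather than exactly at it.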
Libraries without an assembled contig of >20× coverage were removed from the analysis (Bentley et al., 2008; Dohm et al., 2008; Harismendy et al., 2009; Whittall et al., 2010; Straub et al., 2012). Retained SPAdes contigs were extended and fused with afin, an assembly-finishing program (-s 50, -l 100) (McKain and Wilson, 2017).
Exon annotation
Exonic regions of the retained afin contigs were annotated in Geneious Prime v.2020.0.3 (https://www.geneious.com) using the DHS and HSS exons of Parsonsia alboflavescens (GenBank accessions MG817648.1, MG817649.1) with a 50% identity threshold. Annotated contigs (exons and introns) were aligned within each sample. This made it possible to identify and manually annotate partial or divergent exons that did not meet the identity threshold.
Alignment construction
All alignments were constructed as nucleotide alignments with MAFFT v7.450 (default settings except gap penalty = 3) (Katoh et al., 2002; Katoh and Standley, 2013) in Geneious Prime v.2020.0.3 (https://www.geneious.com).
Maximum likelihood tree construction
Maximum likelihood trees were constructed in RAxML-HPC v.8 on XSEDE (v8.2.12) (Stamatakis, 2014) through the CIPRES Science Gateway (v3.3) (Miller et al., 2010). The analyses used the GTR+GAMMA model, partitioning by codon position, and rapid bootstrapping with 1000 replicates.
Gene assembly from contigs
Annotated exons were extracted and aligned with an existing Apocynaceae sequence alignment (Livshultz et al., 2018) and with Calotropis gigantea sequences (Hoopes et al., 2018). A maximum likelihood tree was built from this "initial alignment" (Appendix S1: Tables S2, S3; Appendix S2). Contigs orthologous to Parsonsia alboflavescens HSS were considered HSS-like, and those outside this clade were considered DHS-like. Contigs derived from the same library were then concatenated using the following algorithm.
First, contigs with non-terminal stop codons were excluded. Next, a strict consensus was calculated from overlapping partial contigs that came from the same library and fell in a clade with other DHS-like or HSS-like sequences from the same tribe in the initial alignment gene tree (Appendix S2). Amino acid similarity in the region of overlap of the combined contigs ranged from 89.6% to 100% (Appendix S1: Table S5). Non-overlapping contigs were then concatenated using the same grouping criteria. Polyphyletic contig pairs were grouped if there were no sequences from closely related taxa for them to cluster with and there was no evidence of contamination in the source library.
Validation of gene assemblies
Some samples also had Sanger sequences of the same locus from Livshultz et al. (2018). Their assemblies were validated by pairwise alignment with these Sanger sequences and calculation of divergence. Here and elsewhere, divergence and pairwise amino acid sequence identity were calculated in Geneious Prime v.2020.0.3 (https://www.geneious.com) from in-frame nucleotide alignments; this calculation excludes missing data.
Construction of matrices
Sanger sequences >95% identical to new sequences from the same sample were removed from the alignment. The "full data set" consisted of four groups of sequences:
- potential pseudogene sequences;
- full-length (minimum exons 2–6) DHS-like and HSS-like sequences;
- short DHS-like and HSS-like contigs, taken from samples in which an entire DHS-like or HSS-like gene had also been assembled by SPAdes/afin;
- non-redundant Sanger sequences from Livshultz et al. (2018) (Appendix S1: Tables S2, S3).
This full data set alignment was trimmed to produce a "reduced data set" alignment, which was used for ancestral state reconstruction and selection analyses. To build it, potential pseudogenes and sequences that did not span at least exons 2–6 were removed, and the remaining sequences were realigned.
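The identity calculation used above for validation and for flagging redundant Sanger sequences can be sketched as follows. The sketch computes percent identity over aligned amino acid columns, skips every column where either sequence has missing data, and applies the >95% redundancy cutoff. It is not Geneious's implementation. It assumes the in-frame alignment has already been translated. Treating '-', '?' and 'X' as missing data is an assumption, and the function names are hypothetical.

```python
# Characters treated as missing data: gaps, '?', and unknown residues (an assumption).
MISSING = set("-?X")


def pairwise_identity(seq_a, seq_b):
    """Percent identity of two aligned amino acid sequences, excluding missing data."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in MISSING or b in MISSING:
            continue  # column excluded from numerator and denominator
        compared += 1
        identical += a == b
    if compared == 0:
        raise ValueError("no overlapping non-missing positions")
    return 100.0 * identical / compared


def is_redundant(sanger_seq, new_seq, threshold=95.0):
    """True if a Sanger sequence is more than `threshold` percent identical to a new one."""
    return pairwise_identity(sanger_seq, new_seq) > threshold


# Made-up aligned fragments: 9 comparable columns, 8 identical.
a = "MEVIKD--GSAX"
b = "MEVIND??GSAK"
identity = pairwise_identity(a, b)
print(round(identity, 1), round(100 - identity, 1), is_redundant(a, b))  # 88.9 11.1 False
```

Divergence is reported here as 100 minus identity. Because missing columns are dropped from the denominator, short partial contigs are not penalized for regions they do not cover.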
Thirty-seven base pairs were trimmed from the 5′ end and 67 bp from the 3′ end of this reduced data set alignment because these regions were highly variable.
Gene tree construction
Three gene trees were produced. A full data set tree was constructed containing all contigs, that is, candidate pseudogenes, short contigs, and full/concatenated/consensus contigs (Appendix S3). A reduced data set tree contained only contigs that spanned at least exons 2–6 and lacked non-terminal stop codons (Figures 3A, 3B, 4A; Appendix S4). Lastly, a third tree, referred to as the "full data set topology," was constructed from the reduced data set alignment with its topology fully constrained to that of the full data set tree (Figures 3C, 3D, 4B; Appendix S5).
Tests for recombination
Marsdenieae HSS-like paralogs were investigated further using GARD (Kosakovsky Pond et al., 2006). GARD searches an alignment for recombination breakpoints, up to a maximum number. It builds a phylogeny for each putatively non-recombinant segment and assesses these phylogenies using the Akaike information criterion (AIC). Potentially recombinant contigs were split at the breakpoints indicated by GARD. A Marsdenieae HSS-like gene tree (outgroup: Tassadia propinqua HSS-like gene) was then rebuilt using the maximum likelihood tree construction criteria described above.
Shimodaira–Hasegawa tests
Shimodaira–Hasegawa (SH) tests had two aims: to test alternative topologies of Apocynaceae DHS-/HSS-like gene trees, and to test the monophyly of IXXXN and IXXXD paralogs in Marsdenieae. The tests were performed in RAxML-HPC2 on XSEDE (v8.2.12) (Stamatakis, 2014) as implemented on the CIPRES Science Gateway (Miller et al., 2010). The SH test rejects or fails to reject the null hypothesis that two given topologies are equally supported.
Ancestral sequence reconstruction
Ancestral sequences were reconstructed with model M0 in codeml, PAML v4.9j (Yang, 1997, 2007), using default options except model = 0, NSsites = 0, RateAncestor = 1, and cleandata = 0.
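For readers reproducing the reconstruction, the sketch below renders a minimal codeml control file with the four non-default settings named above. It uses the "key = value" layout that codeml.ctl files use. Everything apart from those four settings is an assumption for illustration and should be checked against the PAML manual. The file names are hypothetical placeholders. The values for runmode (0, user-supplied tree), seqtype (1, codons) and CodonFreq (2, F3x4) are not taken from the source.

```python
# Settings stated in the Methods for ancestral sequence reconstruction.
STATED = {"model": 0, "NSsites": 0, "RateAncestor": 1, "cleandata": 0}

# Assumed values for other keys; file names are hypothetical placeholders.
ASSUMED = {
    "seqfile": "reduced_dataset.phy",   # hypothetical codon alignment
    "treefile": "reduced_dataset.tre",  # hypothetical tree file
    "outfile": "mlc",
    "runmode": 0,    # 0 = use the user-supplied tree
    "seqtype": 1,    # 1 = codon sequences
    "CodonFreq": 2,  # 2 = F3x4 codon frequencies
}


def render_ctl(settings):
    """Render settings as 'key = value' lines, the layout codeml.ctl uses."""
    return "".join(f"{key:>13} = {value}\n" for key, value in settings.items())


ctl_text = render_ctl({**ASSUMED, **STATED})
print(ctl_text)  # save this text as codeml.ctl
```

With RateAncestor = 1, running `codeml codeml.ctl` writes the ancestral reconstructions to the `rst` output file.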
codeml implements the Goldman and Yang codon substitution model. Under model M0 (the one-ratio model), it assumes that selection pressure (a single ω) is the same across all branches and sites. It produces joint likelihood reconstructions (all ancestral nodes reconstructed together) using an empirical Bayes procedure (Yang and Wang, 1995). In addition, codeml calculates a marginal reconstruction (one node reconstructed at a time), which includes posterior probabilities for the reconstructed amino acids.
Tests for selection: hypotheses
If the origin of pyrrolizidine alkaloids was adaptive, and if selection for PAs caused adaptive evolution of HSS, we can make testable predictions about patterns of selection on HSS-like genes in Apocynaceae. Suppose there was a single origin of PAs in the MRCA of all PA-producing taxa (Figure 3A, C). We then predict three things:
- positive selection on the HSS-like gene of this MRCA (Figure 3A, C, branch A, ω > 1);
- purifying selection on branches that retained optimized HSS function (i.e., the IXXXD motif in clade L or the VXXXD motif in clade G);
- relaxed selection on branches that lost optimized HSS function, i.e., those where the IXXXN motif evolved (Figure 3A, C, clades I, J, ω = 1, k < 1).
Two PA-free Alafia species carry an IXXXD motif (Figure 3A, clade K, ω = 1, k < 1) that evolved from an ancestral VXXXD motif (Figure 3A, clade G). We also tested whether this change resulted from loss of function.
(Although PAs have been reported from Alafia, the two sequenced species tested negative for PAs [Barny et al., 2021].) Alternatively, parallel recruitment of the ancestral HSS-like paralog may have led to multiple origins of the PA biosynthetic pathway (Figure 3B, D). In that case, we predict relaxed selection on the ancestral HSS-like gene, due to loss of DHS function (Figure 3B, D, branch A, ω = 1). We further predict positive selection on the ancestral HSS of each lineage in which the HSS VXXXD motif (present in all sequenced PA-producing Apocynaceae species) and/or PA production evolved. These are branches B, C, D, E, F, and G and the branches leading to Isonema and to Strophanthus (Figure 3B, D, ω > 1). Under the multiple-origin scenario, branches with IXXXN and IXXXD motifs should remain under relaxed selection unless they evolved some new function. We also predict relaxed selection in the HSS clade (clade A, k < 1) relative to the DHS clade (clade H). Most HSS branches would be nonfunctional and under relaxed selection (Figure 3B, D), whereas DHS is an essential gene that should always be under strong purifying selection.
Tests for selection: analyses
aBSREL (adaptive branch-site random effects likelihood) (Smith et al., 2015) was used to test for positive selection on selected branches (Figure 3, ω > 1). It fits a distribution of ω (the dN/dS ratio) to each branch, with sites falling into one of up to three ω rate classes per branch. For branches with more than one ω rate class, the largest ω is mapped in Figure 4 and Appendices S4 and S5. Positive selection on the a priori selected branches is inferred with a likelihood ratio test (LRT). The test compares the optimized ω distribution with a null model in which ω ≤ 1 for all sites. MEME (Mixed Effects Model of Evolution) (Murrell et al., 2012) was used to identify sites under positive selection on pre-specified branches (Figure 3, ω > 1). It allows selection to vary both among branches and among sites.
At each codon site, each branch is assigned to one of two ω rate classes, and a single α (synonymous substitution rate) is shared among all branches. The first nonsynonymous rate, β−, is estimated for each site and constrained to be less than or equal to α (i.e., purifying selection or neutral evolution). The second nonsynonymous rate, β+, is unconstrained in the full model. Likelihood ratio tests compare the full model with a null model in which β+ is also constrained to be less than or equal to α.
RELAX (Wertheim et al., 2015) was used to test for relaxed selection. It tests for relaxation or intensification of selection on pre-specified test branches (Figure 3A, C, lineages K, I, J) compared with a set of designated reference branches (Figure 3A, C, lineages G, L). The RELAX null model assigns all sites to one of three ω rate classes (ω1 = purifying, ω2 = neutral, ω3 = positive selection). The full (alternative) model introduces a selection intensity parameter, k, and raises each ω to the power k (ω^k) on the test branches. The null model constrains k = 1, which forces the same ω distribution on the test and reference branches. If the likelihood ratio test finds that the alternative model fits significantly better, k > 1 is taken as evidence of intensified selection on the test branches relative to the reference branches, and k < 1 as evidence of relaxed selection. The k values mapped in Figure 4 and Appendices S4 and S5 were calculated with the RELAX general descriptive model, from RELAX analyses comparing all HSS-like branches with all DHS-like branches (Appendix S1: Table S12.4). Rather than using the a priori test and reference branch sets, the general descriptive model fits the three ω rates to all branches and an individual k to each branch (Wertheim et al., 2015).
Comparison with human DHS
The reduced data set was aligned with the angiosperm DHS/HSS alignment from Livshultz et al. (2018) and with human DHS (GenBank: P49366) to produce a "human and plant DHS/HSS alignment" (Appendix S1: Table S3). The functions of human DHS amino acid sites were described by Wator et al. (2020). The positions of DHS monomer interaction sites and functional sites (e.g., the active-site tunnel entrance) were taken from annotations on structure PDB ID 6XXM, the crystal structure of human DHS in complex with putrescine (Wator et al., 2020). These annotations were accessed with the NCBI Structure feature (Madej et al., 2014) in iCn3D v.2.24.4 (Wang et al., 2020). The sites were then manually annotated on the human DHS sequence in the alignment, so that human and plant DHS/HSS amino acid positions could be compared.
Site-directed mutagenesis of Parsonsia alboflavescens HSS
The open reading frame of Parsonsia alboflavescens HSS (PaHSS) had been cloned in an expression vector (Novagen™ pET28a, Millipore Sigma, Billerica, MA, USA) with an artificial N-terminal hexahistidine (6×His) tag extension. This construct was used as the template for site-directed mutagenesis following Liu and Naismith (2008). Amino acid numbering follows Kaltenegger et al. (2013). Appendix S1: Table S4 gives the primer pairs used to introduce the single mutations V269 to I269 and D273 to N273, as well as the double mutation V269XXXD273 to I269XXXN273. PCRs with 12 amplification cycles were performed in a 25-µL reaction mixture with Phusion High-Fidelity DNA Polymerase (ThermoFisher Scientific, Waltham, MA, USA) according to the manufacturer's instructions; annealing temperatures are given in Appendix S1: Table S4. The PCR products were treated with the restriction enzyme DpnI (ThermoFisher Scientific, Waltham, MA, USA) at 37°C for 1 h and diluted 1:10 with water. They were then propagated in Escherichia coli TOP10 (ThermoFisher Scientific) and Sanger sequenced (MWG Eurofins Genomics, Ebersberg, Germany) to identify successful mutants.
Heterologous expression, purification, and activity assays of P. alboflavescens HSS and mutants
The complete ORFs of PaHSS and of the mutant variants were expressed in Escherichia coli BL21(DE3) and purified as described by Ober and Hartmann (1999a). Protein purification was monitored by SDS-PAGE analysis. Protein quantities were estimated in two ways:
- from UV absorbance at 280 nm and the specific extinction coefficient of each protein, calculated with the ProtParam web tool on ExPASy (Gasteiger et al., 2005);
- with the Bradford method (Bradford, 1976).
The oligomerization state of the purified proteins was analyzed by size-exclusion chromatography (SEC) with UV detection. Eight to 15 µg of affinity-purified DHS and HSS (~42 kDa) in borate buffer were analyzed on an analytical SEC column (MAbPac SEC-1, 5 µm, 300 Å, 4 × 150 mm) equilibrated with 50 mM phosphate buffer (pH 6.8) plus 0.3 M NaCl (flow rate 0.2 mL/min). The column was connected to an UltiMate 3000 system with a DAD-3000 diode array detector (ThermoFisher Scientific). Proteins were monitored at 280 nm. Cytochrome c (12 kDa) and BSA (monomer 66.5 kDa, dimer 132 kDa) were used as reference proteins. For biochemical characterization, the purified proteins were concentrated and resuspended in borate-based assay buffer (50 mM borate-NaOH, pH 9) containing the additives DTT (1 mM) and EDTA (0.1 mM). The in vitro assays were performed as described by Kaltenegger et al. (2021). In short, 5–40 µg of purified recombinant protein was incubated with putrescine and spermidine (400 µM each) in the presence of NAD (2 mM) in borate-based assay buffer to determine the enzyme's ability to produce homospermidine. Product formation was quantified by derivatizing the reaction mixture with 9-fluorenylmethyl chloroformate (FMOC; Sigma), followed by HPLC with UV detection. To test the enzymes' ability to use eIF5A as a substrate, assays were hydrolyzed as described by Kaltenegger et al. (2021), derivatized with FMOC, and analyzed by HPLC with fluorescence detection (FLD) to quantify deoxyhypusine, 1,3-diaminopropane, and canavalmine.
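The A280-based protein estimate can be illustrated with the sketch below. It computes a molar extinction coefficient with the Trp/Tyr/cystine formula that ProtParam reports (5500, 1490, and 125 M⁻¹ cm⁻¹ per residue or cystine). It then converts absorbance to mg/mL with the Beer–Lambert law. The toy sequence, absorbance, path length, and 42,000 Da molecular weight are hypothetical illustration values. In practice the sequence would be the full His-tagged construct and the molecular weight would come from ProtParam.

```python
def extinction_coefficient(sequence, cystines=True):
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1) from Trp, Tyr, and cystine counts.

    eps = 5500 * nTrp + 1490 * nTyr + 125 * nCystine, the formula ProtParam reports;
    with cystines=False every Cys is treated as reduced and contributes nothing.
    """
    seq = sequence.upper()
    n_cystine = seq.count("C") // 2 if cystines else 0
    return 5500 * seq.count("W") + 1490 * seq.count("Y") + 125 * n_cystine


def concentration_mg_per_ml(a280, epsilon, mw_da, path_cm=1.0):
    """Beer-Lambert law: molar concentration = A / (eps * l); times MW (g/mol) gives g/L = mg/mL."""
    return a280 / (epsilon * path_cm) * mw_da


# Hypothetical values: a toy sequence, A280 = 0.5 in a 1-cm cuvette, MW of 42,000 Da.
eps = extinction_coefficient("MWYYCC")  # 5500 + 2 * 1490 + 125 = 8605
conc = concentration_mg_per_ml(0.5, eps, 42000.0)
print(eps, round(conc, 2))  # 8605 2.44
```

ProtParam reports two coefficients, one assuming all Cys pairs form cystines and one assuming all Cys are reduced. The `cystines` flag selects between them, and for Trp- and Tyr-rich proteins the difference is small.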

Created: 2026-01-30
