Deep sequencing datasets from: Witnessing the structural evolution of an RNA enzyme

NIAID Data Ecosystem, indexed 2026-03-12

Download link:

http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.c866t1g78

Description:

An RNA polymerase ribozyme that has been the subject of extensive directed evolution efforts has attained the ability to synthesize complex functional RNAs, including a full-length copy of its own evolutionary ancestor. During the course of evolution, the catalytic core of the ribozyme has undergone a major structural rearrangement, resulting in a novel tertiary structural element that lies in close proximity to the active site. Through a combination of site-directed mutagenesis, structural probing, and deep sequencing analysis, the trajectory of evolution was seen to involve the progressive stabilization of the new structure, which provides the basis for improved catalytic activity of the ribozyme. Multiple paths to the new structure were explored by the evolving population, converging upon a common solution. Tertiary structural remodeling of RNA is known to occur in nature, as evidenced by the phylogenetic analysis of extant organisms, but this type of structural innovation had not previously been observed in an experimental setting. Despite prior speculation that the catalytic core of the ribozyme had become trapped in a narrow local fitness optimum, the evolving population has broken through to a new fitness locale, raising the possibility that further improvement of polymerase activity may be achievable.

Methods

Deep sequencing of 19 rounds of evolution. Sequencing of PCR products obtained after various rounds of evolution was performed at the Yale Center for Genome Analysis on an Illumina NovaSeq 6000, which generated ~20 million paired reads per sample. Reads were trimmed, combined, demultiplexed, and filtered using the standard Illumina paired-end sequencing protocol (#1005063). The sequence datasets were quality filtered (phred >33) and trimmed (>150 nucleotides) using the paired-end read merger program PEAR (v 0.9.11). Individual sequences were enumerated and converted to fastq file format using a custom Python script (eLife '21 suppfileA).
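
The enumeration step can be illustrated with a minimal sketch. This is not the authors' script (which is cited but not reproduced here); the function names `enumerate_sequences` and `to_fastq`, the header layout, and the placeholder quality character are all assumptions made for illustration.

```python
from collections import Counter


def enumerate_sequences(reads, min_count=1):
    """Count identical sequences; keep those seen at least min_count times.

    Returns (sequence, count) pairs, most abundant first.
    """
    counts = Counter(reads)
    return [(seq, n) for seq, n in counts.most_common() if n >= min_count]


def to_fastq(enumerated, quality_char="I"):
    """Render (sequence, count) pairs as FASTQ records.

    Assumption: the read count is stored in the header line, and the quality
    string is a constant placeholder, because a collapsed sequence has no
    single per-base quality.
    """
    records = []
    for rank, (seq, n) in enumerate(enumerated, start=1):
        records.append(f"@seq{rank}_count{n}\n{seq}\n+\n{quality_char * len(seq)}\n")
    return "".join(records)


if __name__ == "__main__":
    demo_reads = ["ACGT", "ACGT", "GGCC", "ACGT", "GGCC", "TTTT"]
    print(to_fastq(enumerate_sequences(demo_reads, min_count=2)), end="")
```

The `min_count` parameter is a hypothetical hook for a per-round read-count cutoff; with real data the reads would come from the merged PEAR output rather than an in-memory list.
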
The file sizes were reduced by removing sequences with fewer than 10 reads for rounds 16 and 31, fewer than 10,000 reads for round 27, and fewer than 1,000 reads for all other rounds. The fastq file entries were then aligned using MUSCLE (v 3.8.31). The aligned reads were trimmed to the region encompassing the P7 and P8 stems (nucleotides 9–17 and 83–95) using AliView (v 1.26), then clustered using cd-hit-est (v 4.8.1), with a clustering threshold of 100% identity (-c 1.0), maximum unmatched length of 2 nucleotides (-U 2), and length difference cutoff of 2 nucleotides (-S 2). Clusters with >1% representation in any given round were identified. The resulting tables were manually processed to produce heat map plots and a table of P8 variant percentages through generations.

Determination of polymerase fidelity by deep sequencing. The hammerhead and class I ligase ribozymes were synthesized by the 52-2 polymerase under standard reaction conditions (100 nM polymerase, 100 nM template, 80 nM primer, 4 mM each NTP, and 50 or 200 mM MgCl2 at pH 8.3 and 17 °C). For the hammerhead, all partial- and full-length products were collected and analyzed once the reactions had yielded 2% full-length product, which took 40 min with 200 mM MgCl2 and 6 h with 50 mM MgCl2. For the ligase, only gel-purified full-length ligase from a 24-hour extension reaction with 200 mM MgCl2 (yielding 2% extension to full length) was analyzed. The products were converted to DNA molecules for Illumina sequencing as described previously (Tjhung et al., 2020). Briefly, RNA products were ligated to the Universal miRNA Cloning Linker using K227Q T4 RNA Ligase 2 and reverse transcribed with Superscript IV using primer Rev2. The resulting cDNA was isolated and tailed with poly(C) using terminal transferase, and then amplified by PCR using Q5 Hot Start High-Fidelity DNA Polymerase and primers Fwd2 and Rev2. For ligase products, cDNA tailing with terminal transferase and PCR amplification were not performed.
For both ribozyme products, Illumina adapter sequences were added to the ends of the cDNA using primers Fwd3 or Fwd4 (for hammerhead or ligase, respectively) and Rev3, followed by amplification using the Illumina Nextera Index primers. Sequencing was carried out by the Salk Next Generation Sequencing Core on an Illumina MiniSeq, with either a 75- or 150-cycle paired-end run for the hammerhead or ligase, respectively. The sequence data were processed to categorize all mutations relative to the expected sequence, as described previously (Tjhung et al., 2020).

For the ligase ribozyme, a method updated from Tjhung et al. (2020) was used to accommodate the distinct reference sequence. Reads were first trimmed using cutadapt v3.4 with parameters --trimmed-only -e 0.25 --pair-filter both -M 120 -n 2 -j 2 -a CTGTAGGCACCATCAATCTGTCTCTTATACACATCTCCGAGCCC -G ATTGATGGTGCCTACAG -A CTGTCTCTTATACACATCTGACGCTGCCGACGA. Sequences without barcodes were filtered out with cutadapt using parameters -g GGAAAAGACAAATCTGCCCT --action none --discard-untrimmed -e 0.05 --pair-filter both -n 1 -j 2. Paired reads were merged using FLASH v1.2.11 with arguments -t 1 -m 50 -M 100 -x 0, and quality filtered using FASTX Toolkit v0.0.14 with -q 36 -p 100. Then bowtie2 v2.4.2 was used to align the merged reads to the template sequence (containing both constant regions) using the parameters --end-to-end --score-min L,0,-1.2 --rdg 3,5 --rfg 3,5 -L 5 --reorder and the reference sequence "GGAAAAGACAAATCTGCCCTCAGAGCTTGAGAACATCTTCGGATGCAGGGGAGGCAGCCCCCGGTGGCTTTAACGCCAACGTTCTCAACAATAGTGATTTTTTCTGTAGGCACCATCAAT", with constant regions not included in later fidelity measurements underlined. The generated sam file was converted into a sorted, indexed bam file using SAMtools v1.9, and edit distances were extracted from the bam file to determine the distribution of product Levenshtein distances from the ligase reference sequence.
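
To make the distance distribution concrete, here is a minimal sketch that computes Levenshtein distances directly in pure Python. This is a swapped-in approach for illustration: the source extracts edit distances from the bam file rather than recomputing them, and the function names `levenshtein` and `distance_distribution` are assumptions, not part of the published pipeline.

```python
from collections import Counter


def levenshtein(a, b):
    """Minimum number of substitutions, insertions, and deletions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def distance_distribution(reads, reference):
    """Map each observed distance to the number of reads at that distance."""
    counts = Counter(levenshtein(read, reference) for read in reads)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Toy reference and reads; real input would be the merged, aligned reads.
    print(distance_distribution(["ACGT", "ACGA", "AGT", "ACGT"], "ACGT"))
```

The dynamic-programming version runs in time proportional to the product of the two sequence lengths, which is adequate for a 120-nucleotide template but slow for millions of reads; reading precomputed values from the alignment file avoids that cost.
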
Breseq v0.35.5 bamtoaln was used to create a gapped alignment file (.txt) of the reads to the template sequence (specifying the number of aligned reads with -n 4150646). A custom Java program (Tjhung et al., 2020) was used to calculate the number of matches, mismatches, deletions, and insertions as a function of the template position. The program was compiled after extracting the java files using javac v16.0.2 (Oracle Inc.), and run with java using the output tables from breseq and a template length of 120. The resulting tables were manually processed to produce position-specific plots and analyses.

For the hammerhead ribozyme, the method was identical to that previously reported (Tjhung et al., 2020). Reads were first trimmed using cutadapt v2.4 with parameters --trimmed-only -e 0.25 --pair-filter both -n 2 -j 2 -a CTGTAGGCACCATCAATCTGTCTCTTATACACATCTCCGAGCCC -G ATTGATGGTGCCTACAG -A CTGTCTCTTATACACATCTGACGCTGCCGACGA. Sequences without barcodes were filtered out with cutadapt -g CTACAGGGCACTCCACAC --action none --discard-untrimmed -e 0.05 --pair-filter both -n 1 -j 2. Paired reads were merged using FLASH v1.2.11 with arguments -t 1 -m 4 -M 48 -x 0.3, and quality filtered using FASTX Toolkit v0.0.13 with -Q33 -q 30 -p 100. Then bowtie2 v2.4.2 was used to align the merged reads to the template sequence (containing both constant regions) using the parameters --end-to-end -L 5 --reorder and the reference sequence "CTACAGGGCACTCCACACGACGTACTGATGAGGCCGAAAGGCCGAAAAGCGTTTTTTGTCATTGTCCTGTAGGCACCATCAAT", with constant regions not included in later fidelity measurements underlined. The generated sam file was converted into a sorted, indexed bam file using SAMtools v1.9. Breseq v0.27.0 bamtoaln was used to create a gapped alignment file (.txt) of the reads (-n 2000000) to the template sequence. A custom Java program (Tjhung et al., 2020) was used to calculate the number of matches, mismatches, deletions, and insertions as a function of the template position and the read length class.
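
The per-position tally of matches, mismatches, deletions, and insertions can be sketched as follows. This is not the custom program from Tjhung et al. (2020), whose input format and counting rules are not given here. The sketch assumes each read is supplied as a pair of equal-length gapped strings (template row and read row, with "-" marking gaps), that every read spans the full template, and that an insertion is attributed to the template position immediately before it; `tally_by_position` is a hypothetical name.

```python
def tally_by_position(aligned_pairs, template_length):
    """Count match/mismatch/deletion/insertion events at each template position.

    aligned_pairs: iterable of (template_row, read_row) gapped strings.
    """
    table = [{"match": 0, "mismatch": 0, "deletion": 0, "insertion": 0}
             for _ in range(template_length)]
    for template_row, read_row in aligned_pairs:
        if len(template_row) != len(read_row):
            raise ValueError("alignment rows must have equal length")
        if len(template_row.replace("-", "")) != template_length:
            raise ValueError("template row does not match template_length")
        pos = -1  # index of the most recently consumed template base
        for t, r in zip(template_row, read_row):
            if t == "-":
                # Gap in the template row: the read carries an extra base.
                # Assumption: charge it to the preceding template position
                # (or to position 0 if it precedes the whole template).
                if r != "-":
                    table[max(pos, 0)]["insertion"] += 1
                continue
            pos += 1
            if r == "-":
                table[pos]["deletion"] += 1
            elif r == t:
                table[pos]["match"] += 1
            else:
                table[pos]["mismatch"] += 1
    return table


if __name__ == "__main__":
    pairs = [("ACGT", "ACGT"), ("ACGT", "A-GT"),
             ("AC-GT", "ACTGT"), ("ACGT", "ACCT")]
    for position, row in enumerate(tally_by_position(pairs, 4), start=1):
        print(position, row)
```

Dividing each count by the total number of reads covering a position would give per-position error frequencies; binning reads by length before tallying would be one way to approximate a read-length-class breakdown.
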
The program was compiled after extracting the java files using javac v1.7 (Oracle Inc.), and run with java using the output tables from breseq and a template length of 83. The resulting tables were manually processed to produce position-specific and length-specific plots and analyses.

Fwd2 GGGGGGATGCTACATG
Fwd3 TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCTACAGGGCACTCCACAC
Fwd4 TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGAAAAGACAAATCTGCC
Rev2 ATTGATGGTGCCTACAG
Rev3 GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGATTGATGGTGCCTACAG

Tjhung KF, Shokhirev MN, Horning DP, Joyce GF. 2020. An RNA polymerase ribozyme that synthesizes its own ancestor. Proc. Natl. Acad. Sci. USA 117:2906–2913.

Created:

2021-09-29
