
In vivo functional phenotypes from a computational epistatic model of evolution

Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.n5tb2rc1c
Description:
Computational models of evolution are valuable for understanding the dynamics of sequence variation, for inferring phylogenetic relationships or potential evolutionary pathways, and for biomedical and industrial applications. Despite these benefits, few studies have validated the propensity of such models to generate outputs with in vivo functionality, which would enhance their value as accurate and interpretable evolutionary algorithms. Using the Hamiltonian of the joint probability of sequences in the family as a fitness metric, we sampled TEM-1 variants and experimentally tested them for in vivo beta-lactamase activity in E. coli. These variants retain family-like functionality while being more active than their WT predecessor. We found that, depending on the inference method used to generate the epistatic constraints, different parameters simulate diverse selection strengths. Under weaker selection, local Hamiltonian fluctuations reliably predict relative changes to variant fitness, recapitulating neutral evolution. In this dataset, we include input datasets, simulation trajectories, and experimental data to support the publication "In vivo functional phenotypes from a computational epistatic model of evolution".

Methods

Direct coupling analysis (DCA) methods used were mean field (https://github.com/morcoslab/SEEC-nt) or Boltzmann machine learning (https://github.com/matteofigliuzzi/bmDCA).

SI Dataset S4 (PhaseI_MSA.fasta)
Phase I multiple sequence alignment used for SEEC-AA mfDCA and bmDCA statistical inference. It was obtained from Pfam and pre-processed to remove sequences with more than 5% consecutive gaps.

SI Dataset S5 (PhaseII_MSA.fasta)
Phase II multiple sequence alignment used for SEEC-NT mfDCA and bmDCA statistical inference. It was generated using HMMTools with the TEM-1 sequence as seed (excluding the signal peptide) and default parameters.

SI Dataset S6 (PhaseII_mfDCA_Parameters.mat)
Variables:
1. PhaseII_mfDCA_eij (size=5523x5523)
2. PhaseII_mfDCA_hi (size=21x263)
Coupling and local field matrices inferred using mean field DCA with the Phase II MSA as the input. This file is a .mat file readable in Matlab and compatible with the code found on GitHub (github.com/morcoslab/SEEC-NT).

SI Dataset S7 (PhaseII_bmDCA_Parameters.mat)
Variables:
1. eij (size=5523x5523)
2. hi (size=21x263)
Coupling (eij) and local field (hi) matrices inferred using Boltzmann machine learning DCA with the Phase II MSA as the input. The eij matrices have been converted into the format that matches the output of mfDCA. This file is a .mat file readable in Matlab and compatible with the code found on GitHub (github.com/morcoslab/SEEC-NT).

SI Dataset S8 (SEEC_nt_sequence_trajectories.mat)
Variables:
1. SEEC_nt_bmDCA_Trajectory_amino_T0_75_3 (size=5000x263)
2. SEEC_nt_mfDCA_Trajectory_amino_T1_5_1 (size=5000x263)
Sequences output from SEEC-nt and used for variant selection. This file is a .mat file readable in Matlab.

SI Dataset S9 (Figure_3_Sanger_Sequencing_Data.zip), SI Dataset S10 (Figure_4_Sanger_Sequencing_Data.zip)
Raw Sanger sequencing chromatograms collected from plasmid samples isolated from assay cultures. Chromatograms are named as follows:
1. The first number refers to the sequencing batch.
2. The second number is the sample run within that batch.
3. "for" or "rev" refers to the forward or reverse sequencing direction, respectively.
4. The rest of the name is the variant name as used in the manuscript: Beg, Mid, or Late refers to the position in the simulation trajectory; bm or mf refers to the DCA implementation used to infer the coupling and local field parameters; the number is the variant number; and NT indicates that the algorithm used was SEEC-nucleotide.
Sanger sequencing chromatograms can be viewed using free software such as 4peaks (https://nucleobytes.com/4peaks/), Benchling (https://www.benchling.com/), or a number of other platforms.
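The parameter sizes listed above are consistent with a Potts model over 21 states (20 amino acids plus gap) and 263 alignment positions, since 21 x 263 = 5523. The following is a minimal sketch, not the authors' code, of how a Hamiltonian could be evaluated from such matrices. It assumes the common DCA sign convention H(s) = -sum_{i<j} e_ij(s_i, s_j) - sum_i h_i(s_i) and a flattened coupling layout in which row (or column) i*Q + a holds state a at position i. Both the layout and the helper names (flat_index, hamiltonian) are assumptions made for illustration; check them against the SEEC-nt repository before using the real .mat files.

```python
# Illustrative sketch only: a Potts-model Hamiltonian on toy parameters.
# The index layout (row = position * q + state) is an ASSUMPTION, not taken
# from the dataset; verify it against the SEEC-nt code before real use.

Q = 21    # number of states, matching the 21 rows of hi
L = 263   # alignment length, matching the 263 columns of hi

# The flattened coupling matrix has Q * L rows and columns.
assert Q * L == 5523


def flat_index(pos, state, q):
    """Map (position, state) to an index of the flattened coupling matrix."""
    return pos * q + state


def hamiltonian(seq, eij, hi, q):
    """Return H(s) = -sum_{i<j} e_ij(s_i, s_j) - sum_i h_i(s_i).

    seq is a list of integer-encoded states, hi is a q x len(seq) table
    (rows are states, columns are positions), and eij is square with
    q * len(seq) rows.
    """
    n = len(seq)
    energy = 0.0
    for i in range(n):
        energy -= hi[seq[i]][i]
        for j in range(i + 1, n):
            energy -= eij[flat_index(i, seq[i], q)][flat_index(j, seq[j], q)]
    return energy


# Toy example with 3 positions and 2 states (values are made up).
toy_q, toy_len = 2, 3
toy_hi = [
    [0.1, 0.2, 0.3],  # local fields for state 0 at positions 0..2
    [0.4, 0.5, 0.6],  # local fields for state 1 at positions 0..2
]
size = toy_q * toy_len
toy_eij = [[0.0] * size for _ in range(size)]
# Couple (position 0, state 1) with (position 2, state 0), symmetrically.
a = flat_index(0, 1, toy_q)
b = flat_index(2, 0, toy_q)
toy_eij[a][b] = 1.0
toy_eij[b][a] = 1.0

toy_seq = [1, 0, 0]
toy_energy = hamiltonian(toy_seq, toy_eij, toy_hi, toy_q)
print(round(toy_energy, 6))
```

In this toy case the local fields contribute 0.4 + 0.2 + 0.3 and the single nonzero coupling contributes 1.0, so the energy is approximately -1.9. With the real data, the same function would take the 5523x5523 and 21x263 arrays after loading them from the .mat files, provided the assumed layout holds.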
Created:
2024-01-17