In vivo functional phenotypes from a computational epistatic model of evolution
Source: DataONE | Updated: 2024-01-17 | Indexed: 2024-06-08
Download link:
https://search.dataone.org/view/sha256:0b2559e204e469eb1595d78f441fcf549ab22c98113536f0bf02e9d843a30318
Resource description:
Computational models of evolution are valuable for understanding the dynamics of sequence variation, for inferring phylogenetic relationships or potential evolutionary pathways, and for biomedical and industrial applications. Despite these benefits, few models have been validated for their propensity to generate outputs with in vivo functionality, which would enhance their value as accurate and interpretable evolutionary algorithms. Using the Hamiltonian of the joint probability of sequences in the family as a fitness metric, we sampled TEM-1 variants and experimentally tested them for in vivo beta-lactamase activity in E. coli. These variants retain family-like functionality while being more active than their WT predecessor. We found that, depending on the inference method used to generate the epistatic constraints, different parameters simulate diverse selection strengths. Under weaker selection, local Hamiltonian fluctuations reliably predict relative changes to variant fitness, recapitulating neutral evo...

Methods: The direct coupling analysis (DCA) methods used were mean field (https://github.com/morcoslab/SEEC-nt) or Boltzmann machine learning (https://github.com/matteofigliuzzi/bmDCA).
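As an illustration of the Hamiltonian used as a fitness metric, the following is a minimal sketch of the Potts-model energy commonly used in DCA, H(s) = -Σ_i h_i(s_i) - Σ_{i<j} e_ij(s_i, s_j), where lower H corresponds to higher model probability. It is not the SEEC code. The nested `h[i][a]` and `e[(i, j)][a][b]` layout, the function names, and the toy values are assumptions made for readability; the parameter files in this dataset store the local fields as a 21x263 matrix and the couplings as a flattened 5523x5523 matrix (21 x 263 = 5523), and how those map to positions and states is not specified here.

```python
from typing import Dict, List, Sequence, Tuple

Fields = List[List[float]]                            # h[i][a] (assumed layout)
Couplings = Dict[Tuple[int, int], List[List[float]]]  # e[(i, j)][a][b], i < j


def potts_hamiltonian(seq: Sequence[int], h: Fields, e: Couplings) -> float:
    """Return H(s) = -sum_i h_i(s_i) - sum_{i<j} e_ij(s_i, s_j)."""
    energy = 0.0
    for i, a in enumerate(seq):
        energy -= h[i][a]
        for j in range(i + 1, len(seq)):
            energy -= e[(i, j)][a][seq[j]]
    return energy


def delta_hamiltonian(seq: Sequence[int], pos: int, new_state: int,
                      h: Fields, e: Couplings) -> float:
    """Change in H when position `pos` is mutated to `new_state`."""
    mutant = list(seq)
    mutant[pos] = new_state
    return potts_hamiltonian(mutant, h, e) - potts_hamiltonian(seq, h, e)


# Toy parameters (hypothetical): 3 positions and 2 states per position.
h = [[0.5, -0.5], [0.0, 1.0], [0.2, 0.3]]
e = {
    (0, 1): [[0.0, 1.0], [1.0, 0.0]],
    (0, 2): [[0.0, 0.0], [0.0, 0.0]],
    (1, 2): [[0.5, 0.0], [0.0, 2.0]],
}
wild_type = [0, 1, 1]

print(potts_hamiltonian(wild_type, h, e))       # energy of the toy wild type
print(delta_hamiltonian(wild_type, 2, 0, h, e))  # positive: less favorable
```

Under this sign convention, a positive change in H for a mutation means the variant is scored as less family-like than its parent, and a negative change means it is scored as more family-like.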
SI Dataset S4 (PhaseI_MSA.fasta): Phase I multiple sequence alignment used for SEEC-AA mfDCA and bmDCA statistical inference. It was obtained from Pfam and pre-processed to remove sequences with more than 5% consecutive gaps.
SI Dataset S5 (PhaseII_MSA.fasta): Phase II multiple sequence alignment used for SEEC-NT mfDCA and bmDCA statistical inference. It was generated using HMMTools with the TEM-1 sequence as seed (excluding the signal peptide) and default parameters.
SI Dataset S6 (PhaseII_mfDCA_Parameters.mat): Coupling and local field matrices inferred using mean field DCA with the Phase II MSA as the input. This file is a .mat file readable in Matlab and compatible with the code found on GitHub (github.com/morcoslab/SEEC-NT).
1. PhaseII_mfDCA_eij (size=5523x5523)
2. PhaseII_mfDCA_hi (size=21x263)
SI Dataset S7 (PhaseII_bmDCA_Parameters.mat)
1. eij (...

.mat files must be opened using Matlab.
Sanger sequencing data can be viewed using 4peaks (https://nucleobytes.com/4peaks/), Benchling (https://www.benchling.com/), or a number of other platforms.
FASTA files can be read using Matlab, Biopython, or any multiple sequence alignment visualization software.

# Data from: In vivo functional phenotypes from a computational epistatic model of evolution
---
This dataset includes sequence data, model parameters, simulation trajectories, and experimental Sanger sequencing data related to a model of sequence evolution called Sequence Evolution with Epistatic Contributions (SEEC), applied to beta-lactamase TEM-1.
## Description of the data and file structure
SI Dataset S4 (PhaseI_MSA.fasta): Phase I multiple sequence alignment used for SEEC-AA mfDCA and bmDCA statistical inference. It was obtained from Pfam and pre-processed to remove sequences with more than 5% consecutive gaps.
SI Dataset S5 (PhaseII_MSA.fasta): Phase II multiple sequence alignment used for SEEC-NT mfDCA and bmDCA statistical inference. It was generated using HMMTools with the TEM-1 sequence as seed (excluding the signal peptide) and default parameters.
SI Dataset S6 (PhaseII_mfDCA_Parameters.mat): Coupling and local field matrices inferred using mean field DCA with the Phase II MSA as the input...
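The Phase I pre-processing step (removing sequences with more than 5% consecutive gaps) could be sketched as follows. This is not the authors' script. It assumes that "5% consecutive gaps" means the longest uninterrupted run of gap characters exceeds 5% of the aligned sequence length, and that both `-` and `.` count as gaps. The function names and the demo records are hypothetical, and only the Python standard library is used.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header, aligned sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longest_gap_run(seq: str, gap_chars: str = "-.") -> int:
    """Length of the longest uninterrupted run of gap characters."""
    longest = current = 0
    for ch in seq:
        if ch in gap_chars:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def filter_by_consecutive_gaps(records: Iterable[Record],
                               max_fraction: float = 0.05) -> List[Record]:
    """Keep records whose longest gap run is at most max_fraction of the length."""
    kept = []
    for header, seq in records:
        if seq and longest_gap_run(seq) / len(seq) <= max_fraction:
            kept.append((header, seq))
    return kept


# Hypothetical two-sequence alignment; seq2 has a 2-column gap run (10%).
demo = """>seq1
ACDEFGHIKLMNPQRSTVWY
>seq2
ACDE--GHIKLMNPQRSTVW
"""
kept = filter_by_consecutive_gaps(parse_fasta(demo.splitlines()))
print([name for name, _ in kept])
```

To apply the same filter to a file such as PhaseI_MSA.fasta, the open file handle can be passed directly to `parse_fasta`, since it only needs an iterable of lines.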
Created:
2025-07-26



