
Kinetic modeling of enzymatic cephalexin synthesis with neural ODEs and surrogate-accelerated Bayesian inference

Source: DataCite Commons. Updated 2025-12-09; indexed 2026-05-07.
Download link:
https://darus.uni-stuttgart.de/citation?persistentId=doi:10.18419/DARUS-5539
Description:
<p>α-Amino ester hydrolases (AEHs) offer a promising route to the stereoselective synthesis of β-lactams such as cephalexin. However, published kinetic studies have encountered difficulty when extended beyond fitting of the data, indicating practical non-identifiability of the underlying kinetic models. Here, we address this issue using Bayesian inference combined with a reaction-consistent neural ODE surrogate that substantially accelerates parameter estimation. This framework enables efficient development of complex enzyme kinetic models even on limited hardware while providing rigorous uncertainty quantification of all parameters. To account for batch-dependent differences in active enzyme concentration, it was treated as a free parameter in each time series. Using this approach, the number of kinetic parameters was reduced from 12 to 9, and a useful kinetic model was obtained which is identifiable, mechanistically consistent, and predictive even under high substrate conditions.</p> <h2 id="available-models">Available Models</h2> <ul> <li><p><code>models/model_04.json</code>: The most comprehensive 12-parameter model including all major reaction pathways, competitive inhibition, substrate inhibition, and detailed enzyme regulation mechanisms. This model provides the most biologically detailed description but requires the most parameters to be estimated.</p></li> <li><p><code>models/model_06.json</code>: A streamlined 9-parameter model that simplifies some regulatory interactions while maintaining core kinetic behavior. This represents a good compromise between detail and parameter identifiability.</p></li> <li><p><code>models/model_07.json</code>: An intermediate 10-parameter model that includes additional regulatory terms compared to Model 06, capturing more complex enzyme behavior under varying substrate conditions.</p></li> <li><p><code>models/model_08.json</code>: An optimized 9-parameter model that balances predictive accuracy with parameter parsimony. 
This model was developed through systematic model reduction to retain essential kinetic features while minimizing parameter uncertainty.</p></li> <li><p><code>models/model_04_no_e0.json</code>: Identical to Model 04 but with the enzyme concentration (E₀) fixed rather than estimated from data. Use this when the enzyme concentration is known or measured separately.</p></li> <li><p><code>models/model_08_no_e0.json</code>: Identical to Model 08 but with the enzyme concentration fixed. This enables a direct comparison of fits with and without enzyme concentration estimation.</p></li> </ul>
<h3 id="model-file-structure-and-components">Model File Structure and Components</h3>
<p>Each model file (JSON format) contains a complete mathematical description of the kinetic system:</p>
<ul> <li><strong>Species definitions</strong>: Lists all chemical species with their names and the symbolic identifiers used in the equations</li> <li><strong>Constants</strong>: Quantities that stay constant during a simulation, such as the enzyme concentration (p0), which may be either estimated from data or held at a known value</li> <li><strong>ODEs</strong>: The system of ordinary differential equations describing how each species concentration changes over time.
These equations encode the reaction kinetics and mass balances.</li> <li><strong>Parameters</strong>: Adjustable kinetic parameters (rate constants, binding affinities, inhibition constants) with their prior distributions for Bayesian inference</li> <li><strong>Algebraic assignments</strong>: Complex mathematical expressions that define reaction rates, enzyme-substrate complexes, and regulatory terms as functions of the parameters and species concentrations</li> </ul>
<p>The models use symbolic mathematics where enzyme-substrate complexes and reaction rates are expressed algebraically, making them both interpretable and computationally efficient.</p>
<h2 id="system-requirements">System Requirements</h2>
<h3 id="software-dependencies">Software Dependencies</h3>
<p>The analysis pipeline requires several specialized Python packages for scientific computing, probabilistic programming, and machine learning:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode bash"><code class="sourceCode bash"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="ex">pip</span> install catalax</span></code></pre></div>
<h3 id="hardware-requirements">Hardware Requirements</h3>
<p>The computational analysis is moderately demanding due to Bayesian MCMC sampling and neural network training:</p>
<ul> <li><strong>CPU</strong>: Multi-core processor (recommended: 12+ cores). MCMC chains run in parallel across available cores for efficient sampling.</li> <li><strong>RAM</strong>: 16 GB minimum, 32 GB recommended. Memory requirements peak during MCMC sampling when storing large arrays of posterior samples.</li> </ul>
<h3 id="operating-system-and-python-version">Operating System and Python Version</h3>
<ul> <li><strong>Supported OS</strong>: Linux or macOS (primary testing on macOS)</li> <li><strong>Python version</strong>: 3.10 or higher required for compatibility with JAX and NumPyro</li> <li><strong>Shell</strong>: Bash-compatible shell for running analysis
scripts</li> </ul> <h2 id="how-to-reproduce">How to Reproduce</h2> <h3 id="quick-start">Quick Start</h3> <ol type="1"> <li><strong>Install dependencies</strong>:</li> </ol> <div class="sourceCode" id="cb2"><pre class="sourceCode bash"><code class="sourceCode bash"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="ex">pip</span> install catalax</span></code></pre></div> <ol start="2" type="1"> <li><strong>Train the neural ODE surrogate</strong>:</li> </ol> <div class="sourceCode" id="cb3"><pre class="sourceCode bash"><code class="sourceCode bash"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="ex">jupyter</span> notebook TrainNeuralODE.ipynb</span> <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="co"># Run all cells to create trained/rateflowode.eqx</span></span></code></pre></div> <ol start="3" type="1"> <li><strong>Run the complete analysis</strong>:</li> </ol> <div class="sourceCode" id="cb4"><pre class="sourceCode bash"><code class="sourceCode bash"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="bu">export</span> <span class="va">XLA_FLAGS</span><span class="op">=</span><span class="st">"--xla_force_host_platform_device_count=12"</span> <span class="co"># Adjust number for your CPU cores</span></span> <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="fu">chmod</span> +x fit_all.sh</span> <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a><span class="ex">./fit_all.sh</span></span></code></pre></div> <h3 id="what-this-does">What This Does</h3> <p>The analysis pipeline:</p> <ul> <li>Uses Bayesian inference (MCMC) to estimate kinetic parameters with uncertainty quantification</li> <li>Compares multiple model complexities (Models 04, 06, 07, 08)</li> <li>Treats enzyme concentration as a free parameter in each experiment</li> <li>Generates diagnostic plots and statistical 
summaries</li> <li>Saves all results to the <code>results/</code> directory</li> </ul> <h3 id="individual-model-analysis">Individual Model Analysis</h3> <p>To analyze just one model:</p> <div class="sourceCode" id="cb5"><pre class="sourceCode bash"><code class="sourceCode bash"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="ex">python</span> run_inference.py models/model_08.json</span></code></pre></div> <p>For analysis without enzyme concentration estimation:</p> <div class="sourceCode" id="cb6"><pre class="sourceCode bash"><code class="sourceCode bash"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="ex">python</span> run_inference.py models/model_08_no_e0.json <span class="at">--no-e0</span></span></code></pre></div> <h2 id="outputs">Outputs</h2> <h3 id="statistical-results-files">Statistical Results Files</h3> <p>These files contain the quantitative outcomes of the parameter estimation and model evaluation:</p> <ul> <li><p><code>{model_name}_summary.csv</code>: Comprehensive MCMC parameter statistics including posterior means, standard deviations, 95% credible intervals, effective sample sizes (ESS), and R-hat convergence diagnostics. This file provides the key numerical results for parameter interpretation.</p></li> <li><p><code>{model_name}_samples.nc</code>: Complete posterior distribution samples stored in NetCDF format. Contains 10,000 samples × 12 chains for each parameter, enabling detailed uncertainty analysis, prediction intervals, and further statistical computations.</p></li> <li><p><code>{model_name}_metrics.json</code>: Model performance metrics including various error measures (L1, L2 losses), coefficient of determination (R²), Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC). 
These metrics allow comparison of model quality and complexity.</p></li> <li><p><code>{model_name}_mean_e0.npy</code>: Estimated enzyme concentrations for each experimental measurement (when E₀ estimation is enabled). This file contains the posterior mean enzyme concentrations, which can be used for subsequent analyses or experimental validation.</p></li> </ul>
<h3 id="visualization-outputs-plots-subdirectory">Visualization Outputs (<code>plots/</code> subdirectory)</h3>
<p>Diagnostic and result plots for model assessment and interpretation:</p>
<ul> <li><strong>Trace plots</strong>: Time series of MCMC samples for each parameter, allowing visual inspection of mixing and convergence</li> <li><strong>Corner plots</strong>: Pairwise two-dimensional projections of the joint posterior, showing parameter correlations alongside the one-dimensional marginal distributions</li> <li><strong>Posterior distributions</strong>: Histograms and density plots showing parameter uncertainty</li> <li><strong>Model fit plots</strong>: Comparison of model predictions vs. experimental data over time</li> <li><strong>MCMC diagnostics</strong>: Monte Carlo Standard Error (MCSE) and Effective Sample Size (ESS) plots to assess sampling quality</li> </ul>
<h3 id="fitted-model-files-models-subdirectory">Fitted Model Files (<code>models/</code> subdirectory)</h3>
<p>Updated model definitions with estimated parameters:</p>
<ul> <li><p><code>{model_name}_bi.json</code>: Model with parameters set to their Bayesian posterior means. The posterior mean is the expected value of each parameter given the data and priors (it need not coincide with the most probable value) and is suitable for point predictions and further analysis.</p></li> <li><p><code>{model_name}_fitted.json</code>: Model with parameters obtained by deterministic optimization. These parameters minimize the prediction error and are typically used for best-fit model predictions.</p></li> </ul>
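<p>The information criteria listed for <code>{model_name}_metrics.json</code> trade goodness of fit against the number of parameters. The following is a minimal Python sketch of the textbook definitions, AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, assuming independent Gaussian residuals so that the maximized log-likelihood follows from the residual sum of squares. The function names, the residual values, and the number of data points are hypothetical illustrations; the repository may compute its metrics differently.</p>

```python
import math


def gaussian_log_likelihood(rss: float, n: int) -> float:
    """Maximized log-likelihood for n residuals, assuming i.i.d. Gaussian errors."""
    sigma2 = rss / n  # maximum-likelihood estimate of the error variance
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)


def aic(log_lik: float, k: int) -> float:
    """Akaike Information Criterion for k free parameters."""
    return 2.0 * k - 2.0 * log_lik


def bic(log_lik: float, k: int, n: int) -> float:
    """Bayesian Information Criterion for k free parameters and n data points."""
    return k * math.log(n) - 2.0 * log_lik


if __name__ == "__main__":
    n_points = 200  # hypothetical number of measurements
    # Hypothetical (parameter count, residual sum of squares) pairs.
    candidates = {"model_04": (12, 3.10), "model_08": (9, 3.25)}
    for name, (k, rss) in candidates.items():
        ll = gaussian_log_likelihood(rss, n_points)
        print(f"{name}: AIC={aic(ll, k):.1f}  BIC={bic(ll, k, n_points):.1f}")
```

<p>Lower values indicate a better balance of fit and parsimony. Because BIC penalizes each parameter by ln n rather than 2, it favors the smaller model more strongly once n exceeds about 7 data points.</p>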
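<p>The Quick Start exports <code>XLA_FLAGS</code> with a host device count of 12 and notes that the number should be adjusted to the available CPU cores. A minimal sketch of doing the same from Python is shown here, assuming the flag is set before JAX is first imported. The helper name <code>host_device_flag</code> and the policy of capping the device count at the core count are illustrative assumptions, not part of the published scripts.</p>

```python
import os
from typing import Optional


def host_device_flag(n_chains: int, n_cores: Optional[int] = None) -> str:
    """Build the XLA flag that exposes several CPU devices to JAX.

    Hypothetical helper: requests one device per chain, capped at the core count.
    """
    available = n_cores if n_cores is not None else (os.cpu_count() or 1)
    n_devices = max(1, min(n_chains, available))
    return f"--xla_force_host_platform_device_count={n_devices}"


if __name__ == "__main__":
    # Must run before `import jax`; note that this overwrites any existing XLA_FLAGS.
    os.environ["XLA_FLAGS"] = host_device_flag(n_chains=12)
    print(os.environ["XLA_FLAGS"])
```

<p>NumPyro also provides <code>numpyro.set_host_device_count</code>, which sets the same flag and likewise has to be called before JAX initializes its backend.</p>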
Provider:
DaRUS
Created:
2025-11-17