Digital Appendix of the dissertation "Accessing PC-SAFT parameters using an integrated Machine Learning framework" (J. Habicht, 2026)

DataCite Commons: updated 2026-03-11, indexed 2026-05-05
Download link:
https://data.tu-dortmund.de/citation?persistentId=doi:10.17877/RESOLV-2026-DEAJ6I
Resource description:
This publication contains the digital appendix to the RESOLV dissertation "Accessing PC-SAFT parameters using an integrated Machine Learning framework" (J. Habicht, 2026) at TU Dortmund University.

Project description:
Exploring new, efficient, and environmentally friendly chemical or biotechnological processes is one of the major challenges of modern industry. Precise process simulation has become established as a cornerstone for this purpose, providing fast and reliable results in all stages of process development. Such simulators are either guided by key experiments or used to assist experimental work, significantly reducing material consumption. Within the last decades, the Perturbed-Chain Statistical Associating Fluid Theory (PC-SAFT) has emerged as a reliable thermodynamic theory that provides the physical basis for such simulators in various fields, including reaction engineering, the design of separation techniques, and the design of pharmaceutical formulations.

Although PC-SAFT provides excellent modeling results, applying it to complex mixtures requires fitting the pure-component parameter sets for all molecules involved and determining the binary interaction parameters for all binary combinations of these molecules, which requires substantial experimental effort. Especially in the development of pharmaceutical processes, material consumption is a major cost factor, so keeping it to a minimum is essential. Developing predictive tools that provide the PC-SAFT parameters for any molecular system of interest is therefore an important challenge, as such tools enable thermodynamics-based calculations even with minimal experimental input.

In this work, an integrated Machine Learning (ML) framework based on neural-network ensembles was therefore developed to predict the PC-SAFT pure-component parameters and the binary interaction parameters of mixtures.
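The role of the binary interaction parameter can be made concrete with the standard PC-SAFT combining rules for non-associating components: the cross segment diameter is the arithmetic mean, sigma_ij = (sigma_i + sigma_j)/2, and kij corrects the geometric-mean cross dispersion energy, eps_ij = sqrt(eps_i * eps_j) * (1 - kij). The sketch below only evaluates these two rules; the class and function names are illustrative assumptions and not part of the dataset, and the numbers in the usage example are placeholders rather than fitted parameters.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PureComponent:
    """Non-associating PC-SAFT pure-component parameter set (hypothetical container)."""
    m: float       # segment number [-]; enters mixtures via the mole-fraction mean, not a cross rule
    sigma: float   # segment diameter [Angstrom]
    eps_k: float   # dispersion energy epsilon/k_B [K]


def cross_parameters(a: PureComponent, b: PureComponent, k_ij: float) -> tuple[float, float]:
    """Return (sigma_ij, eps_ij/k_B) from the PC-SAFT combining rules."""
    sigma_ij = 0.5 * (a.sigma + b.sigma)                  # arithmetic mean of diameters
    eps_ij = math.sqrt(a.eps_k * b.eps_k) * (1.0 - k_ij)  # geometric mean corrected by k_ij
    return sigma_ij, eps_ij


if __name__ == "__main__":
    # Placeholder values, not taken from the dataset.
    comp_1 = PureComponent(m=2.0, sigma=3.6, eps_k=250.0)
    comp_2 = PureComponent(m=3.0, sigma=3.8, eps_k=250.0)
    for k_ij in (0.0, 0.02):  # k_ij = 0 corresponds to Scenario 4 in DA_5
        print(k_ij, cross_parameters(comp_1, comp_2, k_ij))
```

With kij = 0.02 the cross dispersion energy is 2 % lower than the plain geometric mean. Because both rules are symmetric in the two components, kij = kji.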

Particular focus was placed on the flexibility of the approach with regard to already existing parameters and potentially available experimental data. To meet this requirement, the ML framework was designed as a two-level approach: the first level predicts PC-SAFT pure-component parameters from the molecular structure of the pure components, and the second level predicts the binary interaction parameter using the two PC-SAFT pure-component parameter sets of the respective binary system as input.

This approach was successfully trained and validated on various examples, including the prediction of pure-component physical properties (vapor pressure, liquid density) with the first level and the prediction of phase equilibria (vapor-liquid equilibrium, solid-liquid equilibrium, liquid-liquid equilibrium) with the complete ML framework.

Dataset description:
The dataset contains eight subdatasets, DA_1 to DA_8, as described below.

DA_1:
Contains the PC-SAFT pure-component parameter sets for non-associating molecules (training and test set) calculated with three different neural networks (NN1-NN3) trained in this work, compared with the parameters published in the literature (fitted to experimental data). The initial bit length of the Extended-Connectivity Fingerprints (ECFP) used as input was varied across the three neural networks (NN1: 2^10 = 1024 bits, NN2: 2^11 = 2048 bits, NN3: 2^12 = 4096 bits). Columns E-G contain the literature values (no newly generated data); all other columns contain newly generated data from this work.

DA_2:
Contains the dataset of PC-SAFT pure-component parameter sets, retrieved from the literature or fitted to experimental data in this work, for associating and non-associating molecules. The 2B association scheme is used for all molecules with association interactions. No new data is shown in this dataset.
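DA_1 varies the ECFP bit length between 2^10 and 2^12. ECFP-type fingerprints hash each circular atom environment to an integer identifier and fold it into a fixed-length bit vector, typically by taking the identifier modulo the vector length, so shorter vectors produce more bit collisions. The sketch below illustrates only this folding step on made-up identifiers; in practice a cheminformatics toolkit such as RDKit would derive the identifiers from the molecular structure, and the exact fingerprint settings used in this work (for example the radius) are not stated here.

```python
def fold_identifiers(identifiers: list[int], n_bits: int) -> set[int]:
    """Fold hashed atom-environment identifiers into bit positions of an n_bits vector."""
    if n_bits <= 0:
        raise ValueError("n_bits must be positive")
    return {ident % n_bits for ident in identifiers}


def to_bit_vector(on_bits: set[int], n_bits: int) -> list[int]:
    """Dense 0/1 vector, the form typically fed to a neural network."""
    vector = [0] * n_bits
    for index in on_bits:
        vector[index] = 1
    return vector


if __name__ == "__main__":
    # Hypothetical identifiers; the first two differ by exactly 1024.
    ids = [12345, 12345 + 1024, 98765]
    for exponent in (10, 11, 12):  # bit lengths varied for NN1, NN2, NN3
        n_bits = 2 ** exponent
        on_bits = fold_identifiers(ids, n_bits)
        print(n_bits, len(on_bits), sum(to_bit_vector(on_bits, n_bits)))
```

At 1024 bits the first two identifiers land on the same position, so only two bits are set; at 2048 and 4096 bits all three stay distinct. This is one reason why the bit length is a meaningful hyperparameter.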

DA_3:
Contains the details of the dataset used to train neural networks that predict the PC-SAFT binary interaction parameter (kij) from PC-SAFT pure-component parameter sets. kij values from the literature are given along with the corresponding publications and the pure-component parameter sets of the respective binary systems. If a kij value was fitted in this work, the publication providing the vapor-liquid equilibrium (VLE) data used for the parameter regression is given. If "this work" is referenced for a kij value in "kij_Lit_dataset.csv", new data is presented based on a regression of kij to the corresponding VLE data; otherwise, no new data is shown.

DA_4:
Contains the predictions of the PC-SAFT binary interaction parameter made with three different neural network ensembles trained in this work. The ensembles differ in the strategy used to account for the symmetry of the binary interaction parameter (kij = kji):
NNe0: direct training using a standard loss function;
NNe1: standard training with a symmetric dataset;
NNe2: a loss function adjusted to enforce symmetry, combined with a symmetric dataset.
Except for column G ("kij_Lit"), this dataset presents newly generated data from this work.

DA_5:
Contains the errors of VLE calculations for 100 binary systems, compared with experimental data, obtained with the ML framework developed in this work.
Scenario 1: literature-derived PC-SAFT pure-component parameter sets and ML-predicted kij values
Scenario 2: one literature-derived and one ML-predicted PC-SAFT pure-component parameter set and ML-predicted kij values
Scenario 3: fully ML-predicted parameters (pure-component parameter sets and binary interaction parameter)
Scenario 4: literature-derived PC-SAFT pure-component parameter sets with kij set to zero
All calculations are newly generated data from this work.
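Because the combining rules are symmetric in i and j, kij = kji, but a network that receives the concatenated parameter sets (i, j) does not automatically return the same value for (j, i). The sketch below illustrates the two ingredients named for DA_4: augmenting the dataset with swapped pairs (as in NNe1 and NNe2) and adding a symmetry penalty to a mean-squared-error loss (one possible form of the adjustment in NNe2). The penalty form, the weight `lam`, and the toy linear models are assumptions for illustration; the actual loss function of this work is not reproduced here.

```python
from typing import Callable, Sequence

Params = tuple[float, float, float]        # (m, sigma, eps/k) of one component
Sample = tuple[Params, Params, float]      # (component i, component j, k_ij)
Model = Callable[[Sequence[float]], float]


def features(p_i: Params, p_j: Params) -> list[float]:
    """Level-2 input: the two pure-component parameter sets, concatenated in order."""
    return [*p_i, *p_j]


def symmetrize(dataset: list[Sample]) -> list[Sample]:
    """Add the swapped pair (j, i) with the same k_ij for every (i, j) entry."""
    out: list[Sample] = []
    for p_i, p_j, k in dataset:
        out.append((p_i, p_j, k))
        out.append((p_j, p_i, k))
    return out


def symmetric_loss(model: Model, dataset: list[Sample], lam: float = 1.0) -> float:
    """Mean squared error plus lam * (f(i, j) - f(j, i))^2 (assumed penalty form)."""
    total = 0.0
    for p_i, p_j, k in dataset:
        pred_ij = model(features(p_i, p_j))
        pred_ji = model(features(p_j, p_i))
        total += (pred_ij - k) ** 2 + lam * (pred_ij - pred_ji) ** 2
    return total / len(dataset)


if __name__ == "__main__":
    # Placeholder sample and toy models standing in for trained networks.
    data = [((2.0, 3.6, 250.0), (3.0, 3.8, 260.0), 0.01)]
    symmetric_model = lambda x: 1e-4 * (x[2] + x[5])   # invariant under swapping i and j
    asymmetric_model = lambda x: 1e-4 * x[2]           # depends on the order of i and j
    for name, model in (("symmetric", symmetric_model), ("asymmetric", asymmetric_model)):
        print(name, symmetric_loss(model, symmetrize(data), lam=1.0))
```

For a model that is already invariant under swapping, the penalty vanishes and the loss reduces to the plain mean squared error; for an order-dependent model, the penalty pushes training toward kij = kji.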

DA_6:
Contains the predictions of active pharmaceutical ingredient (API) solubility made with the ML framework developed in this work, together with the respective literature values and the publications they were taken from. Columns D and G show literature data (the respective sources are given at the end); all other columns present newly generated data.

DA_7:
Contains the minima in AARDmean for discrete radii, given for the complete dataset of PC-SAFT pure-component parameter sets in the relative parameter space (as defined in this work). The absolute values of AARDmean are given together with the literature parameter sets, the two minimal parameter sets, and the corresponding spherical angles and Cartesian coordinates. Columns O-Q show existing literature data; all other columns present newly generated data from this work.

DA_8:
Contains the performance of the three different loss functions used in this work to train neural network ensembles that predict PC-SAFT pure-component parameter sets. The predicted parameter sets ("..._PC-SAFT-Parameters_...") are given for all loss functions for the training and test sets, together with the corresponding errors ("..._pvt_...") in the calculation of physical properties (vapor pressures, saturated liquid densities). These errors were obtained by comparing the physical properties calculated with the ML-predicted parameter sets against those calculated with the parameter sets published in the literature. In the "..._PC-SAFT-Parameters_..." files, columns F-H refer to literature data and all other columns show newly generated data from this work. All other files show newly generated data from this work.
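The subdatasets report errors as average absolute relative deviations (AARD). The sketch below shows a common AARD definition in percent and a grid search for the minimum of an objective on a sphere of radius r around a literature parameter set, parameterized by spherical angles and converted to Cartesian coordinates, loosely mirroring the quantities listed for DA_7. The mapping from relative coordinates to parameters (p = p_lit * (1 + x) per component), the grid resolution, and the toy objective are assumptions; the relative parameter space is defined in the dissertation and may differ, and a real objective would evaluate PC-SAFT vapor pressures and liquid densities.

```python
import math
from typing import Callable, Sequence

Params = tuple[float, ...]


def aard(calculated: Sequence[float], reference: Sequence[float]) -> float:
    """Average absolute relative deviation in percent."""
    if len(calculated) != len(reference) or not reference:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(c - r) / abs(r) for c, r in zip(calculated, reference))
    return 100.0 * total / len(reference)


def spherical_to_cartesian(r: float, theta: float, phi: float) -> tuple[float, float, float]:
    """theta: polar angle from the third axis; phi: azimuth in the plane of the first two."""
    return (
        r * math.sin(theta) * math.cos(phi),
        r * math.sin(theta) * math.sin(phi),
        r * math.cos(theta),
    )


def min_on_sphere(
    objective: Callable[[Params], float],
    p_lit: Params,
    r: float,
    n_theta: int = 19,
    n_phi: int = 36,
):
    """Grid search on the sphere of radius r around p_lit (assumed relative coordinates).

    Returns (objective value, theta, phi, Cartesian coordinates, parameter set).
    """
    best = None
    for a in range(n_theta):
        theta = math.pi * a / (n_theta - 1)
        for b in range(n_phi):
            phi = 2.0 * math.pi * b / n_phi
            x = spherical_to_cartesian(r, theta, phi)
            # Assumed mapping: each parameter is scaled by (1 + relative coordinate).
            params = tuple(p * (1.0 + xi) for p, xi in zip(p_lit, x))
            value = objective(params)
            if best is None or value < best[0]:
                best = (value, theta, phi, x, params)
    return best


if __name__ == "__main__":
    # Toy objective whose optimum lies on the sphere at theta = 0.
    print(min_on_sphere(lambda p: abs(p[2] - 1.1), (1.0, 1.0, 1.0), r=0.1))
```

Repeating `min_on_sphere` for a sequence of radii yields one minimum per radius, the kind of table described for DA_7; a finer grid or a local optimizer would refine each minimum.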
Provider: TUDOdata
Created: 2026-02-17