A New Machine-Learning Tool for Fast Estimation of Liquid Viscosity. Application to Cosmetic Oils
Source: Figshare · Updated: 2020-04-06 · Indexed: 2026-04-28
Download link:
https://figshare.com/articles/dataset/A_New_Machine-Learning_Tool_for_Fast_Estimation_of_Liquid_Viscosity_Application_to_Cosmetic_Oils/12117975
Description:
The viscosities of pure liquids at 25 °C are estimated from their molecular structures using three modeling approaches: group contributions, neural networks based on COSMO-RS σ-moments, and graph machines. The last two are machine-learning methods, whose models are designed and trained on a database of viscosities of 300 molecules at 25 °C. Group contributions and graph machines use only the 2D structures of the molecules (their SMILES codes), whereas the neural-network estimates are based on a set of five descriptors, the COSMO-RS σ-moments. For the first time, leave-one-out cross-validation is used for graph-machine selection, and it is shown that it can be replaced with the much faster virtual leave-one-out algorithm. The database covers a wide diversity of chemical structures, namely alkanes, ethers, esters, ketones, carbonates, acids, alcohols, silanes, and siloxanes, as well as different chemical backbones, i.e., straight, branched, or cyclic chains. A comparison on an independent set of 22 cosmetic oils shows that the graph-machine approach provides the most accurate results given the available data. The results obtained with the σ-moment-based neural network and with the graph machines can easily be reproduced using a demonstration tool based on Docker, available for download as explained in the Supporting Information. This demonstration also lets the reader predict, at 25 °C, the viscosity of any liquid of moderate molecular size (M < 600 Da) that contains C, H, O, or Si atoms, starting either from its SMILES code or from its σ-moments computed with the COSMOtherm software.
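The speed-up of virtual leave-one-out comes from replacing n separate refits with a single fit plus a leverage correction. The sketch below illustrates the idea in its simplest setting, a model that is linear in its parameters. In that setting the corrected residual r_i / (1 - h_ii) equals the true leave-one-out residual exactly, where h_ii is the i-th diagonal entry of the hat matrix. This is not the authors' code. The function names and the synthetic data (an intercept plus five random columns standing in for descriptors such as the five σ-moments) are hypothetical. For nonlinear models such as neural networks or graph machines, the usual formulation (an assumption here, not taken from this dataset) computes the leverages from the Jacobian of the model output with respect to its parameters at the trained solution, and the formula then holds only approximately.

```python
import numpy as np


def explicit_loo_rmse(X, y):
    """Leave-one-out RMSE by brute force: refit a least-squares model n times."""
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i                      # drop sample i
        w, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
        errors[i] = y[i] - X[i] @ w                   # error on the held-out sample
    return np.sqrt(np.mean(errors ** 2))


def virtual_loo_rmse(X, y):
    """Leave-one-out RMSE from a single fit, using leverages (linear model only)."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ w
    # Leverages h_ii = diag(X (X^T X)^{-1} X^T); with the reduced QR X = QR,
    # the hat matrix is Q Q^T, so h_ii is the squared norm of row i of Q.
    Q, _ = np.linalg.qr(X)
    leverages = np.sum(Q ** 2, axis=1)
    loo_residuals = residuals / (1.0 - leverages)     # exact for linear least squares
    return np.sqrt(np.mean(loo_residuals ** 2))


# Hypothetical synthetic data (NOT the viscosity database): 300 samples,
# an intercept column plus 5 random "descriptor" columns.
rng = np.random.default_rng(0)
n_samples = 300
X = np.column_stack([np.ones(n_samples), rng.normal(size=(n_samples, 5))])
true_w = rng.normal(size=6)
y = X @ true_w + 0.1 * rng.normal(size=n_samples)

loo = explicit_loo_rmse(X, y)
vloo = virtual_loo_rmse(X, y)
print(f"explicit LOO RMSE: {loo:.6f}")
print(f"virtual  LOO RMSE: {vloo:.6f}")
```

Both functions should print the same value up to floating-point rounding, but the virtual version trains the model once instead of 300 times. When each training run is itself iterative, as for neural networks or graph machines, this is where most of the time is saved.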
Created:
2020-04-06