PI1M: A Benchmark Database for Polymer Informatics

Name: PI1M: A Benchmark Database for Polymer Informatics
Creator: figshare
Published: 2020-08-25 12:07:10
License: No description available

DataCite Commons: updated 2020-08-25, indexed 2024-07-28

Download link:

https://figshare.com/articles/PI1M_A_Benchmark_Database_for_Polymer_Informatics/12483473

Description:

Large-scale open-source data are the cornerstone of data-driven research, but they are not readily available for polymers. In this work, we build a benchmark database, called PI1M (referring to ~1 million polymers for polymer informatics), to provide data resources for machine learning research in polymer informatics. A generative model is trained on ~12,000 polymers manually collected from PolyInfo, the largest existing polymer database, and the model is then used to generate ~1 million polymers. A new representation for polymers, polymer embedding (PE), is introduced and used to perform several polymer informatics regression tasks for density, glass transition temperature, melting temperature, and dielectric constant. By comparing the PE trained on the PolyInfo data with that trained on the PI1M data, we conclude that the PI1M database covers a chemical space similar to that of PolyInfo but significantly populates regions where PolyInfo data are sparse. We believe PI1M will serve as a good benchmark database for future research in polymer informatics.

Provider:

figshare

Created:

2020-06-15

Collection summary

Dataset introduction