PI1M: A Benchmark Database for Polymer Informatics

Name: PI1M: A Benchmark Database for Polymer Informatics
Creator: figshare
Published: 2025-06-01 06:29:33
License: No description available

DataCite Commons. Updated 2025-06-01; indexed 2024-07-28.

Download link:

https://figshare.com/articles/PI1M_A_Benchmark_Database_for_Polymer_Informatics/12483473/1

Description:

Large-scale open-source data are the cornerstone of data-driven research, but they are not readily available for polymers. In this work, we build a benchmark database, called PI1M (referring to ~1 million polymers for polymer informatics), to provide data resources for machine learning research in polymer informatics. A generative model is trained on ~12,000 polymers manually collected from PolyInfo, the largest existing polymer database, and the model is then used to generate ~1 million polymers. A new representation for polymers, polymer embedding (PE), is introduced and used to perform several polymer informatics regression tasks: density, glass transition temperature, melting temperature, and dielectric constant. By comparing the PE trained on the PolyInfo data with the PE trained on the PI1M data, we conclude that the PI1M database covers a chemical space similar to that of PolyInfo, but significantly populates regions where PolyInfo data are sparse. We believe PI1M will serve as a good benchmark database for future research in polymer informatics.

Provider:

figshare

Created:

2020-06-15

Collected summary

Dataset introduction