QM9 Dataset Subset and TotalEnergies Internal Dataset
arXiv, updated 2023-04-21, indexed 2024-08-06
Download link:
http://arxiv.org/abs/2304.10867v1
Resource description:
This study uses two small molecular datasets: a subset of QM9 containing 4,989 molecules, and a small internal TotalEnergies dataset of 516 validated antioxidants. The molecules in the QM9 subset contain at most nine heavy atoms (C, O, N, F), and the subset serves as a benchmark for molecular generation tasks. The TotalEnergies dataset reflects the typical scale of industrial molecular datasets; the antioxidants it contains are important in many industrial applications and products. The goal is to test how well quantum generative models generalize from a small number of training samples, a setting close to industrial practice, where samples are few but valuable.
Provided by:
Institute of Computer Science, Leiden University
Created:
2023-04-21



