Supporting data for: Crystal Structure Generation with Autoregressive Large Language Modeling

Indexed from NIAID Data Ecosystem: 2026-05-01
Download link:
https://zenodo.org/record/10642387
Resource summary:
The generation of plausible crystal structures is often an important step in the computational prediction of crystal structures from composition. Here, we introduce a methodology for crystal structure generation involving autoregressive large language modeling of the Crystallographic Information File (CIF) format. Our model, CrystaLLM, is trained on a comprehensive dataset of millions of CIF files, and is capable of reliably generating correct CIF syntax and plausible crystal structures for many classes of inorganic compounds. Moreover, we provide general and open access to the model by deploying it as a web application, available to anyone over the internet. Our results indicate that the model promises to be a reliable and efficient tool for both crystallography and materials informatics.
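The core idea behind autoregressive modeling of CIF files is that the file is treated as a sequence of tokens, and the model learns to predict each token from all of the tokens before it. The minimal sketch below illustrates only that framing. The regular-expression tokenizer and the names `tokenize_cif` and `next_token_pairs` are hypothetical and are not CrystaLLM's actual tokenizer or training code. The short CIF fragment is an illustrative example, not a record from this dataset.

```python
import re

# Hypothetical tokenizer (NOT CrystaLLM's): data-block names, CIF tags such as
# "_cell_length_a", decimal numbers, integers, words, and single punctuation marks.
TOKEN_PATTERN = re.compile(r"data_\S+|_[A-Za-z0-9_]+|\d+\.\d+|\d+|[A-Za-z]+|\S")


def tokenize_cif(text: str) -> list[str]:
    """Split CIF text into a flat list of tokens, discarding whitespace."""
    return TOKEN_PATTERN.findall(text)


def next_token_pairs(tokens: list[str]) -> list[tuple[list[str], str]]:
    """Build (context, target) pairs: each token is predicted from every token before it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


# Illustrative CIF fragment (assumed, not taken from the dataset).
cif_snippet = """data_NaCl
_cell_length_a 5.64
_cell_angle_alpha 90.0
"""

if __name__ == "__main__":
    tokens = tokenize_cif(cif_snippet)
    print(tokens)
    # ['data_NaCl', '_cell_length_a', '5.64', '_cell_angle_alpha', '90.0']
    for context, target in next_token_pairs(tokens)[:2]:
        print(context, "->", target)
    # ['data_NaCl'] -> _cell_length_a
    # ['data_NaCl', '_cell_length_a'] -> 5.64
```

At generation time the same framing runs in reverse: a model samples one token at a time and appends it to the context until a complete CIF file has been produced.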

Created: 2024-02-10