MaterialBrain: High-Performance Material Synthesis Extraction via Human–AI-Curated Few-Shot Large Language Models
Source: Figshare. Indexed 2026-04-28.
Download link:
https://figshare.com/articles/dataset/MaterialBrain_High-Performance_Material_Synthesis_Extraction_via_Human_AI-Curated_Few-Shot_Large_Language_Models/30951534
Resource description:
Extracting metal–organic framework (MOF) synthesis routes from the literature is crucial for the rational design of MOFs with desirable functionality. The recent advent of large language models (LLMs) provides a disruptive new solution to this long-standing problem. Most recent research on chemical data extraction adopts either zero-shot LLMs, which lack specialized materials knowledge, or fine-tuned LLMs, which incur high cost and inflexibility in our scenario. In this work we instead introduce the MaterialBrain pipeline, which optimizes few-shot in-context learning with LLMs to accurately extract synthesis routes and design high-performance materials. First, a batch–epoch–iteration-based human–AI data curation approach is proposed to optimize both the quantity and quality of the annotation database for the synthesis extraction task, both of which are pivotal to MaterialBrain’s performance. Second, an information retrieval algorithm is applied to select and quantify the few-shot demonstrations drawn from the annotation database for each extraction. We validate the pipeline with three evaluations over three data sets randomly sampled from nearly 90,000 well-defined MOFs. In synthesis extraction, structure inference, and material design, MaterialBrain significantly outperforms zero-shot LLMs and baseline methods. The specific surface area of the lab-synthesized material guided by LLMs surpasses that of 99.2% of MOFs of the same class reported in the literature.
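The description does not say which information retrieval algorithm MaterialBrain uses, so the following is only a minimal sketch of the general idea of retrieval-based few-shot selection. It assumes a plain bag-of-words cosine similarity as the retriever; the function names (`select_demonstrations`, `build_prompt`), the annotation schema, and the sample paragraphs are all hypothetical and are not taken from the dataset.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_demonstrations(query: str, annotations: list[dict], k: int = 2) -> list[dict]:
    """Return the k annotated examples most similar to the query paragraph.

    Assumption: similarity is bag-of-words cosine, a stand-in for whatever
    retriever the actual pipeline uses.
    """
    q = Counter(tokenize(query))
    ranked = sorted(
        annotations,
        key=lambda ex: cosine(q, Counter(tokenize(ex["paragraph"]))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, demos: list[dict]) -> str:
    """Assemble a few-shot prompt: retrieved demonstrations first, then the query."""
    parts = ["Extract the synthesis route from each paragraph."]
    for ex in demos:
        parts.append(f"Paragraph: {ex['paragraph']}\nExtraction: {ex['extraction']}")
    parts.append(f"Paragraph: {query}\nExtraction:")
    return "\n\n".join(parts)


# Hypothetical annotation database (illustrative entries only).
annotations = [
    {"paragraph": "Zn(NO3)2 and terephthalic acid were dissolved in DMF and heated at 120 C for 24 h.",
     "extraction": "metal=Zn(NO3)2; linker=terephthalic acid; solvent=DMF; T=120 C; t=24 h"},
    {"paragraph": "The crystal structure was solved by direct methods and refined with SHELXL.",
     "extraction": "none"},
    {"paragraph": "Cu(NO3)2 and trimesic acid were stirred in ethanol at room temperature for 12 h.",
     "extraction": "metal=Cu(NO3)2; linker=trimesic acid; solvent=ethanol; T=room temperature; t=12 h"},
]

query = "ZrCl4 and terephthalic acid were dissolved in DMF and heated at 120 C for 48 h."
demos = select_demonstrations(query, annotations, k=2)
prompt = build_prompt(query, demos)
```

The resulting `prompt` string would then be sent to an LLM of choice. In this sketch the number of demonstrations `k` is fixed by the caller, whereas the description suggests the pipeline also decides how many demonstrations to use for each extraction.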



