Data for "Flexible, Model-Agnostic Method for Materials Data Extraction from Text Using General Purpose Language Models"
Source: DataCite Commons | Updated: 2024-05-10 | Indexed: 2024-08-26
Download link:
https://figshare.com/articles/dataset/Data_for_Flexible_Model-Agnostic_Method_for_Materials_Data_Extraction_from_Text_Using_General_Purpose_Language_Models_/21861948
Description:
Datasets for the paper "Flexible, Model-Agnostic Method for Materials Data Extraction from Text Using General Purpose Language Models" by Maciej P. Polak, Shrey Modi, Anna Latosinska, Jinming Zhang, Ching-Wen Wang, Shanonan Wang, Ayan Deep Hazra, and Dane Morgan.

MPPolak_BulkModulus_ValidationData.xlsx: a dataset of bulk modulus sentences, both positive (containing bulk modulus data) and negative (not containing data), used for model assessment.
MPPolak_BulkModulus_AllTrainData.xlsx: a dataset of bulk modulus sentences, both positive (containing bulk modulus data) and negative (not containing data), used for fine-tuning the model and for model assessment.
MPPolak_CritCoolRate_Dataset.xlsx: a dataset of critical cooling rates for metallic glasses, developed in this paper with the method it presents, consisting of material names, critical cooling rate values, their units, and the DOIs of the source documents.

MPPolak_DataExtraction_codes.zip: simple example code needed to reproduce the results. The provided 'positive' and 'negative' files are shortened versions of the training data that allow quick execution and testing. The 'pos' and 'neg' files contain the full testing sets. The 'plotting' directory contains data and scripts for reproducing the figures.
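The description says the positive and negative sentence sets are used for model assessment. As a minimal sketch of what such an assessment could look like, the snippet below computes precision and recall for a binary sentence classifier. Everything in it is an assumption for illustration: the example sentences are made-up placeholders rather than rows from the spreadsheets, `toy_classifier` is a hypothetical keyword rule standing in for a real language model, and the function names are not taken from the paper's code.

```python
from typing import Callable, Sequence, Tuple


def precision_recall(
    positives: Sequence[str],
    negatives: Sequence[str],
    predict: Callable[[str], bool],
) -> Tuple[float, float]:
    """Score a binary classifier on labeled positive and negative sentences."""
    true_pos = sum(1 for s in positives if predict(s))
    false_neg = len(positives) - true_pos
    false_pos = sum(1 for s in negatives if predict(s))
    # Guard against division by zero when nothing is predicted or labeled positive.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


def toy_classifier(sentence: str) -> bool:
    """Hypothetical stand-in for a model: flags sentences naming the property and a unit."""
    text = sentence.lower()
    return "bulk modulus" in text and "gpa" in text


# Placeholder sentences invented for this sketch; they are NOT from the dataset.
positives = [
    "The bulk modulus of the compound was found to be 160 GPa.",
    "We obtain a bulk modulus of 95 GPa for the cubic phase.",
    "The measured value of B is 210 GPa.",
]
negatives = [
    "The bulk modulus is an important mechanical property.",
    "Samples were annealed at 500 K for two hours.",
]

precision, recall = precision_recall(positives, negatives, toy_classifier)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

To run this against the actual files, the two lists would be replaced with the sentences read from the spreadsheets, whose column layout is not described here and would need to be checked first.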
Provider:
figshare
Created:
2023-02-09



