MatMech: A Multimodal Dataset for Exploring Causal Mechanisms in Materials Science Literature

Source: Figshare. Updated: 2025-08-03. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/MatMech_A_Multimodal_Dataset_for_Exploring_Casual_Mechanisms_in_Materials_Science_Literature/29815979
Resource description:
Overview

The MatMech Dataset is a large-scale collection of more than 61,200 materials science papers, each represented by a structured JSON file and its associated figure images. Each paper is captured not only as text but as a detailed JSON structure that maps its scientific findings onto the Materials Science Tetrahedron (Processing --> Structure --> Properties --> Performance). The dataset is distributed both as the full raw collection (compressed) and as a set of sample cases for quick exploration.

Data Composition

The dataset contains:
• matmech.zip: Full dataset (~19 GB) containing all papers, organized by DOI folders. Each folder includes:
  • paper.json: parsed paper metadata and text content
  • images/: figures extracted from the PDF
• case/: A curated set of small example cases for quick testing and inspection
• dataset_summary.json: Machine-readable dataset metadata
• README.md: Human-readable dataset description and usage instructions

The directory structure is as follows:

matshare_dataset/
│
├── README.md
├── matshare_full.zip
├── case/
│   ├── doi_001/
│   ├── doi_002/
│   └── ...
└── data_summary.json

Intended Use Cases

This dataset supports a wide range of materials informatics and scientific text mining tasks, including:
• Materials mechanism extraction
• Structure–property reasoning and knowledge graph construction
• Multimodal learning from scientific images
• LLM evaluation on scientific domain understanding
• Automatic captioning and figure–text alignment
• Causal chain prediction and reasoning
• Research trend analysis based on materials categories

For more information, please refer to the README.md.
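As a quick way to inspect the sample cases, the sketch below walks a directory of paper folders and reports what each one holds. It assumes that folders under case/ follow the same layout described for the full archive (a paper.json file plus an optional images/ subfolder). The schema of paper.json is not documented here, so the code only lists top-level keys instead of assuming field names. The function names summarize_case and summarize_cases are hypothetical helpers, not part of the dataset.

```python
import json
from pathlib import Path


def summarize_case(case_dir):
    """Summarize one paper folder: its name, JSON top-level keys, and image files.

    Assumes the folder contains paper.json and, optionally, an images/ subfolder.
    """
    case_dir = Path(case_dir)
    with (case_dir / "paper.json").open(encoding="utf-8") as fh:
        paper = json.load(fh)

    images_dir = case_dir / "images"
    if images_dir.is_dir():
        images = sorted(p.name for p in images_dir.iterdir() if p.is_file())
    else:
        images = []

    # The JSON schema is not specified in the listing, so only report keys.
    keys = sorted(paper) if isinstance(paper, dict) else []
    return {"folder": case_dir.name, "top_level_keys": keys, "images": images}


def summarize_cases(root):
    """Summarize every subfolder of root that contains a paper.json file."""
    root = Path(root)
    return [
        summarize_case(d)
        for d in sorted(root.iterdir())
        if (d / "paper.json").is_file()
    ]
```

Calling summarize_cases("matshare_dataset/case") after downloading the samples would return one summary dictionary per paper folder; the path is taken from the directory tree above and may need adjusting to wherever the files are unpacked.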
Created: 2025-08-03