Large Language Models in Materials Science: Evaluating RAG Performance in Graphene Synthesis Using RAGAS

Name: Large Language Models in Materials Science: Evaluating RAG Performance in Graphene Synthesis Using RAGAS
Creator: Nanyang Technological University
License: No description available

Source: Mendeley Data, indexed 2026-04-09

Download link:

https://data.mendeley.com/datasets/ry7phxn4js/2

Description:

Retrieval-Augmented Generation (RAG) systems increasingly support scientific research, yet evaluating their performance in specialized domains remains challenging because scientific knowledge is technically complex and demands high precision. This study presents the first systematic analysis of automated evaluation frameworks for scientific RAG systems, focusing on the RAGAS framework applied to RAG-augmented large language models in materials science, with graphene synthesis as a representative case study. We develop a comprehensive evaluation protocol that compares four assessment approaches across 20 domain-specific questions: RAGAS (an automated RAG evaluation framework), BERTScore, LLM-as-a-Judge, and expert human evaluation. Our analysis reveals that while automated metrics can capture relative performance improvements from retrieval augmentation, they exhibit fundamental limitations in absolute score interpretation for scientific content. RAGAS successfully identified performance gains in RAG-augmented systems (a 0.52-point improvement for Gemini and a 1.03-point improvement for Qwen on a 10-point scale), demonstrating particular sensitivity to retrieval benefits for smaller, open-source models. These findings establish methodological guidelines for scientific RAG evaluation and highlight critical considerations for researchers deploying AI systems in specialized domains.
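
The relative gains quoted above are differences between scores with and without retrieval on the same questions. The sketch below illustrates one plausible way such a gain could be computed; it is not the authors' code. The function name `mean_gain` and the score lists are hypothetical placeholders, not values from the dataset, and the sketch assumes per-question scores on a 10-point scale.

```python
from statistics import mean


def mean_gain(baseline_scores, rag_scores):
    """Average per-question improvement of the RAG run over the baseline run.

    Both lists are assumed to hold scores for the same questions in the
    same order (hypothetical layout; the dataset's actual format may differ).
    """
    if len(baseline_scores) != len(rag_scores):
        raise ValueError("score lists must cover the same questions")
    return mean(r - b for b, r in zip(baseline_scores, rag_scores))


# Placeholder scores for illustration only; NOT taken from the study.
baseline = [5.0, 6.0, 4.0, 7.0]
with_rag = [6.0, 6.5, 5.0, 7.5]

print(f"mean gain: {mean_gain(baseline, with_rag):.2f} points")
```

A paired difference of this kind only shows relative change, which matches the abstract's caution that absolute scores from automated metrics are harder to interpret for scientific content.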

Provider:

Nanyang Technological University
