Large Language Models in Materials Science: Evaluating RAG Performance in Graphene Synthesis Using RAGAS

Indexed from Mendeley Data on 2026-04-09
Download link:
https://data.mendeley.com/datasets/ry7phxn4js/2
Description:
Retrieval-Augmented Generation (RAG) systems increasingly support scientific research, yet evaluating their performance in specialized domains remains challenging because scientific knowledge is technically complex and demands precision. This study presents the first systematic analysis of automated evaluation frameworks for scientific RAG systems, focusing on the RAGAS framework applied to RAG-augmented large language models in materials science, with graphene synthesis as a representative case study. We develop a comprehensive evaluation protocol that compares four assessment approaches across 20 domain-specific questions: RAGAS (an automated RAG evaluation framework), BERTScore, LLM-as-a-Judge, and expert human evaluation. Our analysis reveals that while automated metrics can capture relative performance improvements from retrieval augmentation, they exhibit fundamental limitations in absolute score interpretation for scientific content. RAGAS successfully identified performance gains in RAG-augmented systems (a 0.52-point improvement for Gemini and a 1.03-point improvement for Qwen on a 10-point scale), showing that it is sensitive to retrieval benefits, particularly for smaller, open-source models. These findings establish methodological guidelines for scientific RAG evaluation and highlight critical considerations for researchers deploying AI systems in specialized domains.
Provider:
Nanyang Technological University