
Datasets To EVAL.

Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://figshare.com/articles/dataset/Datasets_To_EVAL_/30754500
Description:
This paper presents a novel approach to enhancing educational question-answering (Q&A) systems by combining Retrieval-Augmented Generation (RAG) with Large Language Model (LLM) Code Interpreters. Traditional educational Q&A systems face challenges in knowledge updates, reasoning accuracy, and the handling of complex computational tasks. These limitations are particularly evident in domains requiring multi-step reasoning or access to real-time, domain-specific knowledge. To address these issues, we propose a system that uses RAG to dynamically retrieve up-to-date, relevant information from external knowledge sources, thus mitigating the common “hallucination” problem in LLMs. In addition, an integrated LLM Code Interpreter enables the system to perform multi-step logical reasoning and execute Python code for precise calculations, significantly improving its ability to solve mathematical problems and handle complex queries. We evaluated the proposed system on five educational datasets (AI2_ARC, OpenBookQA, E-EVAL, TQA, and ScienceQA), which represent diverse question types and domains. Compared to vanilla LLMs, our approach combining RAG with Code Interpreters achieved an average accuracy improvement of 10 to 15 percentage points. Among the tested models, GPT-4o and Gemini-pro-1.5 consistently showed the strongest performance, excelling particularly in scientific reasoning and multi-step computations. Despite these advances, several challenges remain, including knowledge retrieval failures, code execution errors, difficulties in synthesizing cross-disciplinary information, and limitations in multi-modal reasoning, particularly when combining text and images. These challenges provide important directions for future research aimed at further optimizing educational Q&A systems.
Our work shows that integrating RAG and Code Interpreters offers a promising path toward more accurate, transparent, and personalized educational Q&A systems, and can significantly improve the learning experience in various educational contexts.
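The retrieve-then-compute flow described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy `KNOWLEDGE_BASE`, the word-overlap `retrieve` function (standing in for a real retriever), and the restricted arithmetic `evaluate` function (standing in for a code interpreter). In a real system an LLM would generate the code to run; here the expression is supplied by hand.

```python
import ast
import operator
import re
from collections import Counter

# Hypothetical toy knowledge base; a real system would query an external source.
KNOWLEDGE_BASE = [
    "The speed of an object is the distance travelled divided by the time taken.",
    "Photosynthesis converts light energy into chemical energy in plants.",
    "The area of a rectangle is its length multiplied by its width.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def retrieve(question, passages, k=1):
    """Rank passages by word overlap with the question (stand-in for a real retriever)."""
    query = Counter(tokenize(question))

    def overlap(passage):
        return sum((query & Counter(tokenize(passage))).values())

    return sorted(passages, key=overlap, reverse=True)[:k]


# Only these arithmetic operators are allowed in evaluated expressions.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expression):
    """Evaluate a plain arithmetic expression (stand-in for a code interpreter)."""

    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval").body)


def answer(question, expression):
    """Combine retrieved context with a computed result.

    `expression` is what an LLM would be asked to produce from the question
    and the retrieved context; it is passed in by hand in this sketch.
    """
    return {
        "context": retrieve(question, KNOWLEDGE_BASE),
        "result": evaluate(expression),
    }


question = "A car covers a distance of 150 km in a time of 3 hours. What is its speed?"
print(answer(question, "150 / 3"))
```

The point of the split is that factual grounding comes from the retrieved passage while the numeric answer comes from executed code rather than from the model's free-text arithmetic.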
Created:
2025-12-01