SciEx
Source: arXiv · Updated: 2024-06-15 · Indexed: 2024-06-19
Download link:
https://huggingface.co/datasets/tuanh23/SciEx
Resource description:
SciEx is a multilingual, multimodal benchmark of scientific exam questions developed at Karlsruhe Institute of Technology (KIT). It comprises 154 university-level computer science exam questions in English and German, collected from exams for various computer science courses, including natural language processing (NLP) and artificial intelligence (AI). Building the benchmark involved expert evaluation of large language model (LLM) outputs and the development of an automated scoring mechanism. SciEx is intended to evaluate how well LLMs solve scientific tasks, particularly free-form questions, and to serve as a reliable benchmark for future LLM evaluation.
Provided by:
Karlsruhe Institute of Technology (KIT)
Created:
2024-06-15



