
patrickshitou/ArcMMLU

Source: Hugging Face. Updated: 2023-12-01. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/patrickshitou/ArcMMLU
Resource description:
---
license: cc-by-nc-sa-4.0
---

## Introduction

[ArcMMLU](https://github.com/stzhang-patrick/ArcMMLU) is a Chinese benchmark specifically designed for evaluating LLMs on Library & Information Science (LIS). It aims to evaluate the knowledge and reasoning capabilities of LLMs in the LIS academic field, which covers four key sub-areas: Archival Science, Data Science, Library Science, and Information Science.

Please refer to our paper for more information: [ArcMMLU: A Library and Information Science Benchmark for Large Language Models](https://arxiv.org/abs/2311.18658).

Note that the name ArcMMLU is derived from our previous large language model research project, [ArcGPT](https://arxiv.org/abs/2307.14852), which focused primarily on Archival Science. Our research scope later expanded from Archival Science to the broader field of information management, but we retained the name ArcMMLU. ArcMMLU is therefore not just an evaluation benchmark for Archival Science; it is a comprehensive evaluation dataset for the entire LIS discipline.

For convenience, ArcMMLU adopts the same data format as CMMLU. We also provide evaluation code based on the CMMLU project. For models that have already been evaluated on CMMLU, running an evaluation on ArcMMLU should be straightforward.

Special thanks to the [CMMLU---Chinese Multi-Task Language Understanding Evaluation](https://github.com/haonan-li/CMMLU) project for its contribution to the evaluation of Chinese LLMs. We hope that ArcMMLU can serve as a powerful supplement in specialized fields, bringing more detail and depth to the evaluation of Chinese LLMs.
Provided by:
patrickshitou
Summary of original information

Dataset introduction

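ArcMMLU is a Chinese-language benchmark designed specifically to evaluate large language models (LLMs) in Library & Information Science (LIS). It assesses the knowledge and reasoning capabilities of LLMs in the LIS academic field, covering four key sub-areas: Archival Science, Data Science, Library Science, and Information Science.

The name ArcMMLU comes from the authors' earlier large language model research project, ArcGPT, which focused mainly on Archival Science. The research scope later expanded from Archival Science to the broader field of information management, but the name ArcMMLU was kept. ArcMMLU is therefore not only an evaluation benchmark for Archival Science but a comprehensive evaluation dataset for the entire LIS discipline.

For convenience, ArcMMLU uses the same data format as CMMLU, and evaluation code based on the CMMLU project is provided. For models that have already been evaluated on CMMLU, evaluating on ArcMMLU should be straightforward. The authors thank the CMMLU---Chinese Multi-Task Language Understanding Evaluation project for its contribution to the evaluation of Chinese LLMs, and hope that ArcMMLU can serve as a strong supplement in specialized fields, bringing more detail and depth to the evaluation of Chinese LLMs.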
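Because the description says ArcMMLU shares CMMLU's data format, a minimal sketch of how a multiple-choice row in that style might be turned into a prompt and scored could look like the following. The column names (`Question`, `A`, `B`, `C`, `D`, `Answer`) are an assumption based on the CMMLU layout and are not stated on this page. The sample rows are made-up placeholders, and the helper names `load_rows`, `format_prompt`, and `accuracy` are hypothetical. This is not the project's evaluation code.

```python
import csv
import io

# Placeholder rows in an ASSUMED CMMLU-style layout
# (columns: Question, A, B, C, D, Answer). Not real dataset content.
SAMPLE_CSV = """Question,A,B,C,D,Answer
Placeholder question 1?,option one,option two,option three,option four,A
Placeholder question 2?,option one,option two,option three,option four,C
"""

CHOICES = ["A", "B", "C", "D"]


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def format_prompt(row):
    """Render one row as a question, four lettered options, and an answer cue."""
    lines = [row["Question"]]
    lines += [f"{letter}. {row[letter]}" for letter in CHOICES]
    lines.append("Answer:")
    return "\n".join(lines)


def accuracy(rows, predictions):
    """Fraction of rows whose predicted letter matches the Answer column."""
    if not rows:
        return 0.0
    correct = sum(
        1
        for row, pred in zip(rows, predictions)
        if pred.strip().upper() == row["Answer"].strip().upper()
    )
    return correct / len(rows)


rows = load_rows(SAMPLE_CSV)
print(format_prompt(rows[0]))
# Pretend a model answered "A" for both rows; only the first is correct.
print(accuracy(rows, ["A", "A"]))
```

In an actual run, the predictions would come from a model's output for each prompt, and the rows from the dataset files rather than an inline string.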
