BiGGen-Bench

Source: arXiv | Updated: 2024-06-09 | Indexed: 2024-06-12
Download link:
https://huggingface.co/datasets/prometheus-eval/BiGGen-Bench
Description:
BiGGen-Bench is a language model evaluation benchmark created by the Korea Advanced Institute of Science and Technology (KAIST), LG AI Research, and collaborating institutions. It evaluates nine core capabilities of language models across 77 tasks: instruction following, grounding, planning, reasoning, refinement, safety, theory of mind, tool use, and multilingual ability. The dataset contains 765 instances, each with its own fine-grained evaluation criteria, so that every response is assessed against what that particular instance requires. The dataset was built with a human-in-the-loop process to maintain quality. It is intended mainly for evaluating and improving language models, especially where precise, instance-level assessment is needed.
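
The structure described above (instances grouped under capabilities, each carrying its own evaluation criteria) can be illustrated with a minimal Python sketch. The field names, the 1-5 score scale, and the toy records are all assumptions made for illustration; they are not taken from the dataset, whose actual schema should be checked on the download page.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Instance:
    """One benchmark instance (field names are hypothetical)."""
    capability: str  # e.g. "reasoning"
    task: str        # a task belonging to that capability
    criteria: str    # instance-specific evaluation criteria
    score: int       # judge's score, assumed to lie on a 1-5 scale


def mean_score_by_capability(instances):
    """Average the per-instance scores within each capability."""
    grouped = defaultdict(list)
    for inst in instances:
        grouped[inst.capability].append(inst.score)
    return {cap: mean(scores) for cap, scores in grouped.items()}


# Toy records for illustration only; not real benchmark data.
toy = [
    Instance("reasoning", "task_a", "Does the answer justify each step?", 4),
    Instance("reasoning", "task_b", "Is the final result correct?", 2),
    Instance("safety", "task_c", "Does the response decline the harmful request?", 5),
]
print(mean_score_by_capability(toy))
```

Grouping by capability before averaging keeps a capability with many instances from dominating one with few, which is one reasonable way to report results per capability; other aggregation choices are possible.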
Provided by:
Korea Advanced Institute of Science and Technology (KAIST)
Created:
2024-06-09