livebench/model_answer

Name: livebench/model_answer
Creator: livebench
Published: 2024-10-22 03:09:20
License: No description available

Hugging Face. Updated: 2024-10-22. Indexed: 2024-06-29.

Download link:

https://hf-mirror.com/datasets/livebench/model_answer

Description:

LiveBench is a benchmark for large language models (LLMs) designed with test set contamination and objective evaluation in mind. It has the following properties:
- LiveBench limits potential contamination by releasing new questions monthly and by basing questions on recently released datasets, arXiv papers, news articles, and IMDb movie synopses.
- Each question has a verifiable, objective ground-truth answer, so hard questions can be scored accurately and automatically, without an LLM judge.
- LiveBench currently contains 18 diverse tasks across 6 categories, and new, harder tasks will be released over time.

This dataset contains all model answers currently used to create the leaderboard.

Provider:

livebench

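Summary of original information

Dataset overview

Dataset information

Features:
- question_id: question ID, type string.
- answer_id: answer ID, type string.
- model_id: model ID, type string.
- choices: list of choices, with the following sub-features:
  - index: index, type integer.
  - turns: sequence, type string.
- tstamp: timestamp, type float.
- category: category, type string.
- task: task, type string.
Splits:
- leaderboard: leaderboard data, 51,870 examples, 86,726,046 bytes in total.
Download size: 31,126,528 bytes
Dataset size: 86,726,046 bytes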
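
As a rough illustration of the schema above, the following sketch checks that a single record has the listed fields with the listed types. It uses only the Python standard library; the helper name `validate_record` and the sample values are hypothetical and are not taken from the dataset or from any LiveBench tooling.

```python
from typing import Any

# Top-level fields and their Python types, following the feature list above.
EXPECTED_FIELDS = {
    "question_id": str,
    "answer_id": str,
    "model_id": str,
    "choices": list,
    "tstamp": float,
    "category": str,
    "task": str,
}


def validate_record(record: dict[str, Any]) -> list[str]:
    """Return a list of schema problems; an empty list means the record conforms."""
    problems = []
    for name, expected in EXPECTED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            problems.append(f"{name}: expected {expected.__name__}")

    # Each choice should carry an integer index and a sequence of string turns.
    choices = record.get("choices")
    if isinstance(choices, list):
        for i, choice in enumerate(choices):
            if not isinstance(choice, dict):
                problems.append(f"choices[{i}]: expected dict")
                continue
            if not isinstance(choice.get("index"), int):
                problems.append(f"choices[{i}].index: expected int")
            turns = choice.get("turns")
            if not (isinstance(turns, list) and all(isinstance(t, str) for t in turns)):
                problems.append(f"choices[{i}].turns: expected list of str")
    return problems


# Hypothetical record, for illustration only.
sample = {
    "question_id": "q-001",
    "answer_id": "a-001",
    "model_id": "example-model",
    "choices": [{"index": 0, "turns": ["An example answer."]}],
    "tstamp": 1719619200.0,
    "category": "reasoning",
    "task": "example_task",
}
print(validate_record(sample))
```

A check like this can be run over rows after download to catch files that do not match the published schema.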

Configuration

Default configuration:
- data_files:
  - split: leaderboard
  - path: data/leaderboard-*
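
The configuration above maps every file matching the glob `data/leaderboard-*` to the `leaderboard` split. The sketch below mimics that mapping with the standard-library `fnmatch` module. It only approximates how a loader might resolve the pattern, and the function name `split_for_file` and the file names used are hypothetical.

```python
from fnmatch import fnmatchcase

# Mirror of the default configuration listed above.
DEFAULT_CONFIG = {
    "data_files": [
        {"split": "leaderboard", "path": "data/leaderboard-*"},
    ]
}


def split_for_file(filename, config=DEFAULT_CONFIG):
    """Return the split whose glob pattern matches filename, or None."""
    for entry in config["data_files"]:
        # fnmatchcase is case-sensitive on every platform.
        if fnmatchcase(filename, entry["path"]):
            return entry["split"]
    return None


# Hypothetical file names, for illustration only.
print(split_for_file("data/leaderboard-00000-of-00001.parquet"))
print(split_for_file("data/other-00000-of-00001.parquet"))
```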
