livebench/reasoning

Name: livebench/reasoning
Creator: livebench
Published: 2025-04-07 20:34:13
License: No description available

Hugging Face, updated 2025-04-07, indexed 2024-06-12

Download link:

https://hf-mirror.com/datasets/livebench/reasoning

Description:

LiveBench is a benchmark for LLMs designed with test set contamination and objective evaluation in mind. It has the following properties: it releases new questions monthly, with questions based on recently-released datasets, arXiv papers, news articles, and IMDb movie synopses; each question has verifiable, objective ground-truth answers, allowing for accurate and automatic scoring without the use of an LLM judge; it currently contains a set of 18 diverse tasks across 6 categories, with plans to release new, harder tasks over time. This is the instruction_following category of LiveBench.
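To illustrate what scoring against objective ground-truth answers without an LLM judge can look like, here is a minimal sketch of exact-match scoring. It is not LiveBench's actual scoring code: the function names `normalize` and `score_exact_match`, the normalization rule, and the sample data are all assumptions made for illustration.

```python
def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(answer.strip().lower().split())


def score_exact_match(predictions: list[str], ground_truths: list[str]) -> float:
    """Return the fraction of predictions equal to their ground truth after normalization."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have the same length")
    if not predictions:
        return 0.0
    correct = sum(
        normalize(p) == normalize(g) for p, g in zip(predictions, ground_truths)
    )
    return correct / len(predictions)


# Hypothetical model outputs and reference answers, for illustration only.
predictions = ["Yes", " no ", "Square"]
ground_truths = ["yes", "no", "triangle"]
accuracy = score_exact_match(predictions, ground_truths)
print(f"{accuracy:.2f}")  # prints 0.67
```

Because the comparison is deterministic, the same predictions always produce the same score. Real tasks may need task-specific answer parsing before comparison.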

Provider:

livebench

Original information summary