livebench_instruction_following
Source: ModelScope community (魔搭社区). Updated 2025-12-04; indexed 2025-03-29.
Download link:
https://modelscope.cn/datasets/AI-ModelScope/livebench_instruction_following
Description:
# Dataset Card for "livebench/instruction_following"
LiveBench is a benchmark for LLMs designed with test set contamination and objective evaluation in mind. It has the following properties:
- LiveBench is designed to limit potential contamination by releasing new questions monthly, as well as having questions based on recently-released datasets, arXiv papers, news articles, and IMDb movie synopses.
- Each question has verifiable, objective ground-truth answers, allowing hard questions to be scored accurately and automatically, without the use of an LLM judge.
- LiveBench currently contains a set of 18 diverse tasks across 6 categories, and we will release new, harder tasks over time.
This dataset is the `instruction_following` category of LiveBench.
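
To illustrate what scoring "without the use of an LLM judge" can look like for instruction-following questions, here is a minimal sketch of rule-based constraint checking. The function names (`min_words`, `ends_with`, `no_commas`, `score`) and the constraints themselves are hypothetical and chosen only for illustration; they are not LiveBench's actual scoring code, which may work differently.

```python
from typing import Callable, List

# Hypothetical constraint checkers. The names and rules below are
# illustrative assumptions, not LiveBench's actual scoring code.

def min_words(response: str, n: int) -> bool:
    """Return True if the response has at least n whitespace-separated words."""
    return len(response.split()) >= n


def ends_with(response: str, phrase: str) -> bool:
    """Return True if the response, ignoring trailing whitespace, ends with phrase."""
    return response.rstrip().endswith(phrase)


def no_commas(response: str) -> bool:
    """Return True if the response contains no comma characters."""
    return "," not in response


def score(response: str, checks: List[Callable[[str], bool]]) -> float:
    """Return the fraction of constraints the response satisfies (0.0 if none given)."""
    results = [check(response) for check in checks]
    return sum(results) / len(results) if results else 0.0


# Example: three verifiable instructions attached to one prompt.
checks = [
    lambda r: min_words(r, 5),
    lambda r: ends_with(r, "Is there anything else I can help with?"),
    no_commas,
]
response = "The film follows two siblings. Is there anything else I can help with?"
print(score(response, checks))
```

Each check is a deterministic function of the response text, so the same response always receives the same score and no second model is needed to grade it.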
See more in our [paper](https://arxiv.org/abs/2406.19314), [leaderboard](https://livebench.ai/), and [datasheet](https://github.com/LiveBench/LiveBench/blob/main/docs/DATASHEET.md).
Provider:
maas
Created:
2025-03-28



