livebench/instruction_following
Source: Hugging Face. Updated: 2025-04-07. Indexed: 2024-06-12.
Download link:
https://hf-mirror.com/datasets/livebench/instruction_following
Summary:
LiveBench is a benchmark for LLMs designed with test set contamination and objective evaluation in mind. It has the following properties:
- LiveBench is designed to limit potential contamination by releasing new questions monthly and by basing questions on recently released datasets, arXiv papers, news articles, and IMDb movie synopses.
- Each question has a verifiable, objective ground-truth answer, so hard questions can be scored accurately and automatically without an LLM judge.
- LiveBench currently contains 18 diverse tasks across 6 categories, and we will release new, harder tasks over time.
This is the instruction_following category of LiveBench.
Provided by:
livebench
Summary of original information
Dataset overview
Dataset features
- question_id: string
- task: string
- turns: sequence of strings
- category: string
- instruction_id_list: sequence of strings
- kwargs: list, with the following sub-features:
  - num_sentences: int64
  - relation: string
  - capital_frequency: int64
  - capital_relation: string
  - section_spliter: string
  - num_sections: int64
  - postscript_marker: string
  - num_words: int64
  - keywords: sequence of strings
  - num_paragraphs: int64
  - nth_paragraph: int64
  - first_word: string
  - end_phrase: string
  - letter: string
  - let_frequency: int64
  - let_relation: string
  - keyword: string
  - frequency: int64
  - forbidden_words: sequence of strings
  - num_placeholders: int64
  - num_bullets: int64
  - num_highlights: int64
  - prompt_to_repeat: string
- task_prompt: string
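The card lists the fields but not how they fit together. As a minimal sketch, assuming the IFEval-style convention that `kwargs[i]` holds the parameters for `instruction_id_list[i]`, the snippet below shows how a response could be checked against two instruction types deterministically, without an LLM judge. The instruction ids, the `"less than"`/`"at least"` relation strings, the record contents, and the helper names (`check_number_words`, `check_forbidden_words`, `check_instructions`) are hypothetical; this is not LiveBench's own scoring code. When rows are loaded as a list of structs, unused sub-features will likely appear as `None`, so the sketch drops them before dispatching.

```python
import re
from typing import Any, Callable

# Hypothetical record shaped like one row of the test split. The instruction
# ids, the relation string and all values are assumptions for illustration.
record: dict[str, Any] = {
    "question_id": "example-0001",
    "task": "example_task",
    "turns": ["Summarize the article in fewer than 50 words. Do not use 'however'."],
    "category": "instruction_following",
    "instruction_id_list": [
        "length_constraints:number_words",
        "keywords:forbidden_words",
    ],
    # Assumed: kwargs[i] holds the parameters of instruction_id_list[i];
    # sub-features an instruction does not use are None.
    "kwargs": [
        {"num_words": 50, "relation": "less than", "forbidden_words": None},
        {"num_words": None, "relation": None, "forbidden_words": ["however"]},
    ],
}


def check_number_words(response: str, num_words: int, relation: str) -> bool:
    """Compare a whitespace-based word count against num_words."""
    count = len(response.split())
    if relation == "less than":
        return count < num_words
    if relation == "at least":
        return count >= num_words
    raise ValueError(f"unknown relation: {relation!r}")


def check_forbidden_words(response: str, forbidden_words: list[str]) -> bool:
    """Pass only if no forbidden word appears as a whole word (case-insensitive)."""
    return not any(
        re.search(rf"\b{re.escape(word)}\b", response, flags=re.IGNORECASE)
        for word in forbidden_words
    )


# Hypothetical dispatch table from instruction id to checker function.
CHECKERS: dict[str, Callable[..., bool]] = {
    "length_constraints:number_words": check_number_words,
    "keywords:forbidden_words": check_forbidden_words,
}


def check_instructions(response: str, row: dict[str, Any]) -> list[bool]:
    """Return one pass/fail flag per instruction in the row."""
    results = []
    for instruction_id, kw in zip(row["instruction_id_list"], row["kwargs"]):
        # Keep only the parameters this instruction actually uses.
        args = {k: v for k, v in kw.items() if v is not None}
        results.append(CHECKERS[instruction_id](response, **args))
    return results


print(check_instructions("The article introduces a benchmark that adds new questions monthly.", record))
# [True, True]
```

How per-instruction flags are aggregated into a question score is not specified on this card.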
Dataset splits
- test:
  - Size: 515,269 bytes
  - Number of examples: 200
Dataset size
- Download size: 284,790 bytes
- Dataset size: 515,269 bytes
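The two size figures differ; on Hugging Face cards the dataset size usually refers to the decoded data, while the download size refers to the (typically compressed) files fetched from the Hub. Treat that reading as an assumption. Using only the numbers above, the arithmetic below gives the average size per example and the download-to-dataset ratio.

```python
# Figures copied from the card: 200 test examples, 515,269 bytes of data,
# 284,790 bytes downloaded.
NUM_EXAMPLES = 200
DATASET_BYTES = 515_269
DOWNLOAD_BYTES = 284_790

avg_bytes_per_example = DATASET_BYTES / NUM_EXAMPLES  # about 2.5 KiB per question
download_ratio = DOWNLOAD_BYTES / DATASET_BYTES       # downloaded bytes per byte of data

print(f"average example size: {avg_bytes_per_example:.1f} bytes")  # 2576.3 bytes
print(f"download / dataset size: {download_ratio:.2f}")            # 0.55
```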
Configurations
- config_name: default
  data_files:
  - split: test
    path: data/test-*
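The `path` entry is a glob pattern: repository files matching `data/test-*` make up the `test` split. A minimal sketch of that matching with Python's `fnmatch` module follows. The repository listing is hypothetical (the card does not name the actual files), and `fnmatch`'s `*` also matches `/`, unlike a path-aware glob, which makes no difference for these names.

```python
from fnmatch import fnmatchcase

# Pattern copied from the card's default configuration.
TEST_PATTERN = "data/test-*"

# Hypothetical repository listing; the real file names are not given on this card.
repo_files = [
    "README.md",
    "data/test-00000-of-00001.parquet",
]

# Keep only the files that the test split's pattern selects.
test_files = [f for f in repo_files if fnmatchcase(f, TEST_PATTERN)]
print(test_files)  # ['data/test-00000-of-00001.parquet']
```

In practice the `datasets` library resolves this configuration itself; `load_dataset("livebench/instruction_following", split="test")` should return the 200-example test split.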



