3sid13/byte-matched-ruler
Source: Hugging Face · Updated 2026-04-24 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/3sid13/byte-matched-ruler
Resource description:
---
dataset_info:
- config_name: cwe
features:
- name: index
dtype: int64
- name: input
dtype: string
- name: outputs
list: string
- name: answer_prefix
dtype: string
- name: length
dtype: int64
splits:
- name: 4k
num_bytes: 2661621
num_examples: 304
- name: 8k
num_bytes: 4455620
num_examples: 263
- name: 16k
num_bytes: 10274777
num_examples: 317
- name: 32k
num_bytes: 17878780
num_examples: 282
- name: 64k
num_bytes: 23471075
num_examples: 187
download_size: 36966219
dataset_size: 58741873
- config_name: fwe
features:
- name: index
dtype: int64
- name: input
dtype: string
- name: outputs
list: string
- name: answer_prefix
dtype: string
- name: length
dtype: int64
splits:
- name: 4k
num_bytes: 5352282
num_examples: 500
- name: 8k
num_bytes: 10626404
num_examples: 500
- name: 16k
num_bytes: 21184484
num_examples: 500
- name: 32k
num_bytes: 41892378
num_examples: 500
- name: 64k
num_bytes: 83332093
num_examples: 500
download_size: 42291084
dataset_size: 162387641
- config_name: niah_multikey_1
features:
- name: index
dtype: int64
- name: input
dtype: string
- name: outputs
list: string
- name: answer_prefix
dtype: string
- name: length
dtype: int64
splits:
- name: 4k
num_bytes: 3752886
num_examples: 253
- name: 8k
num_bytes: 6429577
num_examples: 210
- name: 16k
num_bytes: 7787797
num_examples: 128
- name: 32k
num_bytes: 14050395
num_examples: 113
- name: 64k
num_bytes: 29464858
num_examples: 118
download_size: 31927901
dataset_size: 61485513
- config_name: niah_multikey_2
features:
- name: index
dtype: int64
- name: input
dtype: string
- name: outputs
list: string
- name: answer_prefix
dtype: string
- name: length
dtype: int64
splits:
- name: 4k
num_bytes: 1593993
num_examples: 137
- name: 8k
num_bytes: 8456224
num_examples: 361
- name: 16k
num_bytes: 4526345
num_examples: 96
- name: 32k
num_bytes: 15686211
num_examples: 166
- name: 64k
num_bytes: 28567563
num_examples: 151
download_size: 19409849
dataset_size: 58830336
- config_name: niah_multikey_3
features:
- name: index
dtype: int64
- name: input
dtype: string
- name: outputs
list: string
- name: answer_prefix
dtype: string
- name: length
dtype: int64
splits:
- name: 4k
num_bytes: 275796
num_examples: 47
- name: 8k
num_bytes: 1132416
num_examples: 96
- name: 16k
num_bytes: 3453192
num_examples: 146
- name: 32k
num_bytes: 12804750
num_examples: 271
- name: 64k
num_bytes: 20922954
num_examples: 221
download_size: 24772659
dataset_size: 38589108
- config_name: niah_multiquery
features:
- name: index
dtype: int64
- name: input
dtype: string
- name: outputs
list: string
- name: answer_prefix
dtype: string
- name: length
dtype: int64
splits:
- name: 4k
num_bytes: 4702782
num_examples: 317
- name: 8k
num_bytes: 8174028
num_examples: 267
- name: 16k
num_bytes: 24145029
num_examples: 397
- name: 32k
num_bytes: 39418944
num_examples: 317
- name: 64k
num_bytes: 91393156
num_examples: 366
download_size: 91611086
dataset_size: 167833939
- config_name: niah_multivalue
features:
- name: index
dtype: int64
- name: input
dtype: string
- name: outputs
list: string
- name: answer_prefix
dtype: string
- name: length
dtype: int64
splits:
- name: 4k
num_bytes: 1740738
num_examples: 117
- name: 8k
num_bytes: 4167869
num_examples: 136
- name: 16k
num_bytes: 7122855
num_examples: 117
- name: 32k
num_bytes: 18529035
num_examples: 149
- name: 64k
num_bytes: 35211424
num_examples: 141
download_size: 36346995
dataset_size: 66771921
- config_name: niah_single_1
features:
- name: index
dtype: int64
- name: input
dtype: string
- name: outputs
list: string
- name: answer_prefix
dtype: string
- name: length
dtype: int64
splits:
- name: 4k
num_bytes: 6342282
num_examples: 442
- name: 8k
num_bytes: 13011276
num_examples: 447
- name: 16k
num_bytes: 28344945
num_examples: 484
- name: 32k
num_bytes: 57025407
num_examples: 485
- name: 64k
num_bytes: 117032790
num_examples: 497
download_size: 11163175
dataset_size: 221756700
- config_name: niah_single_2
features:
- name: index
dtype: int64
- name: input
dtype: string
- name: outputs
list: string
- name: answer_prefix
dtype: string
- name: length
dtype: int64
splits:
- name: 4k
num_bytes: 3706844
num_examples: 249
- name: 8k
num_bytes: 5338102
num_examples: 174
- name: 16k
num_bytes: 15043000
num_examples: 247
- name: 32k
num_bytes: 37064824
num_examples: 298
- name: 64k
num_bytes: 73930876
num_examples: 296
download_size: 74230180
dataset_size: 135083646
- config_name: niah_single_3
features:
- name: index
dtype: int64
- name: input
dtype: string
- name: outputs
list: string
- name: answer_prefix
dtype: string
- name: length
dtype: int64
splits:
- name: 4k
num_bytes: 3825125
num_examples: 258
- name: 8k
num_bytes: 6095289
num_examples: 199
- name: 16k
num_bytes: 13871857
num_examples: 228
- name: 32k
num_bytes: 28226353
num_examples: 227
- name: 64k
num_bytes: 56435177
num_examples: 226
download_size: 58745056
dataset_size: 108453801
- config_name: qa_1
features:
- name: index
dtype: int64
- name: input
dtype: string
- name: outputs
list: string
- name: answer_prefix
dtype: string
- name: length
dtype: int64
splits:
- name: 4k
num_bytes: 6059702
num_examples: 394
- name: 8k
num_bytes: 12717001
num_examples: 396
- name: 16k
num_bytes: 26252362
num_examples: 453
- name: 32k
num_bytes: 48633265
num_examples: 394
- name: 64k
num_bytes: 51764302
num_examples: 204
download_size: 81457657
dataset_size: 145426632
- config_name: qa_2
features:
- name: index
dtype: int64
- name: input
dtype: string
- name: outputs
list: string
- name: answer_prefix
dtype: string
- name: length
dtype: int64
splits:
- name: 4k
num_bytes: 4118359
num_examples: 297
- name: 8k
num_bytes: 12153422
num_examples: 436
- name: 16k
num_bytes: 3200039
num_examples: 54
- name: 32k
num_bytes: 15970781
num_examples: 135
- name: 64k
num_bytes: 73751657
num_examples: 312
download_size: 67559728
dataset_size: 109194258
- config_name: vt
features:
- name: index
dtype: int64
- name: input
dtype: string
- name: outputs
list: string
- name: answer_prefix
dtype: string
- name: length
dtype: int64
splits:
- name: 8k
num_bytes: 10675756
num_examples: 364
- name: 16k
num_bytes: 29380400
num_examples: 500
- name: 32k
num_bytes: 44645821
num_examples: 379
- name: 64k
num_bytes: 117852380
num_examples: 500
download_size: 10537110
dataset_size: 202554357
configs:
- config_name: cwe
data_files:
- split: 4k
path: cwe/4k-*
- split: 8k
path: cwe/8k-*
- split: 16k
path: cwe/16k-*
- split: 32k
path: cwe/32k-*
- split: 64k
path: cwe/64k-*
- config_name: fwe
data_files:
- split: 4k
path: fwe/4k-*
- split: 8k
path: fwe/8k-*
- split: 16k
path: fwe/16k-*
- split: 32k
path: fwe/32k-*
- split: 64k
path: fwe/64k-*
- config_name: niah_multikey_1
data_files:
- split: 4k
path: niah_multikey_1/4k-*
- split: 8k
path: niah_multikey_1/8k-*
- split: 16k
path: niah_multikey_1/16k-*
- split: 32k
path: niah_multikey_1/32k-*
- split: 64k
path: niah_multikey_1/64k-*
- config_name: niah_multikey_2
data_files:
- split: 4k
path: niah_multikey_2/4k-*
- split: 8k
path: niah_multikey_2/8k-*
- split: 16k
path: niah_multikey_2/16k-*
- split: 32k
path: niah_multikey_2/32k-*
- split: 64k
path: niah_multikey_2/64k-*
- config_name: niah_multikey_3
data_files:
- split: 4k
path: niah_multikey_3/4k-*
- split: 8k
path: niah_multikey_3/8k-*
- split: 16k
path: niah_multikey_3/16k-*
- split: 32k
path: niah_multikey_3/32k-*
- split: 64k
path: niah_multikey_3/64k-*
- config_name: niah_multiquery
data_files:
- split: 4k
path: niah_multiquery/4k-*
- split: 8k
path: niah_multiquery/8k-*
- split: 16k
path: niah_multiquery/16k-*
- split: 32k
path: niah_multiquery/32k-*
- split: 64k
path: niah_multiquery/64k-*
- config_name: niah_multivalue
data_files:
- split: 4k
path: niah_multivalue/4k-*
- split: 8k
path: niah_multivalue/8k-*
- split: 16k
path: niah_multivalue/16k-*
- split: 32k
path: niah_multivalue/32k-*
- split: 64k
path: niah_multivalue/64k-*
- config_name: niah_single_1
data_files:
- split: 4k
path: niah_single_1/4k-*
- split: 8k
path: niah_single_1/8k-*
- split: 16k
path: niah_single_1/16k-*
- split: 32k
path: niah_single_1/32k-*
- split: 64k
path: niah_single_1/64k-*
- config_name: niah_single_2
data_files:
- split: 4k
path: niah_single_2/4k-*
- split: 8k
path: niah_single_2/8k-*
- split: 16k
path: niah_single_2/16k-*
- split: 32k
path: niah_single_2/32k-*
- split: 64k
path: niah_single_2/64k-*
- config_name: niah_single_3
data_files:
- split: 4k
path: niah_single_3/4k-*
- split: 8k
path: niah_single_3/8k-*
- split: 16k
path: niah_single_3/16k-*
- split: 32k
path: niah_single_3/32k-*
- split: 64k
path: niah_single_3/64k-*
- config_name: qa_1
data_files:
- split: 4k
path: qa_1/4k-*
- split: 8k
path: qa_1/8k-*
- split: 16k
path: qa_1/16k-*
- split: 32k
path: qa_1/32k-*
- split: 64k
path: qa_1/64k-*
- config_name: qa_2
data_files:
- split: 4k
path: qa_2/4k-*
- split: 8k
path: qa_2/8k-*
- split: 16k
path: qa_2/16k-*
- split: 32k
path: qa_2/32k-*
- split: 64k
path: qa_2/64k-*
- config_name: vt
data_files:
- split: 8k
path: vt/8k-*
- split: 16k
path: vt/16k-*
- split: 32k
path: vt/32k-*
- split: 64k
path: vt/64k-*
---
Provider:
3sid13
Collected summary
Dataset introduction

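Construction
The byte-matched-ruler dataset is designed to evaluate the long-context understanding of large language models. It contains thirteen configs whose names match the tasks of the RULER benchmark; the expansions given here follow RULER's task names, which the card itself does not spell out: common words extraction (cwe), frequent words extraction (fwe), single-needle and multi-key needle-in-a-haystack retrieval (niah_single_1 to niah_single_3, niah_multikey_1 to niah_multikey_3), multi-query and multi-value retrieval (niah_multiquery, niah_multivalue), question answering (qa_1, qa_2), and variable tracking (vt). Each sample consists of an index, the input text, a list of output answers, an answer prefix, and a length value. Samples are divided by sequence length into five splits from 4k to 64k (the vt config has no 4k split), so that models can be tested systematically at different context lengths.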
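The config and split layout described above can be written down as a small lookup. This is a minimal sketch that only restates the names listed in the metadata; `splits_for` and `all_pairs` are hypothetical helpers introduced here for illustration and are not part of the dataset or of any library.

```python
# Config and split names copied from the dataset card metadata.
CONFIGS = [
    "cwe", "fwe",
    "niah_single_1", "niah_single_2", "niah_single_3",
    "niah_multikey_1", "niah_multikey_2", "niah_multikey_3",
    "niah_multiquery", "niah_multivalue",
    "qa_1", "qa_2", "vt",
]
ALL_SPLITS = ["4k", "8k", "16k", "32k", "64k"]


def splits_for(config: str) -> list[str]:
    """Return the splits the metadata lists for a config (hypothetical helper)."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config: {config}")
    # The metadata lists no 4k split for vt; every other config has all five.
    return ALL_SPLITS[1:] if config == "vt" else list(ALL_SPLITS)


def all_pairs() -> list[tuple[str, str]]:
    """Enumerate every (config, split) pair that the metadata declares."""
    return [(config, split) for config in CONFIGS for split in splits_for(config)]
```

Iterating over `all_pairs()` is one way to make sure an evaluation run does not request the missing vt/4k split.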
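Features
The core features of the dataset are its graded difficulty and its variety of task types. Sample counts vary by config and length: the fwe config keeps 500 samples at every length, which gives stable statistics, while most other configs hold between a few dozen and a few hundred samples per split. The niah series raises retrieval difficulty step by step by varying the number of keys and values. Input lengths are graded as 4k, 8k, 16k, 32k, and 64k, covering short to very long contexts and giving ample material for studying how well models capture long-range dependencies. Each sample also carries an answer_prefix field, which marks where the model's answer is expected to begin.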
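The length grading can be checked against the metadata by dividing each split's num_bytes by its num_examples. The sketch below does this for the fwe config, using the figures from the dataset card. Note the assumption: num_bytes covers every field of a row, not only input, so the result only approximates the input size.

```python
# (num_bytes, num_examples) for the fwe config, copied from the metadata.
FWE_SPLITS = {
    "4k": (5352282, 500),
    "8k": (10626404, 500),
    "16k": (21184484, 500),
    "32k": (41892378, 500),
    "64k": (83332093, 500),
}


def mean_bytes_per_example(splits: dict) -> dict:
    """Average stored bytes per example for each split."""
    return {name: num_bytes / count for name, (num_bytes, count) in splits.items()}


def growth_ratios(means: dict) -> list:
    """Ratio of each split's mean size to the previous split's mean size."""
    values = list(means.values())
    return [later / earlier for earlier, later in zip(values, values[1:])]


means = mean_bytes_per_example(FWE_SPLITS)
ratios = growth_ratios(means)
```

For fwe, each ratio comes out close to 2, which is consistent with the doubling from one length level to the next.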
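Usage
Researchers can load any config and length split with the Hugging Face Datasets library, for example: from datasets import load_dataset; dataset = load_dataset('3sid13/byte-matched-ruler', 'cwe', split='4k'). In each sample, the input field holds the full prompt text, the outputs field holds the reference answers as a list of strings, and answer_prefix indicates where the model should begin its reply. For evaluation, the model output is matched against the reference strings in outputs, which quantifies long-context understanding and information retrieval at each length level.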
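The loading and matching steps described above might look like the following sketch. The repository id comes from the download link. Everything else is an assumption made for illustration: appending answer_prefix directly to input, and scoring by case-insensitive substring match, are not confirmed by the dataset card. `load_split`, `build_prompt`, `match_all`, and `match_any` are hypothetical helper names.

```python
def load_split(config: str, split: str):
    """Load one config/split; needs the third-party `datasets` package and network access."""
    from datasets import load_dataset  # imported lazily so the helpers below work without it
    return load_dataset("3sid13/byte-matched-ruler", config, split=split)


def build_prompt(example: dict) -> str:
    """Assumption: the answer prefix is appended directly after the input text."""
    return example["input"] + example["answer_prefix"]


def match_all(prediction: str, outputs: list) -> float:
    """Fraction of reference strings that occur in the prediction (case-insensitive)."""
    if not outputs:
        return 0.0
    lowered = prediction.lower()
    hits = sum(1 for reference in outputs if reference.lower() in lowered)
    return hits / len(outputs)


def match_any(prediction: str, outputs: list) -> float:
    """1.0 if at least one reference string occurs in the prediction, else 0.0."""
    lowered = prediction.lower()
    return 1.0 if any(reference.lower() in lowered for reference in outputs) else 0.0
```

A retrieval config with several expected values would plausibly use `match_all`, while a question-answering config with alternative acceptable answers would use `match_any`; which rule fits each config is a judgment call, not something the card specifies.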
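Background and Challenges
Background Overview
Long-context understanding and reasoning is a key dimension in evaluating large language models (LLMs). Existing benchmarks, however, are often limited to shorter texts or a single task format, so they do not fully reflect how models behave when long-range dependencies matter. Against this background, the byte-matched-ruler dataset appeared in 2024; it was built by its research team and published on the Hugging Face platform to evaluate LLMs systematically across several long-context tasks. It covers more than ten fine-grained configs, including common words extraction (CWE), frequent words extraction (FWE), multi-key needle-in-a-haystack retrieval (NIAH), question answering (QA), and variable tracking (VT). By controlling the input length from a nominal 4k to 64k tokens, it offers a standardized tool for studying information retrieval, multi-hop reasoning, and context memory in long-context modeling.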
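Current Challenges
The challenges facing the byte-matched-ruler dataset fall into two groups. At the domain level, sequence modeling over long text runs into hard algorithmic problems such as attention decay, information compression, and positional bias in the context. Models often struggle to locate key information within thousands of tokens of context. In multi-key or multi-value retrieval tasks (such as niah_multikey and niah_multivalue), a larger number of distractors causes accuracy to fall sharply, which is a main bottleneck for current long-context LLMs. At the construction level, the dataset design has to balance task diversity, length gradation, and sample count: each config needs enough samples for statistical power, while the total size must stay manageable for storage and loading. For example, niah_multikey_3 has only 47 samples at the 4k length, and such sparse splits may weaken the robustness of the evaluation. Generating high-quality, unambiguous long-text annotations also requires avoiding traces of artificial construction so that the evaluation stays fair.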
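The effect of a small split on evaluation robustness can be made concrete with the standard error of an accuracy estimate. The sketch below uses the normal approximation to the binomial, which is a simplifying assumption (it is rough for small n or accuracies near 0 or 1); `ci_half_width` is a hypothetical helper name.

```python
import math


def ci_half_width(accuracy: float, n: int, z: float = 1.96) -> float:
    """Normal-approximation half-width of a 95% interval for accuracy measured on n examples."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    return z * math.sqrt(accuracy * (1.0 - accuracy) / n)


# Worst case (accuracy = 0.5) for the smallest and largest splits named in the text.
small_split = ci_half_width(0.5, 47)   # niah_multikey_3 at 4k
large_split = ci_half_width(0.5, 500)  # fwe at any length
```

With 47 samples the worst-case half-width is roughly 0.14, against roughly 0.04 with 500 samples, so differences of a few accuracy points on the smallest splits should be read with caution.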
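Common Scenarios
Classic Use Cases
In long-context language model research, the byte-matched-ruler dataset serves as a precise yardstick for information retrieval and reasoning over very long contexts. It covers several task configs: single-needle and multi-key needle-in-a-haystack (NIAH) tests, common and frequent words extraction (CWE/FWE), question answering (QA_1/QA_2), and variable tracking (VT). Its classic use is to measure how accurately an LLM captures long-range dependencies at lengths from 4K to 64K. The tasks inject key information into a long context and force the model to locate and use it, which helps determine the length a model can actually handle effectively and goes beyond indirect metrics such as perplexity.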
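One way to turn per-length accuracies into a single "effective length" figure is to report the longest split up to which accuracy stays above a chosen threshold. This is a sketch of that idea only; the threshold value and the `effective_length` helper are assumptions for illustration, and the dataset card does not prescribe any such rule.

```python
SPLIT_ORDER = ["4k", "8k", "16k", "32k", "64k"]


def effective_length(accuracy_by_split: dict, threshold: float):
    """Longest split such that it and every shorter measured split meet the threshold.

    Returns None when even the shortest measured split falls below the threshold.
    Splits missing from the input (for example 4k for vt) are skipped.
    """
    best = None
    for split in SPLIT_ORDER:
        if split not in accuracy_by_split:
            continue
        if accuracy_by_split[split] < threshold:
            break  # stop at the first failure, even if a longer split recovers
        best = split
    return best
```

Stopping at the first failure is a deliberate choice: a model that dips below the threshold at 16k but recovers at 32k is still reported as effective only up to 8k.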
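Academic Problems Addressed
The dataset targets a long-standing gap: the difference between the context window that mainstream language models advertise and the portion of it they use effectively. By systematically constructing retrieval and reasoning tasks of different difficulty and length, byte-matched-ruler helps isolate and quantify the "Lost in the Middle" phenomenon, in which models forget or confuse information placed in the middle of the input sequence. Results on such tasks can show how positional encoding, attention sparsity, and segment-wise processing constrain long-context performance, and they provide empirical evidence for improving Transformer architectures (for example through revised positional encodings or optimized attention mechanisms).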
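Related Work
The byte-matched-ruler dataset sits within a broader body of long-context research. Comprehensive multi-task long-context benchmarks such as LongBench and L-Eval cover a wider range of scenarios, including dialogue, summarization, and code generation. The model weaknesses that this kind of evaluation exposes relate directly to work on positional encoding methods, such as ALiBi and RoPE and their variants, and to sparse-attention Transformer variants such as Longformer and BigBird. The graded, multi-level difficulty design is also relevant to adversarial test sets for long-context reasoning, which aim to make the evaluation of long-distance semantic modeling more rigorous and robust.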
The content above was collected and summarized by 遇见数据集.



