starbix/oneruler
Hugging Face · Updated 2026-03-09 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/starbix/oneruler
Resource description:
---
dataset_info:
- config_name: 'cs_4096'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'cs_8192'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'cs_16384'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'da_4096'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'da_8192'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'da_16384'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'de_4096'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'de_8192'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'de_16384'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'en_4096'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'en_8192'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'en_16384'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'es_4096'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'es_8192'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'es_16384'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'fa_4096'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'fa_8192'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'fa_16384'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'fi_4096'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'fi_8192'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'fi_16384'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'fr_4096'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'fr_8192'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'fr_16384'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'hi_4096'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'hi_8192'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'hi_16384'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'hu_4096'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'hu_8192'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'hu_16384'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'it_4096'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'it_8192'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'it_16384'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'ja_4096'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'ja_8192'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'ja_16384'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'ko_4096'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'ko_8192'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'ko_16384'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'nl_4096'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'nl_8192'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'nl_16384'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'no_4096'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'no_8192'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'no_16384'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'pl_4096'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'pl_8192'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'pl_16384'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'pt_4096'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'pt_8192'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'pt_16384'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'ru_4096'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'ru_8192'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'ru_16384'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'sr_4096'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'sr_8192'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'sr_16384'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'st_4096'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'st_8192'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'st_16384'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'sv_4096'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'sv_8192'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'sv_16384'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'sw_4096'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'sw_8192'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'sw_16384'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'ta_4096'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'ta_8192'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'ta_16384'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'uk_4096'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'uk_8192'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'uk_16384'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'vi_4096'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'vi_8192'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'vi_16384'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'zh_4096'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'zh_8192'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
- config_name: 'zh_16384'
features:
- name: context
dtype: string
- name: question
dtype: string
- name: answer_prefix
dtype: string
- name: answer
sequence: string
- name: task
dtype: string
- name: language
dtype: string
- name: max_new_tokens
dtype: int64
splits:
- name: test
num_examples: 3500
configs:
- config_name: 'cs_4096'
data_files:
- split: test
path: cs_4096/test-*
- config_name: 'cs_8192'
data_files:
- split: test
path: cs_8192/test-*
- config_name: 'cs_16384'
data_files:
- split: test
path: cs_16384/test-*
- config_name: 'da_4096'
data_files:
- split: test
path: da_4096/test-*
- config_name: 'da_8192'
data_files:
- split: test
path: da_8192/test-*
- config_name: 'da_16384'
data_files:
- split: test
path: da_16384/test-*
- config_name: 'de_4096'
data_files:
- split: test
path: de_4096/test-*
- config_name: 'de_8192'
data_files:
- split: test
path: de_8192/test-*
- config_name: 'de_16384'
data_files:
- split: test
path: de_16384/test-*
- config_name: 'en_4096'
data_files:
- split: test
path: en_4096/test-*
- config_name: 'en_8192'
data_files:
- split: test
path: en_8192/test-*
- config_name: 'en_16384'
data_files:
- split: test
path: en_16384/test-*
- config_name: 'es_4096'
data_files:
- split: test
path: es_4096/test-*
- config_name: 'es_8192'
data_files:
- split: test
path: es_8192/test-*
- config_name: 'es_16384'
data_files:
- split: test
path: es_16384/test-*
- config_name: 'fa_4096'
data_files:
- split: test
path: fa_4096/test-*
- config_name: 'fa_8192'
data_files:
- split: test
path: fa_8192/test-*
- config_name: 'fa_16384'
data_files:
- split: test
path: fa_16384/test-*
- config_name: 'fi_4096'
data_files:
- split: test
path: fi_4096/test-*
- config_name: 'fi_8192'
data_files:
- split: test
path: fi_8192/test-*
- config_name: 'fi_16384'
data_files:
- split: test
path: fi_16384/test-*
- config_name: 'fr_4096'
data_files:
- split: test
path: fr_4096/test-*
- config_name: 'fr_8192'
data_files:
- split: test
path: fr_8192/test-*
- config_name: 'fr_16384'
data_files:
- split: test
path: fr_16384/test-*
- config_name: 'hi_4096'
data_files:
- split: test
path: hi_4096/test-*
- config_name: 'hi_8192'
data_files:
- split: test
path: hi_8192/test-*
- config_name: 'hi_16384'
data_files:
- split: test
path: hi_16384/test-*
- config_name: 'hu_4096'
data_files:
- split: test
path: hu_4096/test-*
- config_name: 'hu_8192'
data_files:
- split: test
path: hu_8192/test-*
- config_name: 'hu_16384'
data_files:
- split: test
path: hu_16384/test-*
- config_name: 'it_4096'
data_files:
- split: test
path: it_4096/test-*
- config_name: 'it_8192'
data_files:
- split: test
path: it_8192/test-*
- config_name: 'it_16384'
data_files:
- split: test
path: it_16384/test-*
- config_name: 'ja_4096'
data_files:
- split: test
path: ja_4096/test-*
- config_name: 'ja_8192'
data_files:
- split: test
path: ja_8192/test-*
- config_name: 'ja_16384'
data_files:
- split: test
path: ja_16384/test-*
- config_name: 'ko_4096'
data_files:
- split: test
path: ko_4096/test-*
- config_name: 'ko_8192'
data_files:
- split: test
path: ko_8192/test-*
- config_name: 'ko_16384'
data_files:
- split: test
path: ko_16384/test-*
- config_name: 'nl_4096'
data_files:
- split: test
path: nl_4096/test-*
- config_name: 'nl_8192'
data_files:
- split: test
path: nl_8192/test-*
- config_name: 'nl_16384'
data_files:
- split: test
path: nl_16384/test-*
- config_name: 'no_4096'
data_files:
- split: test
path: no_4096/test-*
- config_name: 'no_8192'
data_files:
- split: test
path: no_8192/test-*
- config_name: 'no_16384'
data_files:
- split: test
path: no_16384/test-*
- config_name: 'pl_4096'
data_files:
- split: test
path: pl_4096/test-*
- config_name: 'pl_8192'
data_files:
- split: test
path: pl_8192/test-*
- config_name: 'pl_16384'
data_files:
- split: test
path: pl_16384/test-*
- config_name: 'pt_4096'
data_files:
- split: test
path: pt_4096/test-*
- config_name: 'pt_8192'
data_files:
- split: test
path: pt_8192/test-*
- config_name: 'pt_16384'
data_files:
- split: test
path: pt_16384/test-*
- config_name: 'ru_4096'
data_files:
- split: test
path: ru_4096/test-*
- config_name: 'ru_8192'
data_files:
- split: test
path: ru_8192/test-*
- config_name: 'ru_16384'
data_files:
- split: test
path: ru_16384/test-*
- config_name: 'sr_4096'
data_files:
- split: test
path: sr_4096/test-*
- config_name: 'sr_8192'
data_files:
- split: test
path: sr_8192/test-*
- config_name: 'sr_16384'
data_files:
- split: test
path: sr_16384/test-*
- config_name: 'st_4096'
data_files:
- split: test
path: st_4096/test-*
- config_name: 'st_8192'
data_files:
- split: test
path: st_8192/test-*
- config_name: 'st_16384'
data_files:
- split: test
path: st_16384/test-*
- config_name: 'sv_4096'
data_files:
- split: test
path: sv_4096/test-*
- config_name: 'sv_8192'
data_files:
- split: test
path: sv_8192/test-*
- config_name: 'sv_16384'
data_files:
- split: test
path: sv_16384/test-*
- config_name: 'sw_4096'
data_files:
- split: test
path: sw_4096/test-*
- config_name: 'sw_8192'
data_files:
- split: test
path: sw_8192/test-*
- config_name: 'sw_16384'
data_files:
- split: test
path: sw_16384/test-*
- config_name: 'ta_4096'
data_files:
- split: test
path: ta_4096/test-*
- config_name: 'ta_8192'
data_files:
- split: test
path: ta_8192/test-*
- config_name: 'ta_16384'
data_files:
- split: test
path: ta_16384/test-*
- config_name: 'uk_4096'
data_files:
- split: test
path: uk_4096/test-*
- config_name: 'uk_8192'
data_files:
- split: test
path: uk_8192/test-*
- config_name: 'uk_16384'
data_files:
- split: test
path: uk_16384/test-*
- config_name: 'vi_4096'
data_files:
- split: test
path: vi_4096/test-*
- config_name: 'vi_8192'
data_files:
- split: test
path: vi_8192/test-*
- config_name: 'vi_16384'
data_files:
- split: test
path: vi_16384/test-*
- config_name: 'zh_4096'
data_files:
- split: test
path: zh_4096/test-*
- config_name: 'zh_8192'
data_files:
- split: test
path: zh_8192/test-*
- config_name: 'zh_16384'
data_files:
- split: test
path: zh_16384/test-*
---
# starbix/oneruler
A preprocessed, Hugging Face-ready version of the [OneRuler](https://arxiv.org/abs/2503.01996) multilingual long-context benchmark, which builds on [RULER](https://arxiv.org/abs/2404.06654).
Each config is named `{language}_{context_length}` (e.g. `en_4096`, `ko_16384`) and contains a single `test` split.
## Tasks
| Task | Category | Description |
|------|----------|-------------|
| `niah_single` | Retrieval | Single needle, single key, single query |
| `niah_none` | Retrieval | Query key absent; the answer is the language-specific word for "none" |
| `niah_multikey` | Retrieval | 4 needles inserted, 1 queried |
| `niah_multivalue` | Retrieval | 1 key mapped to 4 values, all must be retrieved |
| `niah_multiquery` | Retrieval | 1 needle, 2 simultaneous queries |
| `cwe` | Aggregation | Common words extraction (frequency ratio 20:10) |
| `cwe_easy` | Aggregation | Common words extraction (frequency ratio 30:3) |
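
When reporting results, it can help to group tasks by the category column of the table above. The following is only a sketch: `TASK_CATEGORY` and `count_by_category` are hypothetical names that are not shipped with the dataset, and the rows are assumed to be plain dicts with a `task` field.

```python
from collections import Counter

# Task -> category, copied from the task table.
TASK_CATEGORY = {
    "niah_single": "Retrieval",
    "niah_none": "Retrieval",
    "niah_multikey": "Retrieval",
    "niah_multivalue": "Retrieval",
    "niah_multiquery": "Retrieval",
    "cwe": "Aggregation",
    "cwe_easy": "Aggregation",
}


def count_by_category(rows):
    """Count rows per category; `rows` is any iterable of dicts with a "task" key."""
    return Counter(TASK_CATEGORY[row["task"]] for row in rows)


# Toy rows standing in for real dataset examples.
sample = [{"task": "niah_single"}, {"task": "cwe"}, {"task": "niah_none"}]
print(count_by_category(sample))
```

The same lookup can be reused to average scores per category instead of per task.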
## Languages
26 languages: `cs` `da` `de` `en` `es` `fa` `fi` `fr` `hi` `hu` `it` `ja` `ko` `nl` `no` `pl` `pt` `ru` `sr` `st` `sv` `sw` `ta` `uk` `vi` `zh`
## Context lengths
`4096` · `8192` · `16384` tokens (tokenized with `meta-llama/Meta-Llama-3.1-8B`)
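
Because every config name is a language code joined to a context length, the full list of configs can be generated rather than typed out. A minimal sketch, assuming only the language codes and lengths listed in this card; `LANGUAGES`, `CONTEXT_LENGTHS`, `ALL_CONFIGS`, and `parse_config_name` are illustrative names.

```python
from itertools import product

# Language codes and context lengths as listed in this card.
LANGUAGES = (
    "cs da de en es fa fi fr hi hu it ja ko nl no pl pt "
    "ru sr st sv sw ta uk vi zh"
).split()
CONTEXT_LENGTHS = [4096, 8192, 16384]

# Every config follows the {language}_{context_length} pattern.
ALL_CONFIGS = [f"{lang}_{n}" for lang, n in product(LANGUAGES, CONTEXT_LENGTHS)]


def parse_config_name(name):
    """Split a config name such as "ko_16384" into ("ko", 16384)."""
    language, length = name.rsplit("_", 1)
    return language, int(length)


print(len(ALL_CONFIGS))               # 26 languages x 3 lengths = 78
print(parse_config_name("ko_16384"))  # ('ko', 16384)
```

With 3500 test examples per config in the metadata, iterating over `ALL_CONFIGS` covers 78 x 3500 = 273,000 examples.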
## Schema
| Column | Type | Description |
|--------|------|-------------|
| `context` | string | The long context (book excerpt + inserted needles for NIAH, word list for CWE) |
| `question` | string | The question to answer, in the task language |
| `answer_prefix` | string | The answer format prefix shown to the model (NIAH only; empty for CWE) |
| `answer` | list[string] | Ground-truth answer(s) |
| `task` | string | Task name, e.g. `niah_single`, `cwe_easy` |
| `language` | string | ISO 639-1 language code, e.g. `en`, `ko`, `zh` |
| `max_new_tokens` | int | Generation budget (30 for NIAH, 50 for CWE) |
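
The schema can be turned into a quick sanity check for individual rows. This is a sketch under stated assumptions: `check_row` is a hypothetical helper, tasks are assumed to be distinguishable by the `niah`/`cwe` prefix of their names, and "empty" for `answer_prefix` is interpreted as the empty string.

```python
NIAH_BUDGET = 30  # max_new_tokens for NIAH tasks, per the schema table
CWE_BUDGET = 50   # max_new_tokens for CWE tasks, per the schema table


def check_row(row):
    """Return a list of problems found in one example (empty list means OK)."""
    problems = []
    for column in ("context", "question", "answer_prefix", "task", "language"):
        if not isinstance(row.get(column), str):
            problems.append(f"{column} should be a string")

    answer = row.get("answer")
    if not (isinstance(answer, list) and all(isinstance(a, str) for a in answer)):
        problems.append("answer should be a list of strings")

    # Assumption: CWE tasks are exactly those whose name starts with "cwe".
    task = row.get("task")
    is_cwe = isinstance(task, str) and task.startswith("cwe")
    expected = CWE_BUDGET if is_cwe else NIAH_BUDGET
    if row.get("max_new_tokens") != expected:
        problems.append(f"max_new_tokens should be {expected}")

    # Assumption: "empty for CWE" means the empty string.
    if is_cwe and row.get("answer_prefix") != "":
        problems.append("answer_prefix should be empty for CWE")
    return problems


# Toy row with made-up content, shaped like the schema.
toy_row = {
    "context": "word list ...",
    "question": "Which words are most common?",
    "answer_prefix": "",
    "answer": ["alpha", "beta"],
    "task": "cwe_easy",
    "language": "en",
    "max_new_tokens": 50,
}
print(check_row(toy_row))
```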
## Usage
```python
from datasets import load_dataset
# Load the English config with 4096-token contexts
ds_en = load_dataset("starbix/oneruler", name="en_4096", split="test")
# Load the Korean config with 16384-token contexts
ds_ko = load_dataset("starbix/oneruler", name="ko_16384", split="test")
```
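
Each config mixes all seven tasks, so a common next step is to keep only the rows for one task. A minimal sketch, assuming the `datasets` library is installed and the Hub is reachable; `is_task` and `load_task` are hypothetical helper names, not part of this dataset.

```python
def is_task(row, task):
    """True if a row (a dict with a "task" field) belongs to the given task."""
    return row["task"] == task


def load_task(language, context_length, task):
    """Load one config and keep only the rows for a single task.

    Downloads data on first use. The import is done lazily so that
    `is_task` can be used without `datasets` installed.
    """
    from datasets import load_dataset

    ds = load_dataset(
        "starbix/oneruler",
        name=f"{language}_{context_length}",
        split="test",
    )
    return ds.filter(lambda row: is_task(row, task))


# Example call (requires network access):
# niah_single_en = load_task("en", 4096, "niah_single")
```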
## Citation
```bibtex
@misc{kim2025rulermeasureallbenchmarking,
title={One ruler to measure them all: Benchmarking multilingual long-context language models},
author={Yekyung Kim and Jenna Russell and Marzena Karpinska and Mohit Iyyer},
year={2025},
eprint={2503.01996},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2503.01996},
}
```
Provider:
starbix



