TAUR-dev/D-EVAL__standard_eval_v3__FinEval_16k_fulleval_3arg_OLMO_RLONLY-RL-countdown_4arg-eval_rl
Source: Hugging Face · Updated 2025-12-02 · Indexed 2026-02-07
Download link:
https://hf-mirror.com/datasets/TAUR-dev/D-EVAL__standard_eval_v3__FinEval_16k_fulleval_3arg_OLMO_RLONLY-RL-countdown_4arg-eval_rl
Dataset description:
---
dataset_info:
  config_name: latest
  features:
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: task_config
    dtype: string
  - name: task_source
    dtype: string
  - name: prompt
    list:
    - name: content
      dtype: string
    - name: role
      dtype: string
  - name: model_responses
    list: 'null'
  - name: model_responses__eval_is_correct
    list: 'null'
  - name: all_other_columns
    dtype: string
  - name: original_split
    dtype: string
  - name: metadata
    dtype: string
  - name: model_responses__best_of_n_atags
    list: string
  - name: model_responses__best_of_n_atags__finish_reason_length_flags
    list: bool
  - name: model_responses__best_of_n_atags__length_partial_responses
    list: string
  - name: prompt__best_of_n_atags__metadata
    dtype: string
  - name: model_responses__best_of_n_atags__metadata
    dtype: string
  - name: model_responses__best_of_n_atags__eval_is_correct
    list: bool
  - name: model_responses__best_of_n_atags__eval_extracted_answers
    list: string
  - name: model_responses__best_of_n_atags__eval_extraction_metadata
    dtype: string
  - name: model_responses__best_of_n_atags__eval_evaluation_metadata
    dtype: string
  - name: model_responses__best_of_n_atags__internal_answers__eval_is_correct
    list:
      list: bool
  - name: model_responses__best_of_n_atags__internal_answers__eval_extracted_answers
    list:
      list: string
  - name: model_responses__best_of_n_atags__internal_answers__eval_extraction_metadata
    dtype: string
  - name: model_responses__best_of_n_atags__internal_answers__eval_evaluation_metadata
    dtype: string
  - name: model_responses__best_of_n_atags__metrics
    struct:
    - name: flips_by
      list: int64
    - name: flips_total
      dtype: int64
    - name: num_correct
      dtype: int64
    - name: pass_at_n
      dtype: int64
    - name: percent_correct
      dtype: float64
    - name: total_responses
      dtype: int64
  - name: eval_date
    dtype: string
  splits:
  - name: test
    num_bytes: 35361472
    num_examples: 1000
  download_size: 5009595
  dataset_size: 35361472
configs:
- config_name: latest
  data_files:
  - split: test
    path: latest/test-*
---
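As a rough illustration of how the schema above might be consumed, the following Python sketch loads the `test` split of the `latest` config and tallies the per-response correctness flags stored in `model_responses__best_of_n_atags__eval_is_correct`. The helper names `load_test_split` and `summarize_correctness` are hypothetical, and the aggregate keys it returns are illustrative; they are not claimed to match how the dataset's own `metrics` struct was computed. It assumes the Hugging Face `datasets` library is installed and the repository is reachable.

```python
from typing import Any, Dict, Iterable, Mapping

# Repository and column names are taken from the dataset card above.
REPO_ID = (
    "TAUR-dev/D-EVAL__standard_eval_v3__FinEval_16k_fulleval_3arg_"
    "OLMO_RLONLY-RL-countdown_4arg-eval_rl"
)
CORRECT_COLUMN = "model_responses__best_of_n_atags__eval_is_correct"


def load_test_split():
    """Load the `test` split of the `latest` config (requires network access)."""
    # Imported lazily so the summary helper below works without `datasets`.
    from datasets import load_dataset

    return load_dataset(REPO_ID, "latest", split="test")


def summarize_correctness(rows: Iterable[Mapping[str, Any]]) -> Dict[str, Any]:
    """Aggregate the boolean correctness flags over an iterable of rows.

    Hypothetical helper: the returned keys are illustrative only.
    """
    total_rows = 0
    total_responses = 0
    correct_responses = 0
    rows_with_any_correct = 0
    for row in rows:
        # The column is declared as `list: bool`; treat a missing or null value as empty.
        flags = row.get(CORRECT_COLUMN) or []
        total_rows += 1
        total_responses += len(flags)
        correct_responses += sum(bool(flag) for flag in flags)
        rows_with_any_correct += int(any(flags))
    accuracy = correct_responses / total_responses if total_responses else 0.0
    return {
        "rows": total_rows,
        "responses": total_responses,
        "correct_responses": correct_responses,
        "response_accuracy": accuracy,
        "rows_with_any_correct": rows_with_any_correct,
    }
```

Calling `summarize_correctness(load_test_split())` would iterate over the 1000 test examples listed in the card. The summary helper also accepts any iterable of plain dictionaries, which makes it easy to try on a few hand-written rows first.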
Provider:
TAUR-dev



