haoranli-ml/genvf-filtered-proof-only-train

Name: haoranli-ml/genvf-filtered-proof-only-train
Creator: haoranli-ml
Published: 2026-04-08 20:50:33
License: no description provided

Hugging Face · Updated 2026-04-08 · Indexed 2026-04-12

Download link:

https://hf-mirror.com/datasets/haoranli-ml/genvf-filtered-proof-only-train


Description:

---
dataset_info:
  features:
  - name: index
    dtype: int64
  - name: row_id
    dtype: int64
  - name: problem
    dtype: string
  - name: answer
    dtype: string
  - name: source
    list: string
  - name: mean_reward
    dtype: float64
  - name: full_response
    dtype: string
  - name: full_reasoning
    dtype: string
  - name: model
    dtype: string
  - name: prefix
    dtype: string
  - name: prefix_end_index
    dtype: int64
  - name: num_thoughts
    dtype: int64
  - name: prefix_type
    dtype: string
  - name: prefix_type_description
    dtype: string
  - name: suffix_num
    list: int64
  - name: suffix_model
    list: string
  - name: pending
    list: bool
  - name: pending_model
    list: 'null'
  - name: suffix_response
    list: string
  - name: suffix_summary
    list: string
  - name: self_summary
    list: string
  - name: suffix_reasoning
    list: string
  - name: finish_reason
    list: string
  - name: budget_used
    list: int64
  - name: escalation
    list: int64
  - name: usage
    list:
    - name: completion_tokens
      dtype: int64
    - name: prompt_tokens
      dtype: int64
    - name: total_tokens
      dtype: int64
  - name: error
    list: 'null'
  - name: error_type
    list: 'null'
  - name: prefix_model
    dtype: string
  - name: gemini_summary_of_future
    dtype: string
  - name: gemini_summary_list
    list: string
  - name: prefix_steps
    list: string
  - name: suffix_variants
    list:
    - name: detailed_steps
      list: string
    - name: high_level_steps
      list: string
    - name: id
      dtype: int64
  - name: dedup_note
    dtype: string
  - name: cross_prefix_alignment_scores
    list:
    - name: avg_alignment
      dtype: float64
    - name: individual_scores
      list:
      - name: compared_row_id
        dtype: int64
      - name: compared_summary_id
        dtype: int64
      - name: direction
        dtype: string
      - name: output_text
        dtype: string
      - name: problem_index
        dtype: int64
      - name: reasoning
        dtype: string
      - name: score
        dtype: float64
    - name: num_comparisons
      dtype: int64
    - name: summary_id
      dtype: int64
  - name: filtered_suffix
    list:
    - name: detailed_steps
      list: string
    - name: high_level_steps
      list: string
    - name: id
      dtype: int64
  - name: rubrics
    dtype: string
  - name: prefix_summary_steps
    dtype: string
  - name: filtered_suffix_summary_steps
    list: string
  - name: input_to_VF
    dtype: string
  splits:
  - name: train
    num_bytes: 1104805344
    num_examples: 3570
  - name: test
    num_bytes: 13076118
    num_examples: 43
  download_size: 941328531
  dataset_size: 1117881462
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
---
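The size figures in the metadata above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that only uses numbers copied from the metadata; the variable names are illustrative and not part of the dataset.

```python
# Figures copied from the dataset_info metadata above.
SPLITS = {
    "train": {"num_bytes": 1_104_805_344, "num_examples": 3_570},
    "test": {"num_bytes": 13_076_118, "num_examples": 43},
}
DATASET_SIZE = 1_117_881_462   # dataset_size
DOWNLOAD_SIZE = 941_328_531    # download_size

# The per-split byte counts should add up to dataset_size.
total_bytes = sum(s["num_bytes"] for s in SPLITS.values())
assert total_bytes == DATASET_SIZE

# Average record size per split, in KiB.
for name, s in SPLITS.items():
    avg_kib = s["num_bytes"] / s["num_examples"] / 1024
    print(f"{name}: {avg_kib:.0f} KiB per example on average")

# Ratio of the files to download versus the decoded dataset.
print(f"download / dataset size: {DOWNLOAD_SIZE / DATASET_SIZE:.2f}")
```

Each record averages roughly 300 KiB in both splits, and the download is about 84% of the decoded dataset size, so individual rows are large and streaming may be worth considering.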

The dataset metadata (dataset_info) has three parts: the feature list (features), the dataset splits (splits), and the configuration (configs).

## Features

- Index (index): 64-bit integer
- Row ID (row_id): 64-bit integer
- Problem (problem): string
- Answer (answer): string
- Source (source): list of strings
- Mean reward (mean_reward): float
- Full response (full_response): string
- Full reasoning (full_reasoning): string
- Model (model): string
- Prefix (prefix): string
- Prefix end index (prefix_end_index): 64-bit integer
- Number of thoughts (num_thoughts): 64-bit integer
- Prefix type (prefix_type): string
- Prefix type description (prefix_type_description): string
- Suffix number (suffix_num): list of integers
- Suffix model (suffix_model): list of strings
- Pending status (pending): list of booleans
- Pending model (pending_model): list of nulls
- Suffix response (suffix_response): list of strings
- Suffix summary (suffix_summary): list of strings
- Self summary (self_summary): list of strings
- Suffix reasoning (suffix_reasoning): list of strings
- Finish reason (finish_reason): list of strings
- Budget used (budget_used): list of integers
- Escalation (escalation): list of integers
- Usage (usage): list of records, each containing:
  - Completion tokens (completion_tokens): 64-bit integer
  - Prompt tokens (prompt_tokens): 64-bit integer
  - Total tokens (total_tokens): 64-bit integer
- Error (error): list of nulls
- Error type (error_type): list of nulls
- Prefix model (prefix_model): string
- Gemini summary of the future (gemini_summary_of_future): string
- Gemini summary list (gemini_summary_list): list of strings
- Prefix steps (prefix_steps): list of strings
- Suffix variants (suffix_variants): list of records, each containing:
  - Detailed steps (detailed_steps): list of strings
  - High-level steps (high_level_steps): list of strings
  - ID (id): 64-bit integer
- Deduplication note (dedup_note): string
- Cross-prefix alignment scores (cross_prefix_alignment_scores): list of records, each containing:
  - Average alignment (avg_alignment): float
  - Individual scores (individual_scores): list of records, each containing:
    - Compared row ID (compared_row_id): 64-bit integer
    - Compared summary ID (compared_summary_id): 64-bit integer
    - Direction (direction): string
    - Output text (output_text): string
    - Problem index (problem_index): 64-bit integer
    - Reasoning (reasoning): string
    - Score (score): float
  - Number of comparisons (num_comparisons): 64-bit integer
  - Summary ID (summary_id): 64-bit integer
- Filtered suffix (filtered_suffix): list of records with the same structure as the suffix variants:
  - Detailed steps (detailed_steps): list of strings
  - High-level steps (high_level_steps): list of strings
  - ID (id): 64-bit integer
- Rubrics (rubrics): string
- Prefix summary steps (prefix_summary_steps): string
- Filtered suffix summary steps (filtered_suffix_summary_steps): list of strings
- VF input (input_to_VF): string

## Splits

The dataset contains two splits:

- Train (train): 1104805344 bytes, 3570 examples
- Test (test): 13076118 bytes, 43 examples

The total download size is 941328531 bytes and the total dataset size is 1117881462 bytes.

## Configuration

There is a single configuration named default, with the following data files:

- Train split: data/train-*
- Test split: data/test-*
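As a rough illustration of how the configuration and nested fields above might be used, the sketch below loads a split with the Hugging Face `datasets` library and reads the filtered_suffix records from one row. The helper names (`load_split`, `summarize_row`) and every value in `sample_row` are hypothetical placeholders, not real dataset content, and the sketch assumes that list-of-record fields decode to a Python list of dicts.

```python
DATASET_ID = "haoranli-ml/genvf-filtered-proof-only-train"


def load_split(split="train", streaming=True):
    """Load one split from the Hub (needs the `datasets` package and network access).

    Streaming avoids downloading the full data files before the first row is read.
    """
    from datasets import load_dataset  # imported lazily so the helper below works offline

    return load_dataset(DATASET_ID, split=split, streaming=streaming)


def summarize_row(row):
    """Return a compact view of one record, keyed by names from the feature list."""
    suffixes = row["filtered_suffix"]  # assumed: list of dicts, one per kept suffix
    return {
        "row_id": row["row_id"],
        "mean_reward": row["mean_reward"],
        "num_filtered_suffixes": len(suffixes),
        "high_level_steps": {s["id"]: s["high_level_steps"] for s in suffixes},
    }


# Hypothetical row with placeholder values, shaped like the schema above.
sample_row = {
    "row_id": 0,
    "mean_reward": 0.5,
    "filtered_suffix": [
        {"id": 0, "detailed_steps": ["step a1", "step a2"], "high_level_steps": ["plan A"]},
        {"id": 1, "detailed_steps": ["step b1"], "high_level_steps": ["plan B"]},
    ],
}

print(summarize_row(sample_row))
```

With network access, `next(iter(load_split("test")))` would return a real row that can be passed to `summarize_row` in the same way.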

Provider:

haoranli-ml
