SkillFactory/BF_EVAL-cd3args-Qwen2.5-1.5B-Instruct-R1-SFT

Name: SkillFactory/BF_EVAL-cd3args-Qwen2.5-1.5B-Instruct-R1-SFT
Creator: SkillFactory
Published: 2025-12-04 04:29:23
License: MIT

Source: Hugging Face. Updated: 2025-12-04. Indexed: 2025-12-20.

Download link:

https://hf-mirror.com/datasets/SkillFactory/BF_EVAL-cd3args-Qwen2.5-1.5B-Instruct-R1-SFT

Description:

---
license: mit
task_categories:
- text-generation
language:
- en
tags:
- evaluation
- skill-factory
---

These datasets are identical to the Evaluation datasets, except that the entries of the `model_responses` array are budget forcing rounds. The first response is generated with a maximum total context length of 4k tokens; the second response (the second entry in the array) is a continuation of the first, up to a total of 8,192 tokens.

# Column Details

| Column | Description |
|--------|-------------|
| `question` | The question we want the model to answer |
| `answer` | The string answer |
| `task` | The name of the task the row belongs to |
| `prompt` | The prompt we will feed into the model to solve the question |
| `model_responses` | An array of strings that the model generated to answer the prompt (usually of size 4 or 34, depending on the evaluation task) |
| `model_responses__eval_is_correct` | An array aligned with `model_responses` containing booleans: `True` when the response was correct, `False` when incorrect or no answer was found |
| `model_responses__eval_extracted_answers` | An array aligned with `model_responses` containing the extracted answer strings from each response (usually the last answer in `<answer>` tags) |
| `model_responses__internal_answers__eval_is_correct` | An array aligned with `model_responses` where each value is an array of booleans for the correctness of intermediate answers within a trace |
| `model_responses__internal_answers__eval_extracted_answers` | Similar to `model_responses__eval_extracted_answers` but for internal/intermediate answers |
| `all_other_columns` | A catch-all column for additional task-dependent information (e.g., for countdown: target number and arguments) |
| `metadata` | Metadata about the question (alternative location for task-specific data like countdown target/arguments) |
| `prompt__metadata` | Metadata for the vLLM network request, including URL and generation parameters. We used a customized [Curator](https://github.com/bespokelabsai/curator) to send raw text to `/completion` instead of `/chat/completion` for warm-start prompts and budget forcing |
| `model_responses__metadata` | Metadata returned from the vLLM request |

**Additional task-specific columns:** `answer_index`, `answer_key`, `choices`, `id`, `difficulty`, `domain`, `evaluation_type`, `expected_answer_format`, `original_answer`, `source`, `task_type`, `variant`, `acronym`, `formed_acronym`, `word_count`, `words`, `length`, `letters`
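Because `model_responses__eval_is_correct` is aligned index-by-index with `model_responses`, and each index corresponds to one budget forcing round, accuracy can be compared across rounds. The following is a minimal sketch under stated assumptions, not the authors' evaluation code: the helper name `per_round_accuracy` and the sample rows are hypothetical, and only the column names come from the table above.

```python
from typing import Any, Iterable, Mapping


def per_round_accuracy(rows: Iterable[Mapping[str, Any]]) -> list[float]:
    """Return the fraction of rows marked correct at each round index.

    Index 0 is the first budget forcing round, index 1 the second, and so on.
    Rows with fewer rounds simply do not contribute to the later indices.
    """
    correct: list[int] = []
    total: list[int] = []
    for row in rows:
        flags = row["model_responses__eval_is_correct"]
        for i, flag in enumerate(flags):
            if i == len(correct):  # first time this round index is seen
                correct.append(0)
                total.append(0)
            correct[i] += bool(flag)
            total[i] += 1
    return [c / t for c, t in zip(correct, total)]


# Hypothetical rows that mimic the documented schema (not real dataset content).
sample_rows = [
    {"model_responses": ["...", "..."], "model_responses__eval_is_correct": [False, True]},
    {"model_responses": ["...", "..."], "model_responses__eval_is_correct": [True, True]},
    {"model_responses": ["...", "..."], "model_responses__eval_is_correct": [False, False]},
    {"model_responses": ["...", "..."], "model_responses__eval_is_correct": [False, True]},
]

accuracy_by_round = per_round_accuracy(sample_rows)
print(accuracy_by_round)

# With the real data, rows could come from the Hugging Face `datasets` library,
# e.g. load_dataset("SkillFactory/BF_EVAL-cd3args-Qwen2.5-1.5B-Instruct-R1-SFT");
# the available split names are not stated in this card, so check them first.
```

The function accepts any iterable of mappings, so the same code should work on plain dictionaries or on rows iterated from a loaded dataset split.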

Provided by:

SkillFactory
