
SkillFactory/BF_EVAL-cd3args-Qwen2.5-1.5B-Instruct-R1-SFT

Hugging Face | Updated 2025-12-04 | Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/SkillFactory/BF_EVAL-cd3args-Qwen2.5-1.5B-Instruct-R1-SFT
Description:
---
license: mit
task_categories:
- text-generation
language:
- en
tags:
- evaluation
- skill-factory
---

These datasets are exactly like the Evaluation datasets, except that the `model_responses` array holds budget forcing rounds. The first response is generated with a maximum total context length of 4k; the second response (2nd index in the array) continues the previous response up to a total of 8,192 tokens.

# Column Details

| Column | Description |
|--------|-------------|
| `question` | The question we want the model to answer |
| `answer` | The string answer |
| `task` | The name of the task the row belongs to |
| `prompt` | The prompt we will feed into the model to solve the question |
| `model_responses` | An array of strings that the model generated to answer the prompt (usually size of 4 or 34 depending on the evaluation task) |
| `model_responses__eval_is_correct` | An array aligned with `model_responses` containing booleans: `True` when the response was correct, `False` when incorrect or no answer was found |
| `model_responses__eval_extracted_answers` | An array aligned with `model_responses` containing the extracted answer strings from each response (usually the last answer in `<answer>` tags) |
| `model_responses__internal_answers__eval_is_correct` | An array aligned with `model_responses` where each value is an array of booleans for the correctness of intermediate answers within a trace |
| `model_responses__internal_answers__eval_extracted_answers` | Similar to `model_responses__eval_extracted_answers` but for internal/intermediate answers |
| `all_other_columns` | A catch-all column for additional task-dependent information (e.g., for countdown: target number and arguments) |
| `metadata` | Metadata about the question (alternative location for task-specific data like countdown target/arguments) |
| `prompt__metadata` | Metadata for the vLLM network request including URL and generation parameters. We used a customized [Curator](https://github.com/bespokelabsai/curator) to send raw text to `/completion` instead of `/chat/completion` for warm-start prompts and budget forcing |
| `model_responses__metadata` | Metadata returned from the vLLM request |

**Additional task-specific columns:** `answer_index`, `answer_key`, `choices`, `id`, `difficulty`, `domain`, `evaluation_type`, `expected_answer_format`, `original_answer`, `source`, `task_type`, `variant`, `acronym`, `formed_acronym`, `word_count`, `words`, `length`, `letters`
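As an illustration of how the aligned arrays described above might be used, here is a minimal Python sketch. It assumes each row is already available as a plain dict keyed by the column names listed above. The helper names `extract_last_answer` and `accuracy_per_round` and the sample rows are hypothetical, and the regex is only a guess at how "the last answer in `<answer>` tags" could be pulled out, not the extraction logic actually used to build the dataset.

```python
import re
from typing import Optional

# Assumed tag format: <answer>...</answer>, possibly spanning several lines.
ANSWER_TAG = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def extract_last_answer(response: str) -> Optional[str]:
    """Return the stripped text of the last <answer> pair, or None if absent."""
    matches = ANSWER_TAG.findall(response)
    return matches[-1].strip() if matches else None


def accuracy_per_round(rows: list[dict]) -> list[float]:
    """Fraction of rows marked correct at each index of `model_responses`.

    Index 0 corresponds to the first budget forcing round, index 1 to the
    continuation, and so on. Assumes `rows` is non-empty.
    """
    key = "model_responses__eval_is_correct"
    n_rounds = max(len(row[key]) for row in rows)
    totals = [0] * n_rounds
    correct = [0] * n_rounds
    for row in rows:
        for i, is_correct in enumerate(row[key]):
            totals[i] += 1
            correct[i] += bool(is_correct)
    return [c / t for c, t in zip(correct, totals)]


# Hypothetical rows, invented purely for illustration.
rows = [
    {
        "model_responses": [
            "try <answer>3</answer> wait <answer>4</answer>",
            "so the result is <answer>5</answer>",
        ],
        "model_responses__eval_is_correct": [False, True],
    },
    {
        "model_responses": ["<answer>7</answer>", "<answer>7</answer>"],
        "model_responses__eval_is_correct": [True, True],
    },
]

per_round = accuracy_per_round(rows)
last_answers = [extract_last_answer(r) for r in rows[0]["model_responses"]]
```

Comparing the entries of `per_round` gives a rough view of whether the longer continuation budget improved correctness over the first round.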
Provider:
SkillFactory