crv

Name: crv
Creator: maas
Published: 2025-12-05 16:57:39
License: No description provided

ModelScope Community, updated 2025-12-05, indexed 2025-12-06

Download link:

https://modelscope.cn/datasets/facebook/crv

Description:

This repository contains the datasets used in the paper **"Verifying Chain-of-Thought Reasoning via its Computational Graph"** (CRV). The data consists of reasoning and math problems, the Chain-of-Thought (CoT) traces generated by **Llama 3.1 8B Instruct**, and step-level correctness annotations used to train and evaluate the CRV verifier.

**Paper:** [arXiv:2510.09312](https://arxiv.org/abs/2510.09312)

**Codebase:** [GitHub Repository](https://github.com/facebookresearch/CRV)

## Dataset Overview

We provide data for three domains. For the arithmetic and boolean domains, the data is organized into files by complexity (number of operators).

### Domains

1. **Synthetic Arithmetic:** Nested integer expressions involving addition, multiplication, and unary minus operators.
2. **Synthetic Boolean:** Nested logical expressions over the truth values `True` and `False` involving the boolean operators `and`, `or`, and `not`.
3. **GSM8K:** The test split of the GSM8K benchmark, with generated CoT traces.

### Data Types

For each domain, we provide two types of files:

1. **Raw CoT Generations:** Contains the original expression/question and the CoT response generated by Llama 3.1 8B Instruct. The CoTs are pre-segmented into individual reasoning steps.
2. **Annotated Data:** Contains the same CoTs with **step-level correctness labels** (Correct/Incorrect). These labels were generated using the consensus pipeline described in the paper (LLM-as-a-Judge + Programmatic Verification).

**Note on Dataset Size:** You may notice that the annotated datasets for the synthetic tasks (e.g., `arith.nt7.annotated.json`) contain fewer samples than the raw datasets (e.g., `arith.nd10000.nt7.json`). This is intentional. To ensure high-quality, reliable labels, we employed a dual-verification strategy using both an LLM-as-a-Judge and a Programmatic Verifier, and applied a strict intersection policy: only reasoning steps on which both methods agreed on the correctness label were retained.
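
The strict intersection policy can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' pipeline: the function name `consensus_labels` is hypothetical, and the sketch assumes each annotator returns one verdict per step in step order, with `None` (also an assumption) marking a step it could not judge.

```python
from __future__ import annotations

from typing import Optional, Sequence


def consensus_labels(
    judge_labels: Sequence[Optional[bool]],
    verifier_labels: Sequence[Optional[bool]],
) -> dict[int, bool]:
    """Keep only the steps on which both annotators agree (hypothetical helper).

    judge_labels:    per-step verdicts from the LLM-as-a-Judge (True = correct).
    verifier_labels: per-step verdicts from the programmatic verifier.
    A None verdict is treated as "no label", so that step is dropped.
    """
    if len(judge_labels) != len(verifier_labels):
        raise ValueError("both annotators must cover the same steps")
    kept: dict[int, bool] = {}
    for step, (judge, verifier) in enumerate(zip(judge_labels, verifier_labels)):
        # Strict intersection: a step survives only if both verdicts exist and match.
        if judge is not None and verifier is not None and judge == verifier:
            kept[step] = judge
    return kept
```

For example, `consensus_labels([True, True, False, True], [True, False, False, None])` returns `{0: True, 2: False}`: step 1 is dropped because the annotators disagree, and step 3 because the verifier gave no verdict. Dropping disagreements trades coverage for label reliability, which is consistent with the annotated files being smaller than their raw counterparts.
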

Refer to **Appendix A.2** of the paper for details.

---

## File Structure and Naming Convention

The files are located in their respective folders (`arithmetic_expressions/`, `boolean_expressions/`, `gsm8k_expressions/`). The filenames follow a convention that indicates the complexity and size of each dataset.

**Format:** `[domain].nd[count].nt[operators].[type].json`

* **`nd` (Number of Dialogs):** The number of expressions/samples in the file (e.g., `nd10000` = 10,000 samples).
* **`nt` (Non-Terminal Nodes):** The complexity of the expression, defined by the number of operators (e.g., `nt7` = 7 operators).
* **`.annotated`:** If present, the file contains step-level labels.

As the examples below show, not every part appears in every filename: the raw file omits the type suffix, and the annotated file omits the `nd` count.

### Examples

* **`arith.nd10000.nt7.json`**:
  * **Domain:** Arithmetic
  * **Size:** 10,000 samples
  * **Complexity:** 7 operators
  * **Content:** Questions and segmented CoTs (unlabeled).
* **`arith.nt7.annotated.json`**:
  * **Domain:** Arithmetic
  * **Complexity:** 7 operators
  * **Content:** Questions, segmented CoTs, and **step-level correctness labels**.

---

## Data Fields

The dataset schema differs slightly between the **Raw CoT Generations** (unlabeled) and the **Annotated Data** (labeled).

**1. Raw CoT Generations** *(e.g., `arith.nd10000.nt7.json`)*

These files contain the generated CoT traces segmented into steps, but without correctness labels.

- `role` (string): The role of the message sender (e.g., `"assistant"`).
- `content` (string): The full, unsegmented Chain-of-Thought text generated by the model.
- `predicted_truth_value`/`predicted_value` (boolean/number): The final answer derived from the generated CoT.
- `step_level` (list): A list of objects representing the segmented steps:
  - `step_number` (int): The index of the reasoning step (0-indexed).
  - `content` (string): The text content of that reasoning step.

**2. Annotated Data** *(e.g., `arith.nt7.annotated.json`)*

These files contain the full context required to replicate the attribution process, including formatted prompts with special tokens and step-level correctness labels.

- `expression_id` (int): A unique identifier for the problem/expression.
- `original_expression` (string): The input problem text (e.g., the math problem or the boolean/arithmetic expression).
- `correct_value` (boolean/number): The ground-truth answer.
- `predicted_value` (boolean/number): The answer predicted by the model in this specific trace.
- `total_steps` (int): The total number of reasoning steps in the solution.
- `step_expressions` (list): A list of objects containing detailed annotations for each step:
  - `step_number` (int): The index of the step.
  - `step_content` (string): The text content of the current step.
  - `step_label` (boolean): The correctness label (`true` for Correct, `false` for Incorrect).
  - `assistant_content_before` (string): The cumulative CoT text **preceding** the current step.
  - `assistant_content_after` (string): The cumulative CoT text **including** the current step.
  - `formatted_assistant_content_before` (string): The exact prompt string used to generate the current step, including Llama 3 special tokens (e.g., `<|start_header_id|>`, `<|eot_id|>`). This is critical for exact state reconstruction during attribution.
  - `formatted_assistant_content_after` (string): The exact prompt string representing the state after the step is generated.

---

## Citation

If you use this dataset, please cite our paper:

```bibtex
@article{zhao2025verifying,
  title={Verifying Chain-of-Thought Reasoning via Its Computational Graph},
  author={Zheng Zhao and Yeskendir Koishekenov and Xianjun Yang and Naila Murray and Nicola Cancedda},
  year={2025},
  eprint={2510.09312},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2510.09312},
}
```
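
A small helper makes the naming convention and the annotated schema from the Data Fields section concrete. The sketch below is not part of the released codebase: the helper names (`parse_filename`, `load_records`, `label_counts`) are hypothetical, it treats the `nd`, `nt`, and `.annotated` parts of a filename as optional, and it assumes that an annotated file's top level is a JSON array of records shaped as described under Data Fields.

```python
from __future__ import annotations

import json
import re
from collections import Counter
from pathlib import Path
from typing import Any

# Pattern for the documented convention [domain].nd[count].nt[operators].[type].json.
# Treating the nd, nt and .annotated parts as optional is an assumption drawn from
# the two example filenames, which each omit one part.
_NAME_RE = re.compile(
    r"^(?P<domain>[A-Za-z0-9_]+)"
    r"(?:\.nd(?P<nd>\d+))?"
    r"(?:\.nt(?P<nt>\d+))?"
    r"(?P<annotated>\.annotated)?"
    r"\.json$"
)


def parse_filename(name: str) -> dict[str, Any]:
    """Split a filename into domain, sample count, operator count and label flag."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"filename does not follow the convention: {name}")
    return {
        "domain": match["domain"],
        "num_samples": int(match["nd"]) if match["nd"] else None,
        "num_operators": int(match["nt"]) if match["nt"] else None,
        "annotated": match["annotated"] is not None,
    }


def load_records(path: str | Path) -> list[dict[str, Any]]:
    """Read an annotated file, ASSUMING its top level is a JSON array of records."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def label_counts(records: list[dict[str, Any]]) -> Counter:
    """Tally step_label values (True = correct, False = incorrect) over all records."""
    counts: Counter = Counter()
    for record in records:
        for step in record["step_expressions"]:
            counts[step["step_label"]] += 1
    return counts
```

For instance, `parse_filename("arith.nt7.annotated.json")` returns `{"domain": "arith", "num_samples": None, "num_operators": 7, "annotated": True}`. A typical session might call `label_counts(load_records("arithmetic_expressions/arith.nt7.annotated.json"))` to inspect the balance of correct and incorrect steps; that path simply combines the folder and example filename named above. `label_counts` applies only to annotated files, since raw files store their steps under `step_level` without labels.
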


Provider:

maas

Created:

2025-11-29
