ArnoldMoya/execoconut-dataset
Source: Hugging Face | Updated: 2026-04-10 | Indexed: 2026-04-12
Download link:
https://hf-mirror.com/datasets/ArnoldMoya/execoconut-dataset
# execoconut-dataset
ExecCoCoNuT (Execution CoCoNuT): Python code execution traces for continuous latent thought training.
## Dataset Description
This dataset contains 1,000 Python code snippets paired with execution traces. It is designed for training language models to reason about program state in a continuous latent space.
## Dataset Statistics
- **Samples**: 1,000 (train: 900, validation: 50, test: 50)
- **Variable scope**: 6 integer variables (a-f)
- **Operations**: +, -, *, // (integer division)
- **Control flow**: Sequential only (no loops/branches)
- **Value range**: [-100, 100]
- **Snippet length**: 3-8 instructions per sample
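The card does not include the generator behind these statistics. The sketch below is a hypothetical generator that stays within them. Its assumptions are not from the card: roughly 30% of instructions assign a constant, right-hand literals are drawn from 1 to 10, and [-100, 100] bounds every stored value rather than only the initial constants. Because values can be negative, Python's `//` matters here. It rounds toward negative infinity, so `-7 // 2` is `-4`, not `-3`. Traces produced by real Python execution follow that rule.

```python
import random

VARS = ["a", "b", "c", "d", "e", "f"]  # variable scope listed in the statistics
OPS = ["+", "-", "*", "//"]            # operators listed in the statistics
LO, HI = -100, 100                     # ASSUMED bound on every stored value

def apply_op(op, x, y):
    """Apply one operator with Python semantics (// floors toward -inf)."""
    if op == "+":
        return x + y
    if op == "-":
        return x - y
    if op == "*":
        return x * y
    return x // y

def generate_snippet(rng, min_len=3, max_len=8):
    """Hypothetical generator: return (lines, final_state) for one snippet."""
    n = rng.randint(min_len, max_len)
    state, lines = {}, []
    while len(lines) < n:
        target = rng.choice(VARS)
        # The first instruction (and ~30% of later ones, an assumption) assigns a constant.
        if not state or rng.random() < 0.3:
            value = rng.randint(LO, HI)
            lines.append(f"{target} = {value}")
            state[target] = value
            continue
        left = rng.choice(list(state))
        op = rng.choice(OPS)
        # Right operand: an already-defined variable or a small positive literal.
        if rng.random() < 0.5:
            right = rng.choice(list(state))
            right_val = state[right]
        else:
            right_val = rng.randint(1, 10)
            right = str(right_val)
        if op == "//" and right_val == 0:
            continue  # would raise ZeroDivisionError; draw a new instruction
        value = apply_op(op, state[left], right_val)
        if not LO <= value <= HI:
            continue  # outside the value range; draw a new instruction
        lines.append(f"{target} = {left} {op} {right}")
        state[target] = value
    return lines, state
```

Rejecting and redrawing instructions that would divide by zero or leave the range keeps every intermediate value in bounds. The cost is a distribution over programs that is slightly non-uniform.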
## Data Format
Each sample is a JSON object with three fields:
```json
{
  "question": "a = 3\nb = a + 2\nc = b * a",
  "steps": [
    "State: {\"a\": 3}",
    "State: {\"a\": 3, \"b\": 5}",
    "State: {\"a\": 3, \"b\": 5, \"c\": 15}"
  ],
  "answer": "c = 15"
}
```
- **question**: Multi-line Python code snippet
- **steps**: Execution trace showing state after each instruction
- **answer**: Final variable assignment (the target prediction)
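The example suggests a derivation rule. Each `steps` entry is the JSON-serialized variable state after the corresponding instruction, and `answer` names the last assigned variable. Under those assumptions, both fields can be rebuilt from `question`. This minimal sketch uses a hypothetical helper, `trace`. It walks the snippet with Python's `ast` module instead of `exec` and accepts only the operators listed above. The card does not say whether the dataset's own pipeline works this way.

```python
import ast
import json

# Operators listed in the dataset statistics; anything else is rejected.
_BINOPS = {
    ast.Add: lambda x, y: x + y,
    ast.Sub: lambda x, y: x - y,
    ast.Mult: lambda x, y: x * y,
    ast.FloorDiv: lambda x, y: x // y,  # floors toward -inf, like Python
}

def _eval(node, state):
    """Evaluate a restricted integer expression against the current state."""
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    if isinstance(node, ast.Name):
        return state[node.id]
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand, state)  # negative literals such as -5
    if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
        return _BINOPS[type(node.op)](_eval(node.left, state), _eval(node.right, state))
    raise ValueError(f"unsupported expression: {ast.dump(node)}")

def trace(question):
    """Rebuild (steps, answer): one 'State: {...}' line per instruction."""
    state, steps, last = {}, [], None
    for stmt in ast.parse(question).body:
        if not (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            raise ValueError("only single-target assignments are supported")
        last = stmt.targets[0].id
        state[last] = _eval(stmt.value, state)
        steps.append("State: " + json.dumps(state))
    return steps, f"{last} = {state[last]}"
```

Running `trace("a = 3\nb = a + 2\nc = b * a")` yields the same three `State:` lines and the answer `c = 15` shown in the example. `json.dumps` uses `", "` and `": "` separators by default, which match the example's spacing.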
## Usage
```python
from datasets import load_dataset

dataset = load_dataset("ArnoldMoya/execoconut-dataset")

# Access splits
train = dataset['train']
val = dataset['validation']
test = dataset['test']

# Print the first training example
for sample in train.select(range(1)):
    print(sample['question'])
    print(sample['steps'])
    print(sample['answer'])
```
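Each loaded sample is a plain dict. The format description implies two invariants: one state per instruction, and an answer that matches the final state. Both can be checked without extra dependencies. `check_sample` is a hypothetical helper, and its 3-to-8 length bound is taken from the statistics above.

```python
import json

def check_sample(sample, min_len=3, max_len=8):
    """Return a list of inconsistencies in one sample; an empty list means it passed."""
    problems = []
    lines = [ln for ln in sample["question"].splitlines() if ln.strip()]
    steps = sample["steps"]
    if not min_len <= len(lines) <= max_len:
        problems.append(f"{len(lines)} instructions, expected {min_len}-{max_len}")
    if len(lines) != len(steps):
        problems.append(f"{len(lines)} instructions but {len(steps)} steps")
    if not steps or not steps[-1].startswith("State: "):
        return problems + ["missing or malformed final state"]
    # Parse the JSON object that follows the "State: " prefix.
    final_state = json.loads(steps[-1][len("State: "):])
    var, _, value = sample["answer"].partition(" = ")
    try:
        expected = int(value)
    except ValueError:
        return problems + [f"unparseable answer {sample['answer']!r}"]
    if final_state.get(var) != expected:
        problems.append(f"answer {sample['answer']!r} disagrees with final state")
    return problems
```

Using the `dataset` object from the snippet above, `[i for i, s in enumerate(dataset['train']) if check_sample(s)]` lists the indices of any inconsistent training samples.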
## Splits
- `train.jsonl`: Training split (90%)
- `validation.jsonl`: Validation split (5%)
- `test.jsonl`: Test split (5%)
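The split files use the JSON Lines format indicated by the `.jsonl` extension, with one sample object per line. They can therefore also be read without the `datasets` library, for example from a local clone of the repository. In this minimal sketch, `load_jsonl`, `load_splits`, and the flat directory layout are assumptions.

```python
import json
from pathlib import Path

def load_jsonl(path):
    """Read a JSON Lines file: one sample object per non-blank line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

def load_splits(root):
    """Load train/validation/test from a directory holding the three .jsonl files."""
    root = Path(root)
    return {name: load_jsonl(root / f"{name}.jsonl")
            for name in ("train", "validation", "test")}
```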
## Paper
Introduced in "ExecCoCoNuT: Latent Code Execution via Continuous Thought Chains" (2026)
Built with [COCONUT](https://github.com/facebookresearch/coconut) (Meta FAIR, 2024)
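The COCONUT paper describes a multi-stage curriculum that consumes the `steps` field. At stage k, the first k language reasoning steps are replaced by k * c continuous thoughts, where c is a hyperparameter. The sketch below shows only the resulting text layout for one sample. The helper name `stage_k_text`, the placeholder strings, and the newline joining are all assumptions rather than the COCONUT repository's code.

```python
# ASSUMED placeholder strings for the latent-segment markers and thought slots.
BOT, EOT, LATENT = "<bot>", "<eot>", "<latent>"

def stage_k_text(sample, k, c=1):
    """Text layout of one sample at curriculum stage k (hypothetical helper).

    The first k language steps are replaced by k * c latent slots wrapped in
    BOT/EOT; the remaining steps and the answer stay as text. Stage 0 is
    plain chain-of-thought with no latent segment.
    """
    steps = sample["steps"]
    k = min(k, len(steps))  # later stages cannot remove more steps than exist
    parts = [sample["question"]]
    if k > 0:
        parts.append(BOT + LATENT * (k * c) + EOT)
    parts.extend(steps[k:])
    parts.append(sample["answer"])
    return "\n".join(parts)
```

In actual COCONUT training, the latent positions receive the model's last hidden states rather than token embeddings. This helper is therefore only a way to inspect which steps each stage still supervises in language.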
## License
CC0 1.0 Universal
Provided by:
ArnoldMoya



