LongHorizonReasoning/longcot

Source: Hugging Face. Updated 2026-04-20, indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/LongHorizonReasoning/longcot
Description:
---
pretty_name: LongCoT
license: mit
task_categories:
- question-answering
- other
language:
- en
tags:
- reasoning
- benchmark
- evaluation
- programmatic-verification
- contamination-detection
size_categories:
- 1K<n<10K
configs:
- config_name: all
  data_files:
  - split: easy
    path: data/all/easy/*.parquet
  - split: medium
    path: data/all/medium/*.parquet
  - split: hard
    path: data/all/hard/*.parquet
- config_name: logic
  data_files:
  - split: easy
    path: data/logic/easy.parquet
  - split: medium
    path: data/logic/medium.parquet
  - split: hard
    path: data/logic/hard.parquet
- config_name: cs
  data_files:
  - split: easy
    path: data/cs/easy.parquet
  - split: medium
    path: data/cs/medium.parquet
  - split: hard
    path: data/cs/hard.parquet
- config_name: chemistry
  data_files:
  - split: easy
    path: data/chemistry/easy.parquet
  - split: medium
    path: data/chemistry/medium.parquet
  - split: hard
    path: data/chemistry/hard.parquet
- config_name: chess
  data_files:
  - split: easy
    path: data/chess/easy.parquet
  - split: medium
    path: data/chess/medium.parquet
  - split: hard
    path: data/chess/hard.parquet
- config_name: math
  data_files:
  - split: easy
    path: data/math/easy.parquet
  - split: medium
    path: data/math/medium.parquet
  - split: hard
    path: data/math/hard.parquet
---

# LongCoT

LongCoT is a benchmark for long-horizon reasoning across logic, computer science, chemistry, chess, and mathematics. This Hugging Face release contains the benchmark data in viewer-friendly Parquet format for browsing and loading with `datasets`.

The canonical codebase, verifier, and evaluation harness live at:

`https://github.com/LongHorizonReasoning/longcot`

## Overview

LongCoT measures whether models can sustain coherent reasoning across long chains of thought. The benchmark focuses on problems where the difficulty comes from composition: tracking state, propagating constraints, maintaining plans, and avoiding error accumulation over long reasoning trajectories.
Verification is deterministic or programmatic, depending on the domain, but verification code is not bundled in this dataset repo.

## Configs And Splits

The dataset provides six configs:

- `all`: all domains together
- `logic`
- `cs`
- `chemistry`
- `chess`
- `math`

Each config has three splits:

- `easy`
- `medium`
- `hard`

## Usage

Load the full benchmark:

```python
from datasets import load_dataset

ds = load_dataset("LongHorizonReasoning/longcot", "all")
print(ds)
print(ds["easy"][0]["question_id"])
```

Load a single domain:

```python
from datasets import load_dataset

ds = load_dataset("LongHorizonReasoning/longcot", "math")
print(ds["easy"][0]["question_id"])
```

## Data Schema

Rows in this release expose a flat public schema:

- `question_id`: stable question identifier
- `domain`: one of `logic`, `cs`, `chemistry`, `chess`, `math`
- `difficulty`: one of `easy`, `medium`, `hard`
- `template`: template name
- `prompt`: prompt shown to the model
- `answer`: canonical answer payload serialized as JSON
- `canary`: public benchmark canary GUID attached to every example

## Verification

This dataset card is for the data release only.
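Because no verifier ships with the data, what this release supports locally is inspection rather than scoring. A minimal sketch, assuming only the column names from the schema above: it decodes the JSON `answer` payload and tallies rows by `domain` and `difficulty`. The helper names (`decode_answer`, `count_by_domain_and_difficulty`) and every value in the example row, including the `{"value": 42}` payload shape, are hypothetical; real payloads will differ by domain and template.

```python
import json
from collections import Counter
from typing import Any, Iterable, Mapping


def decode_answer(row: Mapping[str, Any]) -> Any:
    """Parse one row's JSON-serialized `answer` payload.

    The payload's inner structure is not described by the card, so it is
    returned as plain Python objects (dict, list, str, number, ...).
    """
    return json.loads(row["answer"])


def count_by_domain_and_difficulty(rows: Iterable[Mapping[str, Any]]) -> Counter:
    """Tally rows by (domain, difficulty), e.g. to sanity-check a split."""
    return Counter((row["domain"], row["difficulty"]) for row in rows)


# Hypothetical row using the public schema's columns; all values are made up.
example_rows = [
    {
        "question_id": "example-0001",  # hypothetical identifier
        "domain": "math",
        "difficulty": "easy",
        "template": "example_template",  # hypothetical template name
        "prompt": "Example prompt text.",
        "answer": json.dumps({"value": 42}),  # hypothetical payload shape
        "canary": "00000000-0000-0000-0000-000000000000",  # placeholder GUID
    },
]

print(decode_answer(example_rows[0]))  # {'value': 42}
print(count_by_domain_and_difficulty(example_rows))  # Counter({('math', 'easy'): 1})
```

Iterating a `datasets` split yields one dict per row, so passing `load_dataset("LongHorizonReasoning/longcot", "all")["easy"]` to `count_by_domain_and_difficulty` should work unchanged, and `Dataset.filter(lambda r: r["domain"] == "math")` narrows the `all` config to a single domain. Checking whether a model output is correct still requires the verifier from the canonical repository.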

To evaluate model outputs, use the verifier and evaluation harness in the canonical repository:

`https://github.com/LongHorizonReasoning/longcot`

That repository contains:

- question loading utilities
- deterministic and programmatic verifiers
- evaluation scripts
- submission and leaderboard workflow

## Citation

If you use LongCoT, please cite:

```bibtex
@article{motwani2026longcot,
  title         = {LongCoT: Benchmarking Long-Horizon Chain-of-Thought Reasoning},
  author        = {Motwani, Sumeet Ramesh and Nichols, Daniel and London, Charles and Li, Peggy and Pizzati, Fabio and Blake, Acer and Hammoud, Hasan and McDonald, Tavish and Naik, Akshat and Ivanova, Alesia and Baskaran, Vignesh and Laptev, Ivan and Glatt, Ruben and Ben-Nun, Tal and Torr, Philip and Jaques, Natasha and Prabhu, Ameya and Bartoldson, Brian and Kailkhura, Bhavya and Schroeder de Witt, Christian},
  year          = {2026},
  eprint        = {2604.14140},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  url           = {https://arxiv.org/abs/2604.14140}
}
```
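The `canary` column, together with the card's `contamination-detection` tag, points to a common use for such a GUID: scanning a training corpus for the string to flag verbatim copies of benchmark rows. The following is a generic sketch of that idea, not the benchmark's own contamination tooling. It assumes documents arrive as `(doc_id, text)` pairs; `find_canary_hits` is a hypothetical helper, and the GUID below is a made-up placeholder, not the real LongCoT canary.

```python
from typing import Iterable, Iterator, Tuple


def find_canary_hits(
    documents: Iterable[Tuple[str, str]], canaries: Iterable[str]
) -> Iterator[Tuple[str, str]]:
    """Yield (doc_id, canary) for each document that contains a canary string.

    Plain substring search: it flags verbatim copies only.
    """
    # Deduplicate and sort so hits come out in a stable order.
    canary_list = sorted(set(canaries))
    for doc_id, text in documents:
        for canary in canary_list:
            if canary in text:
                yield doc_id, canary


# Hypothetical corpus and a placeholder GUID (not the real LongCoT canary).
PLACEHOLDER_CANARY = "11111111-2222-3333-4444-555555555555"
corpus = [
    ("doc-a", "ordinary web text with no benchmark content"),
    ("doc-b", f"copied benchmark row, canary {PLACEHOLDER_CANARY}"),
]

print(list(find_canary_hits(corpus, [PLACEHOLDER_CANARY])))
# [('doc-b', '11111111-2222-3333-4444-555555555555')]
```

With the dataset loaded, the actual canary values can be collected with `set(ds["easy"]["canary"])`. Substring matching only catches verbatim leaks; paraphrased or re-encoded copies of benchmark items would slip past it.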
Provided by: LongHorizonReasoning