AmanPriyanshu/reasoning-sft-Nemotron-Cascade-SFT-SWE-210K

Name: AmanPriyanshu/reasoning-sft-Nemotron-Cascade-SFT-SWE-210K
Creator: AmanPriyanshu
Published: 2026-02-27 00:45:16
License: CC BY 4.0

Source: Hugging Face. Updated: 2026-02-27. Indexed: 2026-03-29.

Download link:

https://hf-mirror.com/datasets/AmanPriyanshu/reasoning-sft-Nemotron-Cascade-SFT-SWE-210K

Resource description:

---
license: cc-by-4.0
task_categories:
- text-generation
- question-answering
language:
- en
tags:
- reasoning
- sft
- chain-of-thought
- code
- swe-bench
- software-engineering
pretty_name: reasoning-sft-Nemotron-Cascade-SFT-SWE-210K
size_categories:
- 100K<n<1M
---

# reasoning-sft-Nemotron-Cascade-SFT-SWE-210K

Converted version of [nvidia/Nemotron-Cascade-SFT-SWE](https://huggingface.co/datasets/nvidia/Nemotron-Cascade-SFT-SWE), filtered to `thinking=True` rows with exactly one valid `<think>...</think>` block.

## Format

Each row has three columns:

- `input`: list of dicts (conversation turns with `role` and `content`; system messages are dropped and the last assistant message is removed)
- `response`: assistant response string, including the `<think>` reasoning block
- `domain`: `{category}_{source}` (e.g. `SWE Repair_SWE-Fixer-Train`)

## Stats

- Original rows: ~214K
- Converted (thinking=True, clean think tags): ~210K

## Usage

```
import random

import pyarrow.parquet as pq
from huggingface_hub import hf_hub_download

# Download the converted Parquet file from the dataset repo.
fpath = hf_hub_download(
    repo_id="AmanPriyanshu/reasoning-sft-Nemotron-Cascade-SFT-SWE-210K",
    repo_type="dataset",
    filename="data_converted.parquet",
    local_dir="./tmp_swe_peek"
)

# Read the file in batches and collect the three columns into Python lists.
pf = pq.ParquetFile(fpath)
rows = {"input": [], "response": [], "domain": []}
for batch in pf.iter_batches(batch_size=65_536):
    d = batch.to_pydict()
    rows["input"].extend(d["input"])
    rows["response"].extend(d["response"])
    rows["domain"].extend(d["domain"])

# Print three random rows, truncating long messages and responses.
total = len(rows["input"])
for idx in random.sample(range(total), 3):
    print(f"\n{'='*80}\nRow {idx:,} / {total:,} | domain: {rows['domain'][idx]}\n{'='*80}")
    for msg in rows["input"][idx]:
        print(f"\n  [{msg['role']}]\n  {msg['content'][:300]}{'...' if len(msg['content']) > 300 else ''}")
    print(f"\n[response]\n{rows['response'][idx][:600]}{'...' if len(rows['response'][idx]) > 600 else ''}")
```

## License

[CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)

## Credits

- Original dataset: [nvidia/Nemotron-Cascade-SFT-SWE](https://huggingface.co/datasets/nvidia/Nemotron-Cascade-SFT-SWE) by NVIDIA
- Responses generated by DeepSeek-R1-0528
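
## Example: splitting a row

The Format section above says each `response` carries one `<think>...</think>` block and each `domain` follows `{category}_{source}`. The sketch below shows one way to separate the reasoning from the final answer and to split the domain label. The helper names `split_response` and `split_domain` are hypothetical and not part of the dataset or any library. The code assumes the tags appear literally as `<think>` and `</think>`, and that the category part contains no underscore, as in the `SWE Repair_SWE-Fixer-Train` example. The "exactly one block" check is an illustrative reading of the filter, not necessarily the one used to build the dataset.

```python
import re

# Assumption: the reasoning block is delimited by literal <think>...</think> tags.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_response(response: str) -> tuple[str, str]:
    """Return (reasoning, answer) for a response with exactly one think block."""
    blocks = THINK_RE.findall(response)
    if len(blocks) != 1:
        raise ValueError(f"expected exactly one <think> block, found {len(blocks)}")
    # Everything outside the think block is treated as the final answer.
    answer = THINK_RE.sub("", response, count=1).strip()
    return blocks[0].strip(), answer


def split_domain(domain: str) -> tuple[str, str]:
    """Split '{category}_{source}' at the first underscore (assumed separator)."""
    category, _, source = domain.partition("_")
    return category, source
```

With the `rows` dictionary from the Usage snippet, `split_response(rows["response"][idx])` and `split_domain(rows["domain"][idx])` could be applied per row, for example to train on the answer only or to group rows by source.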

Provider: AmanPriyanshu
