five

AmanPriyanshu/reasoning-sft-Nemotron-Instruction-Following-Chat-v1

收藏
Hugging Face2026-03-14 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/AmanPriyanshu/reasoning-sft-Nemotron-Instruction-Following-Chat-v1
下载链接
链接失效反馈
官方服务:
资源简介:
--- license: - cc-by-4.0 task_categories: - text-generation language: - en tags: - reasoning - sft - chain-of-thought - instruction-following - structured-outputs - chat size_categories: - 100K<n<1M --- # Nemotron Instruction Following Chat v1 (Reasoning SFT) Converted version of [nvidia/Nemotron-Instruction-Following-Chat-v1](https://huggingface.co/datasets/nvidia/Nemotron-Instruction-Following-Chat-v1), filtered to 157,595 rows where assistant responses include genuine reasoning traces (`reasoning_content`). ## Format Each row has three columns: - **`input`** — list of dicts with role/content conversation turns (system, user, and prior assistant turns up to the final assistant response) - **`response`** — `<think>` block containing the model's reasoning followed by the final answer - **`domain`** — task domain: `instruction_following` or `structured_outputs` ## Domain Distribution | Domain | Rows | |--------|------| | instruction_following | 152,628 | | structured_outputs | 4,967 | ## Conversion - Source: both `chat_if` (426K rows) and `structured_outputs` (5K rows) splits - Filtered to rows where the last assistant message contains non-empty `reasoning_content` (36.57% of total) - Reasoning mapped into `<think>` blocks, answer follows after `</think>` - Validated exactly 1 open and 1 close think tag per response - Multi-turn conversations preserved: all prior turns (system, user, assistant) become input context ## Usage ```py from huggingface_hub import hf_hub_download import pyarrow.parquet as pq import random repo = "AmanPriyanshu/reasoning-sft-Nemotron-Instruction-Following-Chat-v1" path = hf_hub_download(repo_id=repo, filename="data.parquet", repo_type="dataset") table = pq.read_table(path) print(f"Loaded {len(table):,} rows\n") i = random.randint(0, len(table) - 1) row = {col: table.column(col)[i].as_py() for col in table.schema.names} print(f"=== ROW (index {i}) ===") print(f"\n[domain] {row['domain']}") print(f"\n[input] ({len(row['input'])} turns)") for t in row["input"]: preview = t["content"][:300] + ("..." if len(t["content"]) > 300 else "") print(f" {t['role']}: {preview}") rp = row["response"][:1500] if len(row["response"]) > 1500: rp += "..." print(f"\n[response]\n{rp}") ``` ## License Inherited from the [original dataset](https://huggingface.co/datasets/nvidia/Nemotron-Instruction-Following-Chat-v1) by NVIDIA Corporation.
提供机构:
AmanPriyanshu
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作