AmanPriyanshu/reasoning-sft-JSON-structuring-and-correcting

Name: AmanPriyanshu/reasoning-sft-JSON-structuring-and-correcting
Creator: AmanPriyanshu
Published: 2026-03-10 20:55:36
License: Not specified

Source: Hugging Face. Updated 2026-03-10. Indexed 2026-03-29.

Download link:

https://hf-mirror.com/datasets/AmanPriyanshu/reasoning-sft-JSON-structuring-and-correcting

Description:

---
license: apache-2.0
task_categories:
- text-generation
- question-answering
language:
- en
- ja
tags:
- reasoning
- sft
- chain-of-thought
- tool-calling
- structured-output
- json
- json-repair
- agent
size_categories:
- 100K<n<1M
---

# JSON Structuring and Correcting (Reasoning SFT)

Combined dataset of 508K rows for training LLMs on structured output tasks with reasoning traces, sourced from two datasets:

## Sources

### tool_calling.parquet (488,461 rows)

Converted from [vericava/sft-tool-calling-structured-output-v1](https://huggingface.co/datasets/vericava/sft-tool-calling-structured-output-v1). Multi-turn tool calling and structured output tasks including tool invocations, tool results, and final assistant responses. Includes English and Japanese content.

### json_repair.parquet (19,645 rows)

Converted from [kth8/json-repair](https://huggingface.co/datasets/kth8/json-repair). JSON formatting and repair tasks where the model corrects malformed JSON into valid, properly indented output.
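To make "valid, properly indented output" concrete, here is a minimal sketch of how a repaired string might be checked. The helper name `is_valid_indented_json` and the two-space indent are assumptions for illustration; the dataset card does not state which indent width the targets use.

```python
import json


def is_valid_indented_json(text: str, indent: int = 2) -> bool:
    """Return True if `text` parses as JSON and is already pretty-printed.

    Hypothetical helper: the indent width is an assumption, not something
    the dataset card specifies.
    """
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Malformed input (trailing commas, missing quotes, ...) ends up here.
        return False
    # Re-serialize and compare; key order is preserved by json.loads/json.dumps.
    return text.strip() == json.dumps(parsed, indent=indent, ensure_ascii=False)


malformed = '{"name": "Ada", "tags": ["a", "b",]}'  # trailing comma
repaired = json.dumps({"name": "Ada", "tags": ["a", "b"]}, indent=2)

print(is_valid_indented_json(malformed))
print(is_valid_indented_json(repaired))
```

A round-trip comparison like this is strict: it also rejects valid but compact JSON, and number formatting can differ after re-serialization, so treat it as a rough filter rather than a definition of correctness.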
## Format

Each row has three columns:

- **`input`**: list of dicts with role/content conversation turns (system prompt includes tools/schema where applicable)
- **`response`**: response string with a `<think>` reasoning block followed by the structured output
- **`source`**: `N/A`

## Conversion

- Tool-calling: system prompt combines model identity + available tools + output schema; tool calls and results preserved as context turns; 1,322 rows dropped (empty targets)
- JSON repair: system + user turns as input, assistant response as output; 0 rows dropped
- Both: short task-focused reasoning injected in think blocks; each response validated to contain exactly 1 open and 1 close think tag

## Usage

```py
from huggingface_hub import hf_hub_download
import pyarrow.parquet as pq
import random

repo = "AmanPriyanshu/reasoning-sft-JSON-structuring-and-correcting"

for name in ["tool_calling.parquet", "json_repair.parquet"]:
    # Download each parquet file from the dataset repo (cached after the first run).
    path = hf_hub_download(repo_id=repo, filename=name, repo_type="dataset")
    table = pq.read_table(path)
    print(f"{'='*80}")
    print(f"{name}: {len(table)} rows")
    print(f"{'='*80}")

    # Pick one random row and convert its columns to plain Python objects.
    i = random.randint(0, len(table) - 1)
    row = {col: table.column(col)[i].as_py() for col in table.schema.names}

    print(f"\n[source] {row['source']}")
    print(f"\n[input] ({len(row['input'])} turns)")
    for t in row["input"]:
        # Truncate long turns to a 250-character preview.
        preview = t["content"][:250] + ("..." if len(t["content"]) > 250 else "")
        print(f"  {t['role']}: {preview}")

    # Truncate the response to 800 characters for display.
    rp = row["response"][:800]
    if len(row["response"]) > 800:
        rp += "..."
    print(f"\n[response]\n{rp}\n")
```

## License

Apache 2.0, inherited from both source datasets.
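The response layout described under Format, together with the tag-count check mentioned under Conversion, could be reproduced roughly as follows. This is a sketch under assumptions: the function name `split_response` is hypothetical, and the closing tag is assumed to be the literal string `</think>`, which the card implies but does not spell out.

```python
def split_response(response: str) -> tuple[str, str]:
    """Split a response into (reasoning, structured_output).

    Hypothetical helper; assumes the tags are the literal strings
    "<think>" and "</think>".
    """
    open_tag, close_tag = "<think>", "</think>"

    # Mirror the card's stated validation: exactly one open and one close tag.
    if response.count(open_tag) != 1 or response.count(close_tag) != 1:
        raise ValueError("expected exactly one <think> and one </think> tag")

    start = response.index(open_tag) + len(open_tag)
    end = response.index(close_tag)
    if end < start:
        raise ValueError("closing tag appears before opening tag")

    reasoning = response[start:end].strip()
    # Everything after the closing tag is treated as the structured output.
    output = response[end + len(close_tag):].strip()
    return reasoning, output


example = '<think>Fix the trailing comma.</think>\n{"a": 1}'
reasoning, output = split_response(example)
print(reasoning)
print(output)
```

Separating the two parts this way lets the structured output be parsed or scored on its own, for example with `json.loads`, without the reasoning text getting in the way.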

Provided by:

AmanPriyanshu
