
AmanPriyanshu/reasoning-sft-JSON-structuring-and-correcting

Source: Hugging Face. Updated: 2026-03-10. Indexed: 2026-03-29.
Download link:
https://hf-mirror.com/datasets/AmanPriyanshu/reasoning-sft-JSON-structuring-and-correcting
Resource description:
---
license: apache-2.0
task_categories:
- text-generation
- question-answering
language:
- en
- ja
tags:
- reasoning
- sft
- chain-of-thought
- tool-calling
- structured-output
- json
- json-repair
- agent
size_categories:
- 100K<n<1M
---

# JSON Structuring and Correcting (Reasoning SFT)

Combined dataset of 508K rows for training LLMs on structured output tasks with reasoning traces, sourced from two datasets:

## Sources

### tool_calling.parquet (488,461 rows)

Converted from [vericava/sft-tool-calling-structured-output-v1](https://huggingface.co/datasets/vericava/sft-tool-calling-structured-output-v1). Multi-turn tool calling and structured output tasks including tool invocations, tool results, and final assistant responses. Includes English and Japanese content.

### json_repair.parquet (19,645 rows)

Converted from [kth8/json-repair](https://huggingface.co/datasets/kth8/json-repair). JSON formatting and repair tasks where the model corrects malformed JSON into valid, properly indented output.
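To make "valid, properly indented output" concrete, here is a minimal sketch of a strict check on a repair target. The helper name `is_valid_indented_json` and the 2-space indent width are assumptions for illustration; the dataset card does not state which indent width the targets use.

```python
import json


def is_valid_indented_json(text: str, indent: int = 2) -> bool:
    """Return True if `text` parses as JSON and matches its own pretty-printed form.

    Hypothetical helper: the indent width is an assumption, not taken from the dataset.
    """
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Malformed input (trailing commas, missing quotes, ...) fails here.
        return False
    # Strict comparison: re-serialize with the assumed indent and compare text.
    canonical = json.dumps(parsed, indent=indent, ensure_ascii=False)
    return text.strip() == canonical


malformed = '{"a": 1,}'
repaired = json.dumps({"a": 1, "b": [1, 2]}, indent=2)

print(is_valid_indented_json(malformed))
print(is_valid_indented_json(repaired))
```

The comparison is deliberately strict, so valid but compact JSON is rejected too. Numbers whose text form changes on re-serialization (for example exponent notation) would also fail, which may or may not be what you want.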
## Format

Each row has three columns:

- **`input`**: list of dicts with role/content conversation turns (system prompt includes tools/schema where applicable)
- **`response`**: response string with a `<think>` reasoning block followed by the structured output
- **`source`**: `N/A`

## Conversion

- Tool-calling: system prompt combines model identity + available tools + output schema; tool calls and results preserved as context turns; 1,322 rows dropped (empty targets)
- JSON repair: system + user turns as input, assistant response as output; 0 rows dropped
- Both: short task-focused reasoning injected in think blocks; validated exactly 1 open and 1 close think tag per response

## Usage

```py
from huggingface_hub import hf_hub_download
import pyarrow.parquet as pq
import random

repo = "AmanPriyanshu/reasoning-sft-JSON-structuring-and-correcting"

for name in ["tool_calling.parquet", "json_repair.parquet"]:
    # Download the parquet file (cached locally after the first call).
    path = hf_hub_download(repo_id=repo, filename=name, repo_type="dataset")
    table = pq.read_table(path)

    print(f"{'='*80}")
    print(f"{name}: {len(table)} rows")
    print(f"{'='*80}")

    # Pick one random row and convert each column value to a Python object.
    i = random.randint(0, len(table) - 1)
    row = {col: table.column(col)[i].as_py() for col in table.schema.names}

    print(f"\n[source] {row['source']}")
    print(f"\n[input] ({len(row['input'])} turns)")
    for t in row["input"]:
        # Truncate long turns to a 250-character preview.
        preview = t["content"][:250] + ("..." if len(t["content"]) > 250 else "")
        print(f"  {t['role']}: {preview}")

    # Truncate the response to an 800-character preview.
    rp = row["response"][:800]
    if len(row["response"]) > 800:
        rp += "..."
    print(f"\n[response]\n{rp}\n")
```

## License

Apache 2.0, inherited from both source datasets.
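The check described under Conversion (exactly one open and one close think tag per response) and the `response` layout (reasoning block followed by structured output) can be sketched as follows. The function name `split_response` is hypothetical, and the closing tag `</think>` is an assumption: the card names only the `<think>` block.

```python
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"  # assumed closing tag; the card only mentions `<think>`


def split_response(response: str) -> tuple[str, str]:
    """Split a response into (reasoning, structured_output).

    Hypothetical helper: raises ValueError unless the response contains
    exactly one opening and one closing think tag, in that order.
    """
    if response.count(THINK_OPEN) != 1 or response.count(THINK_CLOSE) != 1:
        raise ValueError("expected exactly one open and one close think tag")
    start = response.index(THINK_OPEN) + len(THINK_OPEN)
    end = response.index(THINK_CLOSE)
    if end < start:
        raise ValueError("close think tag appears before open think tag")
    reasoning = response[start:end].strip()
    # Everything after the closing tag is treated as the structured output.
    output = response[end + len(THINK_CLOSE):].strip()
    return reasoning, output


example = '<think>Remove the trailing comma.</think>\n{"a": 1}'
reasoning, output = split_response(example)
print(reasoning)
print(output)
```

Splitting this way lets the structured output be parsed or scored separately from the reasoning trace, for example by passing `output` to `json.loads`.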
Provider:
AmanPriyanshu