
curvedinf/small-qa-1m

Source: Hugging Face | Updated: 2026-04-17 | Indexed: 2026-04-26
Download link:
https://hf-mirror.com/datasets/curvedinf/small-qa-1m
Description:
---
language:
- en
license: mit
pretty_name: small-qa-autocomplete
task_categories:
- text-generation
task_ids:
- language-modeling
tags:
- synthetic
- autocomplete
- command-line
- unit-conversion
- fact-lookup
size_categories:
- 100K<n<1M
---

# small-qa-autocomplete

Synthetic `prompt -> completion` dataset for training/autocomplete workflows.
Prompts are plain query text designed to resemble short user search/CLI-style requests.

## Dataset Summary

- task type: next-token / completion-style supervision
- source: synthetic prompt permutations with deterministic and model-backed answering
- total rows: 999495

## Splits

- train: 979385
- validation: 10032
- test: 10078

## Schema

Each row includes:

- `id`: stable row id
- `prompt`: user query text
- `completion`: target completion text
- `domain`: domain family
- `intent_id`: prompt intent identifier
- `style_id`: prompt style variant
- `template_id`: generator template identifier
- `metadata`: JSON string with generation metadata

## Recommended Usage

```python
from datasets import load_dataset

# parquet
ds = load_dataset("parquet", data_files={
    "train": "train.parquet",
    "validation": "validation.parquet",
    "test": "test.parquet",
})

# jsonl alternative
# ds = load_dataset("json", data_files={
#     "train": "train.jsonl",
#     "validation": "validation.jsonl",
#     "test": "test.jsonl",
# })
```

## Notes

- This dataset is synthetic and may contain occasional noise.
- It is intended as base pretraining/finetuning material for autocomplete-like behavior.
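
Because `metadata` is stored as a JSON string and each row pairs a `prompt` with a `completion`, a small preprocessing step is usually needed before training. The following is a minimal sketch under stated assumptions, not the dataset author's pipeline: the helper names `decode_metadata` and `to_training_text`, the single-space separator, and every value in the sample row are hypothetical and only illustrate the schema listed above.

```python
import json
from typing import Any


def decode_metadata(row: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the row with `metadata` parsed from its JSON string."""
    out = dict(row)
    raw = row.get("metadata")
    # An empty or missing metadata string is treated as an empty dict.
    out["metadata"] = json.loads(raw) if raw else {}
    return out


def to_training_text(row: dict[str, Any], separator: str = " ") -> str:
    """Join prompt and completion into one string for completion-style training.

    The separator is an assumption; the dataset card does not prescribe one.
    """
    return f"{row['prompt']}{separator}{row['completion']}"


# Hypothetical row for illustration only; not taken from the dataset.
sample = {
    "id": "example-0",
    "prompt": "convert 5 km to miles",
    "completion": "3.11 miles",
    "domain": "unit-conversion",
    "intent_id": "example_intent",
    "style_id": "example_style",
    "template_id": "example_template",
    "metadata": '{"generator": "example"}',
}

decoded = decode_metadata(sample)
text = to_training_text(sample)
```

Both helpers operate on plain dictionaries, so they can be applied while iterating over any split. Applying `decode_metadata` through `Dataset.map` may fail if the metadata keys differ between rows, since the result has to fit one Arrow schema; decoding lazily at read time avoids that.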
Provider:
curvedinf