hololivefarm/curated-instruction-pairs

Source: Hugging Face. Updated 2026-03-25; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/hololivefarm/curated-instruction-pairs
Resource description:
---
license: mit
task_categories:
- text-generation
language:
- en
size_categories:
- 1K<n<10K
tags:
- instruction-following
- prompt-completion
- coding
- reasoning
- security
- curated
pretty_name: Curated Instruction-Following Pairs
dataset_info:
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: category
    dtype: string
  - name: difficulty
    dtype: string
  splits:
  - name: train
    num_examples: 1025
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train.jsonl
---

# Curated Instruction-Following Pairs

A dataset of 1,025 high-quality instruction/completion pairs spanning coding, writing, reasoning, data science, DevOps, security, and creative tasks. Every example features detailed, production-grade responses modeled after real Stack Overflow answers, LeetCode solutions, official documentation, and professional technical writing.

## Dataset Description

Each example contains:

| Field | Description |
|-------|-------------|
| `instruction` | The task or question posed to the model |
| `input` | Optional additional context (empty string if none) |
| `output` | A detailed, high-quality completion |
| `category` | One of: `coding`, `writing`, `reasoning`, `data`, `devops`, `creative`, `security` |
| `difficulty` | One of: `easy`, `medium`, `hard` |

## Usage

```python
from datasets import load_dataset

ds = load_dataset("hololivefarm/curated-instruction-pairs")
print(ds["train"][0])
```

## Category Distribution

| Category | Count | Topics |
|----------|-------|--------|
| **coding** | 427 | Python, JavaScript/TypeScript, Go, Rust, SQL, Bash, C, algorithms, data structures, design patterns, async, concurrency, system design, software architecture |
| **reasoning** | 165 | Logic puzzles, Fermi estimation, probability, game theory, trade-off analysis, critical thinking, paradoxes, business strategy, CS fundamentals |
| **data** | 100 | ML algorithms, deep learning, statistics, feature engineering, model evaluation, A/B testing, NLP, pandas, visualization, MLOps |
| **writing** | 99 | Technical editing, emails, commit messages, incident postmortems, bug reports, PR descriptions, release notes, code review, proposals |
| **devops** | 86 | Docker, Kubernetes, CI/CD, Terraform, nginx, monitoring, Linux admin, networking, Git, database ops, cloud architecture |
| **security** | 80 | OWASP Top 10, XSS, SQL injection, CORS, JWT, OAuth, cryptography, password storage, API security, infrastructure hardening |
| **creative** | 68 | Naming, documentation, API docs, technical comparisons, blog posts, job descriptions, user-facing copy, data storytelling |

**Difficulty distribution:** 254 easy, 510 medium, 261 hard

## Languages Covered

Python, JavaScript, TypeScript, Go, Rust, SQL, Bash, C, HTML/CSS, YAML (Kubernetes, Docker Compose, GitHub Actions), HCL (Terraform), nginx config, Cypher (Neo4j)

## Format

The dataset follows the Alpaca instruction-input-output format, making it compatible with most instruction-tuning pipelines.

Available in three formats:

- `data/train.jsonl`: JSON Lines (recommended)
- `data/train.json`: JSON array
- `data/train.csv`: CSV

## License

MIT
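
Returning to the Format section: the sketch below shows one way the `instruction`, `input`, and `output` fields might be turned into Alpaca-style prompts, with an optional filter on `category` and `difficulty`. The dataset card does not specify a prompt template, so the wording used here (the widely circulated Alpaca template) is an assumption. The helper names `select` and `build_prompt` and the two sample records are hypothetical and not taken from the dataset.

```python
from typing import Optional

# Hypothetical records shaped like the dataset's five fields; the text is
# invented for illustration and does not come from the dataset itself.
SAMPLE_ROWS = [
    {
        "instruction": "Explain what a mutex is.",
        "input": "",
        "output": "A mutex is a lock that lets one thread at a time enter a critical section.",
        "category": "coding",
        "difficulty": "easy",
    },
    {
        "instruction": "Summarize the incident below in one sentence.",
        "input": "At 02:14 UTC the primary database ran out of disk space.",
        "output": "The primary database exhausted its disk at 02:14 UTC.",
        "category": "writing",
        "difficulty": "medium",
    },
]


def select(rows: list, category: Optional[str] = None,
           difficulty: Optional[str] = None) -> list:
    """Keep rows matching the given category and/or difficulty (None means any)."""
    return [
        row for row in rows
        if (category is None or row["category"] == category)
        and (difficulty is None or row["difficulty"] == difficulty)
    ]


def build_prompt(row: dict) -> str:
    """Render one record as an Alpaca-style prompt (template is an assumption)."""
    instruction = row["instruction"].strip()
    context = row.get("input", "").strip()
    if context:
        # Records with a non-empty `input` get an extra "Input" section.
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{context}\n\n"
            "### Response:\n"
        )
    # The card says `input` is an empty string when there is no extra context.
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )


if __name__ == "__main__":
    for row in select(SAMPLE_ROWS, difficulty="easy"):
        # The training text is the prompt followed by the reference completion.
        print(build_prompt(row) + row["output"])
```

With the real data, the same functions could be applied to rows from `ds["train"]` in the Usage snippet, since each row is a dictionary with these five keys. The `datasets` library's own `filter` and `map` methods are an alternative to the list comprehension used here.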
Provider:
hololivefarm