hololivefarm/curated-instruction-pairs
Source: Hugging Face. Updated 2026-03-25; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/hololivefarm/curated-instruction-pairs
Resource description:
---
license: mit
task_categories:
- text-generation
language:
- en
size_categories:
- 1K<n<10K
tags:
- instruction-following
- prompt-completion
- coding
- reasoning
- security
- curated
pretty_name: Curated Instruction-Following Pairs
dataset_info:
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: category
    dtype: string
  - name: difficulty
    dtype: string
  splits:
  - name: train
    num_examples: 1025
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train.jsonl
---
# Curated Instruction-Following Pairs
A dataset of 1,025 curated instruction/completion pairs spanning coding, writing, reasoning, data science, DevOps, security, and creative tasks. Each example has a detailed, production-grade response modeled after real Stack Overflow answers, LeetCode solutions, official documentation, and professional technical writing.
## Dataset Description
Each example contains:
| Field | Description |
|-------|-------------|
| `instruction` | The task or question posed to the model |
| `input` | Optional additional context (empty string if none) |
| `output` | A detailed, high-quality completion |
| `category` | One of: `coding`, `writing`, `reasoning`, `data`, `devops`, `creative`, `security` |
| `difficulty` | One of: `easy`, `medium`, `hard` |
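
The constraints in the table can be checked mechanically. The following is a minimal sketch, not part of the dataset's own tooling; `validate_record` and the sample record are hypothetical, introduced here only for illustration.

```python
# Allowed values, copied from the field table above.
CATEGORIES = {"coding", "writing", "reasoning", "data", "devops", "creative", "security"}
DIFFICULTIES = {"easy", "medium", "hard"}
FIELDS = ("instruction", "input", "output", "category", "difficulty")


def validate_record(record: dict) -> list[str]:
    """Return a list of problems found in one record (empty if it looks valid)."""
    problems = []
    for name in FIELDS:
        # Every field is declared as a string in the card's metadata.
        if not isinstance(record.get(name), str):
            problems.append(f"{name}: missing or not a string")
    if record.get("category") not in CATEGORIES:
        problems.append(f"category: unexpected value {record.get('category')!r}")
    if record.get("difficulty") not in DIFFICULTIES:
        problems.append(f"difficulty: unexpected value {record.get('difficulty')!r}")
    return problems


# Hypothetical record, invented for illustration; not taken from the dataset.
sample = {
    "instruction": "Reverse a string in Python.",
    "input": "",
    "output": "Use slicing: s[::-1].",
    "category": "coding",
    "difficulty": "easy",
}
print(validate_record(sample))  # []
```
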
## Usage
```python
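from datasets import load_dataset

# Downloads the dataset from the Hugging Face Hub; the only split is "train".
ds = load_dataset("hololivefarm/curated-instruction-pairs")
print(ds["train"][0])  # first example, as a dict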
```
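
Once loaded, rows can be selected by `category` and `difficulty`. The sketch below works on plain dictionaries so that it runs without downloading anything; `select` and the two sample rows are hypothetical.

```python
def select(rows, category=None, difficulty=None):
    """Keep rows matching the given category and/or difficulty (None means 'any')."""
    return [
        row for row in rows
        if (category is None or row["category"] == category)
        and (difficulty is None or row["difficulty"] == difficulty)
    ]


# Hypothetical rows, invented for illustration; not taken from the dataset.
rows = [
    {"instruction": "Explain CORS.", "input": "", "output": "...",
     "category": "security", "difficulty": "medium"},
    {"instruction": "Write a Dockerfile.", "input": "", "output": "...",
     "category": "devops", "difficulty": "easy"},
]
print(len(select(rows, category="security")))  # 1
```

With the `datasets` library, the same kind of predicate can be passed to `Dataset.filter`, for example `ds["train"].filter(lambda row: row["category"] == "security")`.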
## Category Distribution
| Category | Count | Topics |
|----------|-------|--------|
| **coding** | 427 | Python, JavaScript/TypeScript, Go, Rust, SQL, Bash, C, algorithms, data structures, design patterns, async, concurrency, system design, software architecture |
| **reasoning** | 165 | Logic puzzles, Fermi estimation, probability, game theory, trade-off analysis, critical thinking, paradoxes, business strategy, CS fundamentals |
| **data** | 100 | ML algorithms, deep learning, statistics, feature engineering, model evaluation, A/B testing, NLP, pandas, visualization, MLOps |
| **writing** | 99 | Technical editing, emails, commit messages, incident postmortems, bug reports, PR descriptions, release notes, code review, proposals |
| **devops** | 86 | Docker, Kubernetes, CI/CD, Terraform, nginx, monitoring, Linux admin, networking, Git, database ops, cloud architecture |
| **security** | 80 | OWASP Top 10, XSS, SQL injection, CORS, JWT, OAuth, cryptography, password storage, API security, infrastructure hardening |
| **creative** | 68 | Naming, documentation, API docs, technical comparisons, blog posts, job descriptions, user-facing copy, data storytelling |
**Difficulty distribution:** 254 easy, 510 medium, 261 hard
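
As a quick consistency check, both breakdowns can be summed and compared with the stated total of 1,025. The sketch below only re-uses the numbers printed above; the variable names are arbitrary.

```python
# Counts copied from the category table and the difficulty line above.
category_counts = {
    "coding": 427, "reasoning": 165, "data": 100, "writing": 99,
    "devops": 86, "security": 80, "creative": 68,
}
difficulty_counts = {"easy": 254, "medium": 510, "hard": 261}

total = sum(category_counts.values())
# Both breakdowns should add up to the stated 1,025 examples.
assert total == sum(difficulty_counts.values()) == 1025

# Share of each category, as a percentage rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in category_counts.items()}
print(shares["coding"])  # 41.7
```
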
## Languages Covered
Python, JavaScript, TypeScript, Go, Rust, SQL, Bash, C, HTML/CSS, YAML (Kubernetes, Docker Compose, GitHub Actions), HCL (Terraform), nginx config, Cypher (Neo4j)
## Format
The dataset follows the Alpaca instruction-input-output format, so it can be used with instruction-tuning pipelines that accept Alpaca-style records.
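
A record in this format is usually rendered into a single prompt string before training. The sketch below uses the prompt wording commonly associated with Stanford Alpaca; that template is an assumption, since this card does not prescribe one, and `build_prompt` is a hypothetical helper.

```python
# Prompt templates as commonly used with Stanford Alpaca-style data.
# Assumption: the dataset card itself does not prescribe a template.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def build_prompt(example: dict) -> str:
    """Render the prompt; the input section is skipped when `input` is empty."""
    template = PROMPT_WITH_INPUT if example["input"].strip() else PROMPT_NO_INPUT
    return template.format(instruction=example["instruction"], input=example["input"])


# Hypothetical example, invented for illustration.
example = {"instruction": "Summarize the text.", "input": "", "output": "..."}
print(build_prompt(example))
```

The `output` field is then the target the model is trained to produce after `### Response:`.
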
Available in three formats:
- `data/train.jsonl` — JSON Lines (recommended)
- `data/train.json` — JSON array
- `data/train.csv` — CSV
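
The JSON Lines file can also be read with the standard library alone, one JSON object per line. The sketch below writes a small stand-in file first so that it runs anywhere; `read_jsonl` and the two records are hypothetical, and in practice the path would point at a local copy of `data/train.jsonl`.

```python
import json
import os
import tempfile


def read_jsonl(path):
    """Yield one dict per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


# Hypothetical two-line file standing in for data/train.jsonl.
records = [
    {"instruction": "A", "input": "", "output": "a",
     "category": "coding", "difficulty": "easy"},
    {"instruction": "B", "input": "ctx", "output": "b",
     "category": "data", "difficulty": "hard"},
]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train.jsonl")
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")
    loaded = list(read_jsonl(path))

print(len(loaded))  # 2
```
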
## License
MIT
Provider:
hololivefarm



