8F-ai/EvalLLM-30000X
Hugging Face | updated 2026-04-06 | indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/8F-ai/EvalLLM-30000X
Resource description:
---
license: mit
language:
- en
pretty_name: EvalLLM-30000x
size_categories:
- 10K<n<100K
task_categories:
- text-generation
- question-answering
- summarization
tags:
- synthetic
- instruction-tuning
- evaluation
- alignment
---
# Dataset Card for EvalLLM-30000x
## Dataset Summary
This repository provides a 30,000-example synthetic instruction dataset designed for high-signal supervised training, evaluator calibration, and response quality benchmarking. The collection is organized into six focused subsets:
1. `analysis`: operational diagnosis, tradeoff analysis, and recommendation writing.
2. `coding`: compact implementation tasks with maintainable code-first answers.
3. `grounded`: evidence-faithful question answering from short source excerpts.
4. `safety`: refusal-and-redirection examples for harmful or disallowed requests.
5. `structured`: schema-following extraction and JSON formatting tasks.
6. `writing`: polished drafting, rewriting, and concise stakeholder communication.
The dataset is intended to feel closer to frontier-style assistant behavior than to bulk template dumps: prompts are constraint-aware, responses are direct, and no exact prompt appears more than twice. The release in this repository passes that duplicate check and contains the expected 30,000 training rows.
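The duplicate cap can be re-checked locally. The following is a minimal sketch, not the check the authors ran: it assumes a "prompt" means the `instruction` and `input` fields compared verbatim, and the helper name `prompts_over_cap` is hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping


def prompts_over_cap(
    rows: Iterable[Mapping[str, str]], cap: int = 2
) -> dict[tuple[str, str], int]:
    """Return every (instruction, input) pair seen more than `cap` times."""
    # Assumption: exact duplication is judged on instruction + input together,
    # with no normalization of whitespace or casing.
    counts = Counter((row["instruction"], row.get("input", "")) for row in rows)
    return {key: n for key, n in counts.items() if n > cap}


sample = [
    {"instruction": "Summarize the memo.", "input": "Q3 costs rose."},
    {"instruction": "Summarize the memo.", "input": "Q3 costs rose."},
    {"instruction": "Summarize the memo.", "input": "Q3 costs rose."},
    {"instruction": "Summarize the memo.", "input": "Q4 costs fell."},
]
print(prompts_over_cap(sample))
```

An empty result would indicate that the rows respect the cap under this definition of a prompt.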
## Data Quality Goals
The training data were generated to emphasize:
- diverse task framing instead of repeated single-turn trivia
- clear constraints and audience-aware answers
- concise but production-usable code and writing samples
- grounded answers that stay within provided evidence
- safe refusals that remain helpful without enabling harm
This is a synthetic dataset. It should be reviewed and filtered for any production use, especially if you plan to fine-tune a public-facing assistant.
## Data Structure
Each subset lives in its own subdirectory under `data/` and is sharded into two parquet files:
```text
data/
  analysis/
    train-00000-of-00002.parquet
    train-00001-of-00002.parquet
  coding/
    train-00000-of-00002.parquet
    train-00001-of-00002.parquet
  grounded/
    train-00000-of-00002.parquet
    train-00001-of-00002.parquet
  safety/
    train-00000-of-00002.parquet
    train-00001-of-00002.parquet
  structured/
    train-00000-of-00002.parquet
    train-00001-of-00002.parquet
  writing/
    train-00000-of-00002.parquet
    train-00001-of-00002.parquet
```
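Because the shard names follow a fixed pattern, the repo-relative paths for a subset can be generated rather than typed out. This is a sketch under the layout shown above; `SUBSETS`, `NUM_SHARDS`, and `shard_paths` are hypothetical names introduced for illustration.

```python
# Subset names and shard count are taken from the directory listing above.
SUBSETS = ("analysis", "coding", "grounded", "safety", "structured", "writing")
NUM_SHARDS = 2


def shard_paths(subset: str, num_shards: int = NUM_SHARDS) -> list[str]:
    """Build the repo-relative parquet paths for one subset."""
    if subset not in SUBSETS:
        raise ValueError(f"unknown subset: {subset!r}")
    return [
        f"data/{subset}/train-{i:05d}-of-{num_shards:05d}.parquet"
        for i in range(num_shards)
    ]


for path in shard_paths("coding"):
    print(path)
```

A list like this could presumably be passed as the `data_files` argument of `load_dataset` to select specific shards, though that has not been verified against the hosted repository.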
Each parquet row contains the following fields:
- `id`: stable example identifier
- `subset`: subset name
- `split`: currently `train`
- `task_family`: task grouping label
- `difficulty`: coarse difficulty marker
- `language`: response language
- `instruction`: user-facing task description
- `input`: supporting context or source text
- `output`: target answer
- `quality_signals`: short quality annotations
- `safety_tags`: safety metadata
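A quick presence check against this field list can catch schema drift before training. The sketch below only verifies that the documented keys exist; the card does not state value types, so none are assumed, and `EXPECTED_FIELDS` and `missing_fields` are hypothetical names.

```python
from typing import Any, Mapping

# Field names copied from the list above, in the same order.
EXPECTED_FIELDS = (
    "id", "subset", "split", "task_family", "difficulty", "language",
    "instruction", "input", "output", "quality_signals", "safety_tags",
)


def missing_fields(row: Mapping[str, Any]) -> list[str]:
    """Return the documented field names that are absent from `row`."""
    return [name for name in EXPECTED_FIELDS if name not in row]


# Illustrative row with placeholder values, not real dataset content.
row = {name: "" for name in EXPECTED_FIELDS}
del row["safety_tags"]
print(missing_fields(row))
```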
## Usage
Load the full dataset:
```python
from datasets import load_dataset
dataset = load_dataset("8F-ai/EvalLLM-30000x")
```
Load a single subset:
```python
from datasets import load_dataset
# Subsets live under data/, so data_dir is given relative to the repository root.
coding = load_dataset("8F-ai/EvalLLM-30000x", data_dir="data/coding")
```
## Limitations
- Synthetic data can still reflect template bias even when it is diverse.
- The dataset is English-only in this release.
- Safety examples are designed to be policy-aligned, but they are not a substitute for runtime safeguards.
## Recommended Uses
- supervised fine-tuning warm starts
- evaluator or reward-model prototyping
- response-style benchmarking
- task-routing and subset-specific ablation studies
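For fine-tuning warm starts and subset-specific ablations, rows usually need to be turned into prompt/completion pairs and bucketed by subset. The following is one possible sketch: the blank-line prompt template and the names `to_prompt_completion` and `group_by_subset` are assumptions, since the card does not prescribe a format.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def to_prompt_completion(row: Mapping[str, str]) -> dict[str, str]:
    """Join instruction and optional input into a prompt; output is the target."""
    # Assumption: a blank line separates the instruction from its context.
    prompt = row["instruction"].strip()
    context = (row.get("input") or "").strip()
    if context:
        prompt = f"{prompt}\n\n{context}"
    return {"prompt": prompt, "completion": row["output"]}


def group_by_subset(
    rows: Iterable[Mapping[str, str]],
) -> dict[str, list[dict[str, str]]]:
    """Bucket formatted examples by their `subset` field for ablation runs."""
    groups: dict[str, list[dict[str, str]]] = defaultdict(list)
    for row in rows:
        groups[row["subset"]].append(to_prompt_completion(row))
    return dict(groups)
```

Dropping one key from the returned dictionary before training gives a simple leave-one-subset-out ablation.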
Provider:
8F-ai



