
8F-ai/EvalLLM-30000X

Source: Hugging Face. Updated: 2026-04-06. Indexed: 2026-04-12.
Download link:
https://hf-mirror.com/datasets/8F-ai/EvalLLM-30000X
Resource description:
---
license: mit
language:
- en
pretty_name: EvalLLM-30000x
size_categories:
- 10K<n<100K
task_categories:
- text-generation
- question-answering
- summarization
tags:
- synthetic
- instruction-tuning
- evaluation
- alignment
---

# Dataset Card for EvalLLM-30000x

## Dataset Summary

This repository provides a 30,000-example synthetic instruction dataset designed for high-signal supervised training, evaluator calibration, and response quality benchmarking. The collection is organized into six focused subsets:

1. `analysis`: operational diagnosis, tradeoff analysis, and recommendation writing.
2. `coding`: compact implementation tasks with maintainable code-first answers.
3. `grounded`: evidence-faithful question answering from short source excerpts.
4. `safety`: refusal-and-redirection examples for harmful or disallowed requests.
5. `structured`: schema-following extraction and JSON formatting tasks.
6. `writing`: polished drafting, rewriting, and concise stakeholder communication.

The dataset is intended to feel closer to frontier-style assistant behavior than bulk template dumps: prompts are constraint-aware, responses are direct, and exact prompt duplication is capped at no more than two occurrences. In the generated release in this repository, the duplicate check passes with the expected 30,000 total training rows.

## Data Quality Goals

The training data were generated to emphasize:

- diverse task framing instead of repeated single-turn trivia
- clear constraints and audience-aware answers
- concise but production-usable code and writing samples
- grounded answers that stay within provided evidence
- safe refusals that remain helpful without enabling harm

This is a synthetic dataset. It should be reviewed and filtered for any production use, especially if you plan to fine-tune a public-facing assistant.

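
The summary states that exact prompt duplication is capped at two occurrences. When reviewing or filtering the data, that cap can be re-checked with a sketch like the one below. It assumes a "prompt" means the `instruction` and `input` fields taken together, which the card does not spell out; `prompt_key` and `over_cap_prompts` are hypothetical helper names, and the sample rows are invented for illustration.

```python
from collections import Counter

MAX_PROMPT_REPEATS = 2  # cap stated in the dataset summary


def prompt_key(row):
    """Assumed definition of a prompt: (instruction, input), whitespace-stripped."""
    instruction = (row.get("instruction") or "").strip()
    context = (row.get("input") or "").strip()
    return (instruction, context)


def over_cap_prompts(rows, cap=MAX_PROMPT_REPEATS):
    """Return {prompt_key: count} for prompts that occur more than `cap` times."""
    counts = Counter(prompt_key(row) for row in rows)
    return {key: n for key, n in counts.items() if n > cap}


# Invented sample rows, not taken from the dataset.
rows = [
    {"instruction": "Summarize the incident.", "input": "Disk filled at 02:00."},
    {"instruction": "Summarize the incident.", "input": "Disk filled at 02:00."},
    {"instruction": "Summarize the incident.", "input": "Disk filled at 02:00."},
    {"instruction": "Write a haiku.", "input": ""},
]

violations = over_cap_prompts(rows)
print(violations)
```

An empty result means every prompt appears at most twice under this definition. A stricter or looser key (for example, `instruction` only) would change what counts as a duplicate.
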
## Data Structure

Each subset lives in its own subdirectory under `data/` and is sharded into two parquet files:

```text
data/
  analysis/
    train-00000-of-00002.parquet
    train-00001-of-00002.parquet
  coding/
    train-00000-of-00002.parquet
    train-00001-of-00002.parquet
  grounded/
    train-00000-of-00002.parquet
    train-00001-of-00002.parquet
  safety/
    train-00000-of-00002.parquet
    train-00001-of-00002.parquet
  structured/
    train-00000-of-00002.parquet
    train-00001-of-00002.parquet
  writing/
    train-00000-of-00002.parquet
    train-00001-of-00002.parquet
```

Each parquet row contains the following fields:

- `id`: stable example identifier
- `subset`: subset name
- `split`: currently `train`
- `task_family`: task grouping label
- `difficulty`: coarse difficulty marker
- `language`: response language
- `instruction`: user-facing task description
- `input`: supporting context or source text
- `output`: target answer
- `quality_signals`: short quality annotations
- `safety_tags`: safety metadata

## Usage

Load the full dataset:

```python
from datasets import load_dataset

dataset = load_dataset("8F-ai/EvalLLM-30000x")
```

Load a single subset:

```python
from datasets import load_dataset

coding = load_dataset("8F-ai/EvalLLM-30000x", data_dir="coding")
```

## Limitations

- Synthetic data can still reflect template bias even when it is diverse.
- The dataset is English-only in this release.
- Safety examples are designed to be policy-aligned, but they are not a substitute for runtime safeguards.

## Recommended Uses

- supervised fine-tuning warm starts
- evaluator or reward-model prototyping
- response-style benchmarking
- task-routing and subset-specific ablation studies
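
For subset-specific work, it can help to have the documented layout and field list in code. The sketch below only encodes what the card describes (six subsets, two shards each, eleven fields); `shard_paths` and `missing_fields` are hypothetical helper names, and nothing here has been checked against the actual repository contents.

```python
# Subset names, shard count, and field names as listed in the card.
SUBSETS = ["analysis", "coding", "grounded", "safety", "structured", "writing"]
NUM_SHARDS = 2
FIELDS = [
    "id", "subset", "split", "task_family", "difficulty", "language",
    "instruction", "input", "output", "quality_signals", "safety_tags",
]


def shard_paths(subset, num_shards=NUM_SHARDS):
    """Build the repo-relative parquet paths for one subset, per the documented tree."""
    if subset not in SUBSETS:
        raise ValueError(f"unknown subset: {subset!r}")
    return [
        f"data/{subset}/train-{i:05d}-of-{num_shards:05d}.parquet"
        for i in range(num_shards)
    ]


def missing_fields(row):
    """Return the documented field names that are absent from a row (a dict)."""
    return [name for name in FIELDS if name not in row]


print(shard_paths("coding"))
print(missing_fields({"id": "example-0", "subset": "coding"}))
```

If loading by `data_dir` does not resolve in your environment, the paths from `shard_paths` could be passed to `load_dataset` through its `data_files` argument instead. `missing_fields` is a cheap sanity check to run on a few rows before an ablation or fine-tuning job.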

Provider:
8F-ai