tinaxie/Uno-Curriculum
Source: Hugging Face | Updated: 2026-04-28 | Indexed: 2026-05-03
Download link:
https://hf-mirror.com/datasets/tinaxie/Uno-Curriculum
Description:
Uno-Curriculum is a training corpus for a hierarchical-delegation router: a small language model that decomposes a task into subtasks and routes each subtask to a (worker model, skill) pair. Every row is drawn from a real public Hugging Face dataset, with questions and answers sampled verbatim from the source. Each row passes through a three-stage pipeline (router probe → teacher trajectory → noise removal) intended to ensure the quality of the multi-turn trajectories. The dataset ships several configurations for different use cases, such as primary training, real-rollout audits, and subtask-level routing analysis. It is primarily in English and is licensed under Apache-2.0.
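To make the routing idea concrete, here is a minimal Python sketch of what a decomposed task with per-subtask (worker model, skill) routes might look like in memory. It is not the dataset's actual schema: the class names (`Route`, `Subtask`, `Delegation`), the field names, the helper `routes_used`, and the placeholder worker and skill labels are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Route:
    """A (worker model, skill) pair. Field names are hypothetical."""
    worker_model: str
    skill: str


@dataclass
class Subtask:
    """One piece of a decomposed task and the route chosen for it."""
    description: str
    route: Route


@dataclass
class Delegation:
    """A task together with the subtasks the router split it into."""
    task: str
    subtasks: List[Subtask]


def routes_used(delegation: Delegation) -> List[Route]:
    """Return the distinct routes in the order they are first used."""
    seen: List[Route] = []
    for subtask in delegation.subtasks:
        if subtask.route not in seen:
            seen.append(subtask.route)
    return seen


# Placeholder worker and skill labels, not values taken from the dataset.
example = Delegation(
    task="Summarize a report and total its expense figures.",
    subtasks=[
        Subtask("Extract the expense figures", Route("worker-a", "extraction")),
        Subtask("Add the figures", Route("worker-b", "arithmetic")),
        Subtask("Write the summary", Route("worker-a", "extraction")),
    ],
)

for route in routes_used(example):
    print(route.worker_model, route.skill)
```

The actual column names and configuration names should be taken from the dataset card at the download link above rather than from this sketch.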
Provider:
tinaxie



