
tgupj/tiny-router-data

Source: Hugging Face. Updated 2026-03-20; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/tgupj/tiny-router-data
Description:
---
license: mit
task_categories:
- text-classification
language:
- en
tags:
- agent
- synthetic
size_categories:
- 1K<n<10K
---

# tiny-router dataset

Synthetic data for training and evaluating `tiny-router`, a compact multi-head classifier for short routing decisions.

Each example contains a `current_text` field, optional interaction context, and four labels:

- `relation_to_previous`: `new`, `follow_up`, `correction`, `confirmation`, `cancellation`, `closure`
- `actionability`: `none`, `review`, `act`
- `retention`: `ephemeral`, `useful`, `remember`
- `urgency`: `low`, `medium`, `high`

## Files

- `raw/synthetic.jsonl`: 2,907 synthetic examples before deduplication
- `raw/synthetic.deduped.jsonl`: 2,892 deduplicated examples
- `synthetic/train.jsonl`: 2,279 examples
- `synthetic/validation.jsonl`: 276 examples
- `synthetic/test.jsonl`: 337 examples

The train, validation, and test splits are derived from the deduplicated file. About 37% of examples have no interaction context and only include `current_text` plus labels.

## Schema

```json
{
  "current_text": "Actually next Monday",
  "interaction": {
    "previous_text": "Set a reminder for Friday",
    "previous_action": "created_reminder",
    "previous_outcome": "success",
    "recency_seconds": 45
  },
  "labels": {
    "relation_to_previous": "correction",
    "actionability": "act",
    "retention": "useful",
    "urgency": "medium"
  }
}
```

`interaction` is optional and may be `null` or omitted.

## Intended use

This dataset is meant for lightweight research and prototyping around message routing, update handling, memory policy, and prioritization for short text inputs.

## Limitations

- The data is synthetic, not collected from production traffic.
- Labels reflect the routing schema used in this repository and may not transfer cleanly to other products.
- It should not be used on its own for high-stakes or fully autonomous decisions.
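As a rough illustration of the schema described above, the following Python sketch parses JSONL lines, checks each of the four labels against its listed values, and treats a missing `interaction` the same as `null`. The helper names `parse_example` and `load_jsonl` are hypothetical and not part of the dataset or of `tiny-router`. The second sample record is invented for illustration and does not come from the data.

```python
import json

# Allowed values per label, copied from the dataset description.
LABEL_VALUES = {
    "relation_to_previous": {
        "new", "follow_up", "correction",
        "confirmation", "cancellation", "closure",
    },
    "actionability": {"none", "review", "act"},
    "retention": {"ephemeral", "useful", "remember"},
    "urgency": {"low", "medium", "high"},
}


def parse_example(line):
    """Parse one JSONL line and validate it (hypothetical helper)."""
    record = json.loads(line)
    if not isinstance(record.get("current_text"), str):
        raise ValueError("current_text must be a string")
    labels = record.get("labels") or {}
    for name, allowed in LABEL_VALUES.items():
        if labels.get(name) not in allowed:
            raise ValueError(f"bad value for {name}: {labels.get(name)!r}")
    # `interaction` may be null or omitted; store None in both cases.
    record["interaction"] = record.get("interaction")
    return record


def load_jsonl(lines):
    """Parse every non-empty line of a JSONL source (hypothetical helper)."""
    return [parse_example(line) for line in lines if line.strip()]


sample_lines = [
    # The record shown in the Schema section.
    json.dumps({
        "current_text": "Actually next Monday",
        "interaction": {
            "previous_text": "Set a reminder for Friday",
            "previous_action": "created_reminder",
            "previous_outcome": "success",
            "recency_seconds": 45,
        },
        "labels": {
            "relation_to_previous": "correction",
            "actionability": "act",
            "retention": "useful",
            "urgency": "medium",
        },
    }),
    # Invented record with no interaction context (illustration only).
    json.dumps({
        "current_text": "Thanks, that's all",
        "labels": {
            "relation_to_previous": "closure",
            "actionability": "none",
            "retention": "ephemeral",
            "urgency": "low",
        },
    }),
]

examples = load_jsonl(sample_lines)
no_context = sum(1 for ex in examples if ex["interaction"] is None)
print(len(examples), no_context)
```

To run this against the real files, one could pass an open file handle, for example `load_jsonl(open("synthetic/train.jsonl", encoding="utf-8"))`, assuming the repository has been downloaded locally.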

Provider:
tgupj