
ihounie/when2call_imbalanced_toolcall

Source: Hugging Face. Updated: 2026-03-17. Indexed: 2026-03-29.
Download link:
https://hf-mirror.com/datasets/ihounie/when2call_imbalanced_toolcall
Description:
---
pretty_name: when2call_imbalanced_toolcall
configs:
- config_name: train_pref
  data_files:
  - split: train
    path: train-*
license: mit
language:
- en
tags:
- when2call
- preference-dataset
- class-imbalance
- synthetic-sampling
size_categories:
- 1K<n<10K
---

# when2call_imbalanced_toolcall

This dataset is derived from `nvidia/When2Call` (`train_pref`, `train` split) by downsampling one chosen-response category to ~50% while keeping all other rows.

## Source

- Dataset: `nvidia/When2Call`
- Config: `train_pref`
- Split: `train`
- Source rows: 9000

## Classification Rules (on `chosen_response`)

Categories are assigned in this precedence order:

1. `toolcall` if text contains `<TOOLCALL>` (case-insensitive)
2. `request` if text contains `?`
3. `request` if text contains one of:
   - `To proceed,`
   - `Please provide`
   - `Please specify`

   (case-insensitive)
4. `refusal` if text contains one of:
   - `apologies`
   - `apologize`
   - `sorry`
   - `I'm unable` (including escaped/quoted variants)
   - `I'm afraid`

   (case-insensitive)
5. otherwise `unk`

## Sampling Procedure

- Target minority class: `toolcall`
- Keep ratio for target class: 50% (floor when odd)
- Random seed: 43
- Other classes: all rows kept

## Class Counts (chosen_response)

### Before sampling

- refusal: 2999
- toolcall: 3000
- request: 3001
- unk: 0

### After sampling

- refusal: 2999
- toolcall: 1500
- request: 3001
- unk: 0

## Rows

- Final rows: 7500

## Notes

- The schema/columns match the source `train_pref` split format.
- This repo contains only the `train_pref`/`train` data.
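
## Example: classification and downsampling sketch

The classification rules and sampling procedure described above could be sketched roughly as follows. This is a minimal illustration, not the author's actual script: the function names `classify` and `downsample` are hypothetical, rows are assumed to be plain dicts with a `chosen_response` key, and "escaped/quoted variants" is interpreted here as a straight, curly, or backslash-escaped apostrophe. Python's `random.Random` is used for sampling, which is also an assumption, so this sketch is not guaranteed to select the same rows as the published dataset.

```python
import random
import re

# Phrases from the README, lowercased for case-insensitive matching.
REQUEST_PHRASES = ("to proceed,", "please provide", "please specify")
REFUSAL_WORDS = ("apologies", "apologize", "sorry")
# Assumption: "escaped/quoted variants" means the apostrophe may be
# straight ('), curly (’), or preceded by a literal backslash.
REFUSAL_PATTERN = re.compile(r"i\\?['’]m (unable|afraid)")


def classify(text: str) -> str:
    """Assign a category using the README's precedence order."""
    lowered = text.lower()
    if "<toolcall>" in lowered:
        return "toolcall"
    if "?" in lowered:
        return "request"
    if any(phrase in lowered for phrase in REQUEST_PHRASES):
        return "request"
    if any(word in lowered for word in REFUSAL_WORDS) or REFUSAL_PATTERN.search(lowered):
        return "refusal"
    return "unk"


def downsample(rows, target="toolcall", keep_ratio=0.5, seed=43):
    """Keep a floor(keep_ratio * n) random subset of the target class and all other rows."""
    labels = [classify(row["chosen_response"]) for row in rows]
    target_idx = [i for i, label in enumerate(labels) if label == target]
    n_keep = int(len(target_idx) * keep_ratio)  # int() floors for non-negative values
    rng = random.Random(seed)
    kept = set(rng.sample(target_idx, n_keep))
    # Preserve the original row order; drop only unselected target-class rows.
    return [row for i, row in enumerate(rows) if labels[i] != target or i in kept]
```

With the 3000 `toolcall` rows reported in the class counts, `int(3000 * 0.5)` gives 1500 kept rows, which matches the "After sampling" figures; the other 6000 rows pass through unchanged, for 7500 in total.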
Provider:
ihounie