ihounie/when2call_imbalanced_toolcall

Name: ihounie/when2call_imbalanced_toolcall
Creator: ihounie
Published: 2026-03-17 20:04:18
License: no description provided

Source: Hugging Face. Updated 2026-03-17; indexed 2026-03-29.

Download link:

https://hf-mirror.com/datasets/ihounie/when2call_imbalanced_toolcall

Resource description:

---
pretty_name: when2call_imbalanced_toolcall
configs:
- config_name: train_pref
  data_files:
  - split: train
    path: train-*
license: mit
language:
- en
tags:
- when2call
- preference-dataset
- class-imbalance
- synthetic-sampling
size_categories:
- 1K<n<10K
---

# when2call_imbalanced_toolcall

This dataset is derived from `nvidia/When2Call` (`train_pref`, `train` split) by downsampling one chosen-response category to ~50% while keeping all other rows.

## Source

- Dataset: `nvidia/When2Call`
- Config: `train_pref`
- Split: `train`
- Source rows: 9000

## Classification Rules (on `chosen_response`)

Categories are assigned in this precedence order:

1. `toolcall` if text contains `<TOOLCALL>` (case-insensitive)
2. `request` if text contains `?`
3. `request` if text contains one of the following (case-insensitive):
   - `To proceed,`
   - `Please provide`
   - `Please specify`
4. `refusal` if text contains one of the following (case-insensitive):
   - `apologies`
   - `apologize`
   - `sorry`
   - `I'm unable` (including escaped/quoted variants)
   - `I'm afraid`
5. otherwise `unk`

## Sampling Procedure

- Target minority class: `toolcall`
- Keep ratio for target class: 50% (floor when odd)
- Random seed: 43
- Other classes: all rows kept

## Class Counts (chosen_response)

### Before sampling

- refusal: 2999
- toolcall: 3000
- request: 3001
- unk: 0

### After sampling

- refusal: 2999
- toolcall: 1500
- request: 3001
- unk: 0

## Rows

- Final rows: 7500

## Notes

- The schema/columns match the source `train_pref` split format.
- This repo contains only the `train_pref`/`train` data.
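
## Illustrative Sketch

The classification rules and sampling procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the script used to build the dataset: the helper names (`classify`, `downsample`, `_normalize`) are hypothetical, rows are assumed to be plain dicts with a `chosen_response` string, and the treatment of "escaped/quoted variants" (backslash-escaped and typographic apostrophes) is a guess. Because the original random number generator and selection method are not documented, seeding Python's `random` module with 43 should not be expected to reproduce the exact published subset.

```python
import random

# Phrases from the card's rules, lowercased because matching is case-insensitive.
REQUEST_PHRASES = ("to proceed,", "please provide", "please specify")
REFUSAL_PHRASES = ("apologies", "apologize", "sorry", "i'm unable", "i'm afraid")


def _normalize(text):
    # Assumption: "escaped/quoted variants" covers \' and the typographic apostrophe.
    return text.replace("\\'", "'").replace("\u2019", "'").lower()


def classify(chosen_response):
    """Assign a category using the precedence order listed in the card."""
    text = _normalize(chosen_response)
    if "<toolcall>" in text:
        return "toolcall"
    if "?" in text:
        return "request"
    if any(phrase in text for phrase in REQUEST_PHRASES):
        return "request"
    if any(phrase in text for phrase in REFUSAL_PHRASES):
        return "refusal"
    return "unk"


def downsample(rows, target="toolcall", keep_ratio=0.5, seed=43):
    """Keep floor(keep_ratio * n) rows of the target class and every other row."""
    rng = random.Random(seed)
    target_idx = [
        i for i, row in enumerate(rows)
        if classify(row["chosen_response"]) == target
    ]
    n_keep = int(len(target_idx) * keep_ratio)  # int() floors non-negative values
    kept = set(rng.sample(target_idx, n_keep))
    dropped = set(target_idx) - kept
    # Preserve the original row order.
    return [row for i, row in enumerate(rows) if i not in dropped]
```

Applied to the 9000 source rows with the counts listed above, this procedure would keep 1500 of the 3000 `toolcall` rows and all 6000 remaining rows, which matches the stated final size of 7500.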

Provider:

ihounie
