
btzsc/btzsc

Source: Hugging Face | Updated: 2026-03-23 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/btzsc/btzsc
Description:
Dataset card metadata (from the card's YAML front matter):

- `pretty_name`: BTZSC: Benchmark for Textual Zero-Shot Classification
- `task_categories`: text-classification, zero-shot-classification
- `language`: en
- `size_categories`: 1M<n<10M
- `tags`: zero-shot-classification, benchmark

Every config has a single `test` split stored at `<config_name>/test-*`, and each reports a `dataset_size` equal to its `num_bytes`. All configs share the features `text` (string), `hypothesis` (string), `labels` (class_label: `'0'` = not_entailment, `'1'` = entailment), and `label_text` (string), plus one string-typed identifier column, named per config in the table below.

| Config | Identifier column | `num_examples` | `num_bytes` | `download_size` |
|--------|-------------------|----------------|-------------|-----------------|
| `agnews` | `dataset_id` | 30400 | 9630696 | 1280949 |
| `all` | `task_name` | 2222983 | 1030704708 | 64153380 |
| `amazonpolarity` | `dataset_id` | 20000 | 10798222 | 2974010 |
| `appreviews` | `dataset_id` | 8000 | 2414054 | 566905 |
| `banking77` | `dataset_id` | 221760 | 40018400 | 804682 |
| `biasframes_intent` | `dataset_id` | 7296 | 1592094 | 310428 |
| `biasframes_offensive` | `dataset_id` | 7676 | 1785704 | 327567 |
| `biasframes_sex` | `dataset_id` | 8808 | 1830030 | 379857 |
| `capsotu` | `dataset_id` | 70455 | 24646828 | 723183 |
| `emotion` | `task_name` | 93344 | 54342486 | 1249373 |
| `emotiondair` | `dataset_id` | 12000 | 2202560 | 158115 |
| `empathetic` | `dataset_id` | 81344 | 52139926 | 1092730 |
| `financialphrasebank` | `dataset_id` | 2070 | 514854 | 65448 |
| `imdb` | `dataset_id` | 20000 | 27862150 | 8559151 |
| `intent` | `task_name` | 404522 | 65522268 | 1669284 |
| `manifesto` | `dataset_id` | 953008 | 417565056 | 8569698 |
| `massive` | `dataset_id` | 175466 | 23911774 | 558077 |
| `rottentomatoes` | `dataset_id` | 2132 | 493664 | 95622 |
| `sentiment` | `task_name` | 72202 | 57771774 | 16757956 |
| `topic` | `task_name` | 1652915 | 853068180 | 44471303 |
| `trueteacher` | `dataset_id` | 17910 | 24821652 | 6972936 |
| `wikitoxic_insult` | `dataset_id` | 16854 | 7364528 | 1724127 |
| `wikitoxic_obscene` | `dataset_id` | 17382 | 7951550 | 1847410 |
| `wikitoxic_threat` | `dataset_id` | 10422 | 5174652 | 1332140 |
| `wikitoxic_toxicaggregated` | `dataset_id` | 20000 | 9026954 | 2024344 |
| `yahootopics` | `dataset_id` | 500000 | 343270530 | 19108728 |
| `yelpreviews` | `dataset_id` | 20000 | 15688830 | 4505433 |

<p align="center">
  <img src="https://raw.githubusercontent.com/IliasAarab/btzsc/main/docs/images/btzsc_benchmark.png" align="center" width="60%" alt="BTZSC banner">
</p>

<h1 align="center">BTZSC</h1>

<p align="center">
  <em>A
benchmark dataset for zero-shot text classification across embedding models, cross-encoders, rerankers, and LLMs.</em>
</p>

<p align="center">
  <a href="https://github.com/IliasAarab/btzsc/tags"><img src="https://img.shields.io/github/v/tag/IliasAarab/btzsc?style=flat&color=0080ff&label=version" alt="version"></a>
  <a href="https://pypi.org/project/btzsc/"><img src="https://img.shields.io/pypi/pyversions/btzsc?style=flat&color=0080ff" alt="python-versions"></a>
  <a href="https://github.com/IliasAarab/btzsc/blob/main/LICENSE"><img src="https://img.shields.io/github/license/IliasAarab/btzsc?style=flat&color=0080ff" alt="license"></a>
</p>

<br>

<p align="center">
  <a href="#quickstart">Quickstart</a> |
  <a href="#configs">Configs</a> |
  <a href="#data-format">Data Format</a> |
  <a href="#evaluation">Evaluation</a> |
  <a href="#resources">Resources</a> |
  <a href="#citing">Citing</a>
</p>

<hr>

## Overview

BTZSC is a dataset-centric benchmark suite for **textual zero-shot classification** that enables *apples-to-apples* evaluation across major model families (cross-encoders, embedding models, rerankers, and LLM-style classifiers). It contains **22 datasets** spanning four common classification tasks: **sentiment**, **topic**, **intent**, and **emotion**.

## Quickstart

```python
from datasets import load_dataset

# Single dataset
ds = load_dataset("btzsc/btzsc", name="agnews", split="test")

# Task bundle
ds_sent = load_dataset("btzsc/btzsc", name="sentiment", split="test")

# Full suite
ds_all = load_dataset("btzsc/btzsc", name="all", split="test")
```

For high-level benchmark evaluation, use the [`btzsc` eval harness](https://github.com/IliasAarab/btzsc):

```python
from btzsc import BTZSCBenchmark

benchmark = BTZSCBenchmark(tasks=["sentiment", "topic"])
results = benchmark.evaluate(
    model="intfloat/e5-base-v2",
    model_type="embedding",
    batch_size=64,
)
print(results.summary())
```

## Configs

BTZSC is published as a single Hugging Face dataset repo with multiple **configs** (`name=...`).

### Base datasets (22)

| Task | Datasets |
|------|----------|
| **Sentiment** | `amazonpolarity`, `imdb`, `appreviews`, `yelpreviews`, `rottentomatoes`, `financialphrasebank` |
| **Emotion** | `emotiondair`, `empathetic` |
| **Intent** | `banking77`, `biasframes_intent`, `massive` |
| **Topic** | `agnews`, `yahootopics`, `trueteacher`, `manifesto`, `capsotu`, `biasframes_offensive`, `biasframes_sex`, `wikitoxic_insult`, `wikitoxic_obscene`, `wikitoxic_threat`, `wikitoxic_toxicaggregated` |

### Convenience bundles

| Bundle | Description |
|--------|-------------|
| `sentiment` | All 6 sentiment datasets |
| `emotion` | All 2 emotion datasets |
| `intent` | All 3 intent datasets |
| `topic` | All 11 topic datasets |
| `all` | All 22 datasets |

These bundles are concatenations of the corresponding base datasets and are provided purely for convenience (e.g., one-command evaluation). They correspond to Table 1 in the paper.

## Data Format

BTZSC is provided in a **pairwise entailment format**, which makes it directly usable with NLI-style cross-encoders and provides a unified interface for other ZSC approaches.

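As a rough illustration of how the pairwise format maps back to ordinary classification, the sketch below groups rows by their `text` and keeps the candidate label whose pair scores highest. The rows and scores are invented for illustration, `predict_labels` is a hypothetical helper (not part of the `btzsc` package), and grouping by `text` assumes each document string is unique within a dataset; in practice the scores would come from whichever model is being evaluated.

```python
# Toy rows using the BTZSC column names; the texts are invented for illustration.
rows = [
    {"text": "Stocks rallied after the earnings report.", "label_text": "Business", "labels": 1},
    {"text": "Stocks rallied after the earnings report.", "label_text": "Sports", "labels": 0},
    {"text": "The striker scored twice in the final.", "label_text": "Business", "labels": 0},
    {"text": "The striker scored twice in the final.", "label_text": "Sports", "labels": 1},
]

# Hypothetical entailment scores, one per row, standing in for model output.
scores = [0.91, 0.07, 0.12, 0.88]


def predict_labels(rows, scores):
    """Return {text: (predicted_label, gold_label)} using the best-scoring pair per text."""
    best = {}  # text -> (score, label_text) of the highest-scoring pair seen so far
    gold = {}  # text -> label_text of the positive pair (labels == 1)
    for row, score in zip(rows, scores):
        text = row["text"]
        if text not in best or score > best[text][0]:
            best[text] = (score, row["label_text"])
        if row["labels"] == 1:
            gold[text] = row["label_text"]
    return {text: (best[text][1], gold.get(text)) for text in best}


predictions = predict_labels(rows, scores)
for text, (pred, true_label) in predictions.items():
    print(f"{pred:<8} (gold: {true_label})  {text}")
```

Any scoring function that assigns one number per (text, hypothesis) pair fits this pattern, which is what lets different model families share one evaluation interface.
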
Each row corresponds to a *(text, candidate label)* pair:

| Column | Description |
|--------|-------------|
| `text` | The input document |
| `label_text` | Candidate class name (e.g. `"Business"`) |
| `hypothesis` | Natural-language hypothesis built from `label_text` (e.g. `"This example news text is about business news"`) |
| `labels` | Binary target: `1` = entailment (correct label), `0` = not_entailment |
| `dataset_id` | Dataset identifier (e.g. `agnews`) |
| `task_name` | Declared in place of `dataset_id` in the bundle configs (`sentiment`, `emotion`, `intent`, `topic`, `all`), according to the dataset card metadata |

For each original example, BTZSC contains **one positive pair** (the true label) and **one negative pair for every other candidate label**.

## Evaluation

BTZSC follows a strict zero-shot protocol:

- **Primary metric:** macro-F1 per dataset, averaged across datasets for an overall score
- **Secondary metrics:** accuracy, macro-precision, macro-recall
- No training or tuning on evaluation datasets
- 4 task families: sentiment, topic, intent, emotion

See the [paper](https://openreview.net/pdf?id=IxMryAz2p3) for full details.

## Resources

- Paper (OpenReview): https://openreview.net/forum?id=IxMryAz2p3
- PDF: https://openreview.net/pdf?id=IxMryAz2p3
- Eval harness (GitHub): https://github.com/IliasAarab/btzsc
- Leaderboard Space: https://huggingface.co/spaces/btzsc/btzsc-leaderboard
- Leaderboard results dataset: https://huggingface.co/datasets/btzsc/btzsc-results

## Licensing

BTZSC aggregates multiple public datasets; **licenses vary by source dataset**. Please cite and comply with the original dataset licenses. See Appendix A.5 in the paper for details.

## Citing

```bibtex
@inproceedings{aarab2026btzsc,
  title     = {BTZSC: A Benchmark for Zero-Shot Text Classification Across Cross-Encoders, Embedding Models, and Rerankers},
  author    = {Aarab, Ilias},
  booktitle = {International Conference on Learning Representations (ICLR) 2026},
  year      = {2026},
  note      = {OpenReview PDF: https://openreview.net/pdf?id=IxMryAz2p3},
  url       = {https://openreview.net/forum?id=IxMryAz2p3}
}
```

If you use BTZSC, please also cite the original datasets.
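
The scoring rule described under Evaluation (macro-F1 per dataset, then an average across datasets) can be sketched in plain Python as follows. This is a minimal illustration of the metric as stated, not the eval harness's implementation; `macro_f1`, the dataset names, and the toy gold/predicted labels are all made up for the example.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every class seen in gold or pred."""
    classes = sorted(set(gold) | set(pred))
    per_class = []
    for c in classes:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); fall back to 0.0 if the class never occurs.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Hypothetical (gold, predicted) labels for two placeholder datasets.
toy_results = {
    "dataset_a": (["pos", "pos", "neg", "neg"], ["pos", "neg", "neg", "neg"]),
    "dataset_b": (["x", "y", "z"], ["x", "y", "z"]),
}

# Macro-F1 is computed per dataset first, then averaged with equal weight per dataset.
per_dataset_f1 = {name: macro_f1(g, p) for name, (g, p) in toy_results.items()}
overall = sum(per_dataset_f1.values()) / len(per_dataset_f1)

print(per_dataset_f1)
print(f"overall: {overall:.4f}")
```

Averaging per-dataset scores with equal weight means a small dataset such as `rottentomatoes` counts as much toward the overall number as a large one such as `manifesto`.
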
Provider:
btzsc