btzsc/btzsc
Source: Hugging Face | Updated 2026-03-23 | Indexed 2026-03-29
Download link: https://hf-mirror.com/datasets/btzsc/btzsc
Resource description:
---
dataset_info:
- config_name: agnews
  features:
  - name: text
    dtype: string
  - name: hypothesis
    dtype: string
  - name: labels
    dtype:
      class_label:
        names:
          '0': not_entailment
          '1': entailment
  - name: dataset_id
    dtype: string
  - name: label_text
    dtype: string
  splits:
  - name: test
    num_bytes: 9630696
    num_examples: 30400
  download_size: 1280949
  dataset_size: 9630696
- config_name: all
  features:
  - name: text
    dtype: string
  - name: hypothesis
    dtype: string
  - name: labels
    dtype:
      class_label:
        names:
          '0': not_entailment
          '1': entailment
  - name: task_name
    dtype: string
  - name: label_text
    dtype: string
  splits:
  - name: test
    num_bytes: 1030704708
    num_examples: 2222983
  download_size: 64153380
  dataset_size: 1030704708
- config_name: amazonpolarity
  features:
  - name: text
    dtype: string
  - name: hypothesis
    dtype: string
  - name: labels
    dtype:
      class_label:
        names:
          '0': not_entailment
          '1': entailment
  - name: dataset_id
    dtype: string
  - name: label_text
    dtype: string
  splits:
  - name: test
    num_bytes: 10798222
    num_examples: 20000
  download_size: 2974010
  dataset_size: 10798222
- config_name: appreviews
  features:
  - name: text
    dtype: string
  - name: hypothesis
    dtype: string
  - name: labels
    dtype:
      class_label:
        names:
          '0': not_entailment
          '1': entailment
  - name: dataset_id
    dtype: string
  - name: label_text
    dtype: string
  splits:
  - name: test
    num_bytes: 2414054
    num_examples: 8000
  download_size: 566905
  dataset_size: 2414054
- config_name: banking77
  features:
  - name: text
    dtype: string
  - name: hypothesis
    dtype: string
  - name: labels
    dtype:
      class_label:
        names:
          '0': not_entailment
          '1': entailment
  - name: dataset_id
    dtype: string
  - name: label_text
    dtype: string
  splits:
  - name: test
    num_bytes: 40018400
    num_examples: 221760
  download_size: 804682
  dataset_size: 40018400
- config_name: biasframes_intent
  features:
  - name: text
    dtype: string
  - name: hypothesis
    dtype: string
  - name: labels
    dtype:
      class_label:
        names:
          '0': not_entailment
          '1': entailment
  - name: dataset_id
    dtype: string
  - name: label_text
    dtype: string
  splits:
  - name: test
    num_bytes: 1592094
    num_examples: 7296
  download_size: 310428
  dataset_size: 1592094
- config_name: biasframes_offensive
  features:
  - name: text
    dtype: string
  - name: hypothesis
    dtype: string
  - name: labels
    dtype:
      class_label:
        names:
          '0': not_entailment
          '1': entailment
  - name: dataset_id
    dtype: string
  - name: label_text
    dtype: string
  splits:
  - name: test
    num_bytes: 1785704
    num_examples: 7676
  download_size: 327567
  dataset_size: 1785704
- config_name: biasframes_sex
  features:
  - name: text
    dtype: string
  - name: hypothesis
    dtype: string
  - name: labels
    dtype:
      class_label:
        names:
          '0': not_entailment
          '1': entailment
  - name: dataset_id
    dtype: string
  - name: label_text
    dtype: string
  splits:
  - name: test
    num_bytes: 1830030
    num_examples: 8808
  download_size: 379857
  dataset_size: 1830030
- config_name: capsotu
  features:
  - name: text
    dtype: string
  - name: hypothesis
    dtype: string
  - name: labels
    dtype:
      class_label:
        names:
          '0': not_entailment
          '1': entailment
  - name: dataset_id
    dtype: string
  - name: label_text
    dtype: string
  splits:
  - name: test
    num_bytes: 24646828
    num_examples: 70455
  download_size: 723183
  dataset_size: 24646828
- config_name: emotion
  features:
  - name: text
    dtype: string
  - name: hypothesis
    dtype: string
  - name: labels
    dtype:
      class_label:
        names:
          '0': not_entailment
          '1': entailment
  - name: task_name
    dtype: string
  - name: label_text
    dtype: string
  splits:
  - name: test
    num_bytes: 54342486
    num_examples: 93344
  download_size: 1249373
  dataset_size: 54342486
- config_name: emotiondair
  features:
  - name: text
    dtype: string
  - name: hypothesis
    dtype: string
  - name: labels
    dtype:
      class_label:
        names:
          '0': not_entailment
          '1': entailment
  - name: dataset_id
    dtype: string
  - name: label_text
    dtype: string
  splits:
  - name: test
    num_bytes: 2202560
    num_examples: 12000
  download_size: 158115
  dataset_size: 2202560
- config_name: empathetic
  features:
  - name: text
    dtype: string
  - name: hypothesis
    dtype: string
  - name: labels
    dtype:
      class_label:
        names:
          '0': not_entailment
          '1': entailment
  - name: dataset_id
    dtype: string
  - name: label_text
    dtype: string
  splits:
  - name: test
    num_bytes: 52139926
    num_examples: 81344
  download_size: 1092730
  dataset_size: 52139926
- config_name: financialphrasebank
  features:
  - name: text
    dtype: string
  - name: hypothesis
    dtype: string
  - name: labels
    dtype:
      class_label:
        names:
          '0': not_entailment
          '1': entailment
  - name: dataset_id
    dtype: string
  - name: label_text
    dtype: string
  splits:
  - name: test
    num_bytes: 514854
    num_examples: 2070
  download_size: 65448
  dataset_size: 514854
- config_name: imdb
  features:
  - name: text
    dtype: string
  - name: hypothesis
    dtype: string
  - name: labels
    dtype:
      class_label:
        names:
          '0': not_entailment
          '1': entailment
  - name: dataset_id
    dtype: string
  - name: label_text
    dtype: string
  splits:
  - name: test
    num_bytes: 27862150
    num_examples: 20000
  download_size: 8559151
  dataset_size: 27862150
- config_name: intent
  features:
  - name: text
    dtype: string
  - name: hypothesis
    dtype: string
  - name: labels
    dtype:
      class_label:
        names:
          '0': not_entailment
          '1': entailment
  - name: task_name
    dtype: string
  - name: label_text
    dtype: string
  splits:
  - name: test
    num_bytes: 65522268
    num_examples: 404522
  download_size: 1669284
  dataset_size: 65522268
- config_name: manifesto
  features:
  - name: text
    dtype: string
  - name: hypothesis
    dtype: string
  - name: labels
    dtype:
      class_label:
        names:
          '0': not_entailment
          '1': entailment
  - name: dataset_id
    dtype: string
  - name: label_text
    dtype: string
  splits:
  - name: test
    num_bytes: 417565056
    num_examples: 953008
  download_size: 8569698
  dataset_size: 417565056
- config_name: massive
  features:
  - name: text
    dtype: string
  - name: hypothesis
    dtype: string
  - name: labels
    dtype:
      class_label:
        names:
          '0': not_entailment
          '1': entailment
  - name: dataset_id
    dtype: string
  - name: label_text
    dtype: string
  splits:
  - name: test
    num_bytes: 23911774
    num_examples: 175466
  download_size: 558077
  dataset_size: 23911774
- config_name: rottentomatoes
  features:
  - name: text
    dtype: string
  - name: hypothesis
    dtype: string
  - name: labels
    dtype:
      class_label:
        names:
          '0': not_entailment
          '1': entailment
  - name: dataset_id
    dtype: string
  - name: label_text
    dtype: string
  splits:
  - name: test
    num_bytes: 493664
    num_examples: 2132
  download_size: 95622
  dataset_size: 493664
- config_name: sentiment
  features:
  - name: text
    dtype: string
  - name: hypothesis
    dtype: string
  - name: labels
    dtype:
      class_label:
        names:
          '0': not_entailment
          '1': entailment
  - name: task_name
    dtype: string
  - name: label_text
    dtype: string
  splits:
  - name: test
    num_bytes: 57771774
    num_examples: 72202
  download_size: 16757956
  dataset_size: 57771774
- config_name: topic
  features:
  - name: text
    dtype: string
  - name: hypothesis
    dtype: string
  - name: labels
    dtype:
      class_label:
        names:
          '0': not_entailment
          '1': entailment
  - name: task_name
    dtype: string
  - name: label_text
    dtype: string
  splits:
  - name: test
    num_bytes: 853068180
    num_examples: 1652915
  download_size: 44471303
  dataset_size: 853068180
- config_name: trueteacher
  features:
  - name: text
    dtype: string
  - name: hypothesis
    dtype: string
  - name: labels
    dtype:
      class_label:
        names:
          '0': not_entailment
          '1': entailment
  - name: dataset_id
    dtype: string
  - name: label_text
    dtype: string
  splits:
  - name: test
    num_bytes: 24821652
    num_examples: 17910
  download_size: 6972936
  dataset_size: 24821652
- config_name: wikitoxic_insult
  features:
  - name: text
    dtype: string
  - name: hypothesis
    dtype: string
  - name: labels
    dtype:
      class_label:
        names:
          '0': not_entailment
          '1': entailment
  - name: dataset_id
    dtype: string
  - name: label_text
    dtype: string
  splits:
  - name: test
    num_bytes: 7364528
    num_examples: 16854
  download_size: 1724127
  dataset_size: 7364528
- config_name: wikitoxic_obscene
  features:
  - name: text
    dtype: string
  - name: hypothesis
    dtype: string
  - name: labels
    dtype:
      class_label:
        names:
          '0': not_entailment
          '1': entailment
  - name: dataset_id
    dtype: string
  - name: label_text
    dtype: string
  splits:
  - name: test
    num_bytes: 7951550
    num_examples: 17382
  download_size: 1847410
  dataset_size: 7951550
- config_name: wikitoxic_threat
  features:
  - name: text
    dtype: string
  - name: hypothesis
    dtype: string
  - name: labels
    dtype:
      class_label:
        names:
          '0': not_entailment
          '1': entailment
  - name: dataset_id
    dtype: string
  - name: label_text
    dtype: string
  splits:
  - name: test
    num_bytes: 5174652
    num_examples: 10422
  download_size: 1332140
  dataset_size: 5174652
- config_name: wikitoxic_toxicaggregated
  features:
  - name: text
    dtype: string
  - name: hypothesis
    dtype: string
  - name: labels
    dtype:
      class_label:
        names:
          '0': not_entailment
          '1': entailment
  - name: dataset_id
    dtype: string
  - name: label_text
    dtype: string
  splits:
  - name: test
    num_bytes: 9026954
    num_examples: 20000
  download_size: 2024344
  dataset_size: 9026954
- config_name: yahootopics
  features:
  - name: text
    dtype: string
  - name: hypothesis
    dtype: string
  - name: labels
    dtype:
      class_label:
        names:
          '0': not_entailment
          '1': entailment
  - name: dataset_id
    dtype: string
  - name: label_text
    dtype: string
  splits:
  - name: test
    num_bytes: 343270530
    num_examples: 500000
  download_size: 19108728
  dataset_size: 343270530
- config_name: yelpreviews
  features:
  - name: text
    dtype: string
  - name: hypothesis
    dtype: string
  - name: labels
    dtype:
      class_label:
        names:
          '0': not_entailment
          '1': entailment
  - name: dataset_id
    dtype: string
  - name: label_text
    dtype: string
  splits:
  - name: test
    num_bytes: 15688830
    num_examples: 20000
  download_size: 4505433
  dataset_size: 15688830
configs:
- config_name: agnews
  data_files:
  - split: test
    path: agnews/test-*
- config_name: all
  data_files:
  - split: test
    path: all/test-*
- config_name: amazonpolarity
  data_files:
  - split: test
    path: amazonpolarity/test-*
- config_name: appreviews
  data_files:
  - split: test
    path: appreviews/test-*
- config_name: banking77
  data_files:
  - split: test
    path: banking77/test-*
- config_name: biasframes_intent
  data_files:
  - split: test
    path: biasframes_intent/test-*
- config_name: biasframes_offensive
  data_files:
  - split: test
    path: biasframes_offensive/test-*
- config_name: biasframes_sex
  data_files:
  - split: test
    path: biasframes_sex/test-*
- config_name: capsotu
  data_files:
  - split: test
    path: capsotu/test-*
- config_name: emotion
  data_files:
  - split: test
    path: emotion/test-*
- config_name: emotiondair
  data_files:
  - split: test
    path: emotiondair/test-*
- config_name: empathetic
  data_files:
  - split: test
    path: empathetic/test-*
- config_name: financialphrasebank
  data_files:
  - split: test
    path: financialphrasebank/test-*
- config_name: imdb
  data_files:
  - split: test
    path: imdb/test-*
- config_name: intent
  data_files:
  - split: test
    path: intent/test-*
- config_name: manifesto
  data_files:
  - split: test
    path: manifesto/test-*
- config_name: massive
  data_files:
  - split: test
    path: massive/test-*
- config_name: rottentomatoes
  data_files:
  - split: test
    path: rottentomatoes/test-*
- config_name: sentiment
  data_files:
  - split: test
    path: sentiment/test-*
- config_name: topic
  data_files:
  - split: test
    path: topic/test-*
- config_name: trueteacher
  data_files:
  - split: test
    path: trueteacher/test-*
- config_name: wikitoxic_insult
  data_files:
  - split: test
    path: wikitoxic_insult/test-*
- config_name: wikitoxic_obscene
  data_files:
  - split: test
    path: wikitoxic_obscene/test-*
- config_name: wikitoxic_threat
  data_files:
  - split: test
    path: wikitoxic_threat/test-*
- config_name: wikitoxic_toxicaggregated
  data_files:
  - split: test
    path: wikitoxic_toxicaggregated/test-*
- config_name: yahootopics
  data_files:
  - split: test
    path: yahootopics/test-*
- config_name: yelpreviews
  data_files:
  - split: test
    path: yelpreviews/test-*
task_categories:
- text-classification
- zero-shot-classification
language:
- en
size_categories:
- 1M<n<10M
tags:
- zero-shot-classification
- benchmark
pretty_name: 'BTZSC: Benchmark for Textual Zero-Shot Classification'
---
<p align="center">
<img src="https://raw.githubusercontent.com/IliasAarab/btzsc/main/docs/images/btzsc_benchmark.png" align="center" width="60%" alt="BTZSC banner">
</p>
<h1 align="center">BTZSC</h1>
<p align="center">
<em>A benchmark dataset for zero-shot text classification across embedding models, cross-encoders, rerankers, and LLMs.</em>
</p>
<p align="center">
<a href="https://github.com/IliasAarab/btzsc/tags"><img src="https://img.shields.io/github/v/tag/IliasAarab/btzsc?style=flat&color=0080ff&label=version" alt="version"></a>
<a href="https://pypi.org/project/btzsc/"><img src="https://img.shields.io/pypi/pyversions/btzsc?style=flat&color=0080ff" alt="python-versions"></a>
<a href="https://github.com/IliasAarab/btzsc/blob/main/LICENSE"><img src="https://img.shields.io/github/license/IliasAarab/btzsc?style=flat&color=0080ff" alt="license"></a>
</p>
<br>
<p align="center">
<a href="#quickstart">Quickstart</a> |
<a href="#configs">Configs</a> |
<a href="#data-format">Data Format</a> |
<a href="#evaluation">Evaluation</a> |
<a href="#resources">Resources</a> |
<a href="#citing">Citing</a>
</p>
<hr>
## Overview
BTZSC is a dataset-centric benchmark suite for **textual zero-shot classification** that enables *apples-to-apples* evaluation across major model families (cross-encoders, embedding models, rerankers, and LLM-style classifiers). It contains **22 datasets** spanning four common classification tasks: **sentiment**, **topic**, **intent**, and **emotion**.
## Quickstart
```python
from datasets import load_dataset
# Single dataset
ds = load_dataset("btzsc/btzsc", name="agnews", split="test")
# Task bundle
ds_sent = load_dataset("btzsc/btzsc", name="sentiment", split="test")
# Full suite
ds_all = load_dataset("btzsc/btzsc", name="all", split="test")
```
For high-level benchmark evaluation, use the [`btzsc` eval harness](https://github.com/IliasAarab/btzsc):
```python
from btzsc import BTZSCBenchmark
benchmark = BTZSCBenchmark(tasks=["sentiment", "topic"])
results = benchmark.evaluate(
    model="intfloat/e5-base-v2",
    model_type="embedding",
    batch_size=64,
)
print(results.summary())
```
## Configs
BTZSC is published as a single Hugging Face dataset repo with multiple **configs** (`name=...`).
### Base datasets (22)
| Task | Datasets |
|------|----------|
| **Sentiment** | `amazonpolarity`, `imdb`, `appreviews`, `yelpreviews`, `rottentomatoes`, `financialphrasebank` |
| **Emotion** | `emotiondair`, `empathetic` |
| **Intent** | `banking77`, `biasframes_intent`, `massive` |
| **Topic** | `agnews`, `yahootopics`, `trueteacher`, `manifesto`, `capsotu`, `biasframes_offensive`, `biasframes_sex`, `wikitoxic_insult`, `wikitoxic_obscene`, `wikitoxic_threat`, `wikitoxic_toxicaggregated` |
### Convenience bundles
| Bundle | Description |
|--------|-------------|
| `sentiment` | All 6 sentiment datasets |
| `emotion` | All 2 emotion datasets |
| `intent` | All 3 intent datasets |
| `topic` | All 11 topic datasets |
| `all` | All 22 datasets |
These bundles are concatenations of the corresponding base datasets and are provided purely for convenience (e.g., one-command evaluation). They correspond to Table 1 in the paper.
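The concatenation claim can be cross-checked against the `num_examples` values in the card metadata. The sketch below copies those test-split row counts and sums them per task; the helper name `bundle_totals` is illustrative and not part of any BTZSC API. Note that these are counts of *(text, candidate label)* rows, not of original examples.

```python
# Test-split row counts copied from the card metadata (num_examples).
BASE_COUNTS = {
    "sentiment": {
        "amazonpolarity": 20000, "imdb": 20000, "appreviews": 8000,
        "yelpreviews": 20000, "rottentomatoes": 2132, "financialphrasebank": 2070,
    },
    "emotion": {"emotiondair": 12000, "empathetic": 81344},
    "intent": {"banking77": 221760, "biasframes_intent": 7296, "massive": 175466},
    "topic": {
        "agnews": 30400, "yahootopics": 500000, "trueteacher": 17910,
        "manifesto": 953008, "capsotu": 70455, "biasframes_offensive": 7676,
        "biasframes_sex": 8808, "wikitoxic_insult": 16854,
        "wikitoxic_obscene": 17382, "wikitoxic_threat": 10422,
        "wikitoxic_toxicaggregated": 20000,
    },
}

# Row counts the metadata reports for the bundle configs themselves.
BUNDLE_COUNTS = {
    "sentiment": 72202, "emotion": 93344, "intent": 404522,
    "topic": 1652915, "all": 2222983,
}


def bundle_totals(base_counts):
    """Sum base-dataset row counts per task, plus an 'all' grand total."""
    totals = {task: sum(sizes.values()) for task, sizes in base_counts.items()}
    totals["all"] = sum(totals.values())
    return totals


# Each bundle's size equals the sum of its base datasets.
assert bundle_totals(BASE_COUNTS) == BUNDLE_COUNTS
```

The same check works as a sanity test after downloading: the length of each loaded bundle should match the sum of the lengths of its base configs.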
## Data Format
BTZSC is provided in a **pairwise entailment format**, which makes it directly usable with NLI-style cross-encoders and provides a unified interface for other ZSC approaches.
Each row corresponds to a *(text, candidate label)* pair:
| Column | Description |
|--------|-------------|
| `text` | The input document |
| `label_text` | Candidate class name (e.g. `"Business"`) |
| `hypothesis` | Natural-language hypothesis built from `label_text` (e.g. `"This example news text is about business news"`) |
| `labels` | Binary target: `1` = entailment (correct label), `0` = not_entailment |
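| `dataset_id` | Dataset identifier (e.g. `agnews`). In the card metadata, the bundle configs (`sentiment`, `emotion`, `intent`, `topic`, `all`) declare a `task_name` column in place of `dataset_id` |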
For each original example, BTZSC contains **one positive pair** (the true label) and **one negative pair for every other candidate label**, so a dataset with *k* labels contributes *k* rows per example.
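To make the row layout concrete, here is a minimal sketch of how one labelled example expands into pairwise rows. The function `to_pairs`, the hypothesis template, the sample sentence, and the candidate label list are all illustrative assumptions; the actual BTZSC templates and label names may differ.

```python
def to_pairs(text, true_label, candidate_labels,
             template="This example is about {}."):
    """Expand one labelled example into one row per candidate label.

    The template is a placeholder, not the wording used by BTZSC.
    """
    return [
        {
            "text": text,
            "label_text": label,
            "hypothesis": template.format(label),
            # 1 = entailment (true label), 0 = not_entailment (any other label)
            "labels": int(label == true_label),
        }
        for label in candidate_labels
    ]


rows = to_pairs(
    "Stocks rallied after the central bank held rates steady.",
    true_label="Business",
    candidate_labels=["World", "Sports", "Business", "Sci/Tech"],
)

# Four candidate labels yield four rows, exactly one of them positive.
assert len(rows) == 4
assert sum(row["labels"] for row in rows) == 1
```

Because every row carries both `label_text` and the binary `labels` target, the original multiclass label of an example can be recovered by taking the `label_text` of its single positive row.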
## Evaluation
BTZSC follows a strict zero-shot protocol:
- **Primary metric:** macro-F1 per dataset, averaged across datasets for an overall score
- **Secondary metrics:** accuracy, macro-precision, macro-recall
- No training or tuning on evaluation datasets
- 4 task families: sentiment, topic, intent, emotion
See the [paper](https://openreview.net/pdf?id=IxMryAz2p3) for full details.
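As a rough illustration of the primary metric, the sketch below turns per-row entailment scores into one prediction per text (highest-scoring candidate label wins) and then computes macro-F1. It is not the official harness: the function names, the toy rows, and the scores are made up, and grouping rows by their `text` value assumes texts are unique within a dataset.

```python
def predict_labels(rows, scores):
    """Return (y_true, y_pred), one entry per distinct text.

    rows   : dicts with 'text', 'label_text', and 'labels' (1 = true label)
    scores : one entailment score per row, higher = more likely entailed
    """
    best = {}  # text -> (best score so far, predicted label)
    gold = {}  # text -> true label, taken from the positive row
    for row, score in zip(rows, scores):
        text = row["text"]
        if text not in best or score > best[text][0]:
            best[text] = (score, row["label_text"])
        if row["labels"] == 1:
            gold[text] = row["label_text"]
    texts = list(gold)
    return [gold[t] for t in texts], [best[t][1] for t in texts]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in y_true or y_pred."""
    f1_scores = []
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy data: three texts, two candidate labels each (scores are invented).
rows = [
    {"text": "a", "label_text": "positive", "labels": 1},
    {"text": "a", "label_text": "negative", "labels": 0},
    {"text": "b", "label_text": "positive", "labels": 0},
    {"text": "b", "label_text": "negative", "labels": 1},
    {"text": "c", "label_text": "positive", "labels": 0},
    {"text": "c", "label_text": "negative", "labels": 1},
]
scores = [0.9, 0.1, 0.7, 0.3, 0.2, 0.8]

y_true, y_pred = predict_labels(rows, scores)
score = macro_f1(y_true, y_pred)  # both classes have F1 = 2/3 here
```

Under the protocol listed above, the overall benchmark score would then be the plain average of such per-dataset macro-F1 values.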
## Resources
- Paper (OpenReview): https://openreview.net/forum?id=IxMryAz2p3
- PDF: https://openreview.net/pdf?id=IxMryAz2p3
- Eval harness (GitHub): https://github.com/IliasAarab/btzsc
- Leaderboard Space: https://huggingface.co/spaces/btzsc/btzsc-leaderboard
- Leaderboard results dataset: https://huggingface.co/datasets/btzsc/btzsc-results
## Licensing
BTZSC aggregates multiple public datasets; **licenses vary by source dataset**. Please cite and comply with the original dataset licenses. See Appendix A.5 in the paper for details.
## Citing
```bibtex
@inproceedings{aarab2026btzsc,
title = {BTZSC: A Benchmark for Zero-Shot Text Classification Across Cross-Encoders, Embedding Models, and Rerankers},
author = {Aarab, Ilias},
booktitle = {International Conference on Learning Representations (ICLR) 2026},
year = {2026},
note = {OpenReview PDF: https://openreview.net/pdf?id=IxMryAz2p3},
url = {https://openreview.net/forum?id=IxMryAz2p3}
}
```
If you use BTZSC, please also cite the original datasets.
Provider: btzsc



