surrey-nlp/BESSTIE-CW-26
Source: Hugging Face · Updated: 2026-02-17 · Indexed: 2026-04-05
Download link:
https://hf-mirror.com/datasets/surrey-nlp/BESSTIE-CW-26
Resource description:
---
language:
- en
pretty_name: BESSTIE-NLP-26
tags:
- sentiment-analysis
- sarcasm
- text-classification
- dialects
- social-media
task_categories:
- text-classification
license: apache-2.0
---
# BESSTIE — NLP Coursework 2026
This dataset is a curated split of the BESSTIE dataset (arXiv:2412.04726).
## Loading with 🤗 Datasets
```python
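from datasets import load_dataset

# Download (or load from the local cache) all splits from the Hugging Face Hub.
ds = load_dataset("surrey-nlp/BESSTIE-CW-26")
print(ds)                   # overview of the available splits and columns
print(ds["validation"][0])  # first validation example, returned as a dict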
```
## Summary
This dataset contains English user-generated text annotated for:
- **Sentiment** (binary: 0 = negative, 1 = positive)
- **Sarcasm** (binary: 0 = non-sarcastic, 1 = sarcastic)
The data originates from:
- **Google** (locale-based reviews)
- **Reddit** (subreddit posts and comments)
Texts are categorised into three English varieties:
- **en-AU** — Australian English
- **en-IN** — Indian English
- **en-UK** — British English
### Data Fields
Each row contains:
- `text` (string): the raw text
- `variety` (string): one of `en-AU`, `en-IN`, `en-UK`
- `source` (string): e.g. `Google` or `Reddit`
- `Sentiment` (int/float): 0 or 1
- `Sarcasm` (int/float): 0 or 1
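As a minimal sketch of how these fields might be used, the helper below groups rows by `variety` and counts the values of one label column. It works on plain dictionaries, so no download is needed to try it. The function name `count_labels_by_variety` and the sample rows are hypothetical placeholders that only mimic the schema; they are not entries from the dataset.

```python
from collections import Counter, defaultdict


def count_labels_by_variety(rows, label="Sarcasm"):
    """Return {variety: {label_value: count}} for an iterable of row dicts."""
    counts = defaultdict(Counter)
    for row in rows:
        # Labels may be stored as int or float (0/1 or 0.0/1.0), so normalise to int.
        # Assumption: no label is missing; int() would fail on NaN or None.
        counts[row["variety"]][int(row[label])] += 1
    return {variety: dict(c) for variety, c in counts.items()}


# Hypothetical placeholder rows that only mimic the schema; not real dataset entries.
sample_rows = [
    {"text": "placeholder a", "variety": "en-AU", "source": "Reddit", "Sentiment": 1, "Sarcasm": 0.0},
    {"text": "placeholder b", "variety": "en-AU", "source": "Google", "Sentiment": 0, "Sarcasm": 1.0},
    {"text": "placeholder c", "variety": "en-IN", "source": "Google", "Sentiment": 1, "Sarcasm": 0},
]

sarcasm_counts = count_labels_by_variety(sample_rows, label="Sarcasm")
print(sarcasm_counts)
```

With the real data, passing a loaded split such as `ds["train"]` in place of `sample_rows` should work the same way, since iterating over a split yields one dictionary per row.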
### Split sizes
**Format:** each cell lists row counts as Train / Validation / Test
Split ratio: approximately 60% / 5% / 35%
| Variety | Sentiment: 0 | Sentiment: 1 | Sarcasm: 0 | Sarcasm: 1 | Total |
|:--:|:--:|:--:|:--:|:--:|:--:|
| en-AU | 633 / 50 / 347 | 512 / 45 / 320 | 808 / 67 / 471 | 337 / 28 / 196 | 1145 / 95 / 667 |
| en-IN | 689 / 64 / 430 | 710 / 53 / 386 | 1304 / 109 / 760 | 95 / 8 / 56 | 1399 / 117 / 816 |
| en-UK | 585 / 46 / 340 | 618 / 55 / 360 | 1111 / 93 / 647 | 92 / 8 / 53 | 1203 / 101 / 700 |
| **Total (by split)** | **1907 / 160 / 1117** | **1840 / 153 / 1066** | **3223 / 269 / 1878** | **524 / 44 / 305** | **3747 / 313 / 2183** |
| **Grand Total** | **3184** | **3059** | **5370** | **873** | **6243** |
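The stated split ratio and the degree of sarcasm imbalance can be checked directly from the table. The sketch below copies the relevant counts from the table by hand (the variable names are illustrative) and derives the percentages; it does not read the dataset itself.

```python
# Totals by split, copied from the "Total (by split)" row of the table.
split_totals = {"train": 3747, "validation": 313, "test": 2183}

# Per-variety (train, validation, test) counts, copied from the table:
# sarcastic rows (Sarcasm: 1) and all rows (Total).
sarcastic = {"en-AU": (337, 28, 196), "en-IN": (95, 8, 56), "en-UK": (92, 8, 53)}
totals = {"en-AU": (1145, 95, 667), "en-IN": (1399, 117, 816), "en-UK": (1203, 101, 700)}

grand_total = sum(split_totals.values())

# Share of each split, as a percentage of all rows.
split_share = {name: round(100 * n / grand_total, 1) for name, n in split_totals.items()}

# Share of sarcastic rows within each variety, summed over the three splits.
sarcastic_share = {
    variety: round(100 * sum(sarcastic[variety]) / sum(totals[variety]), 1)
    for variety in totals
}

# Share of sarcastic rows over the whole dataset.
overall_sarcastic_share = round(100 * sum(map(sum, sarcastic.values())) / grand_total, 1)

print(grand_total)
print(split_share)
print(sarcastic_share)
print(overall_sarcastic_share)
```

The per-variety shares make the imbalance visible: sarcastic rows are a minority overall, and far rarer in en-IN and en-UK than in en-AU, which may matter when choosing evaluation metrics or class weights.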
## Citation
Please cite the original BESSTIE paper (arXiv:2412.04726) if you use this dataset.
Provider:
surrey-nlp



