surrey-nlp/BESSTIE-CW-26

Hugging Face | Updated 2026-02-17 | Indexed 2026-04-05
Download link:
https://hf-mirror.com/datasets/surrey-nlp/BESSTIE-CW-26
Resource description:
---
language:
- en
pretty_name: BESSTIE-NLP-26
tags:
- sentiment-analysis
- sarcasm
- text-classification
- dialects
- social-media
task_categories:
- text-classification
license: apache-2.0
---

# BESSTIE: NLP Coursework 2026

This dataset is a curated split of the BESSTIE dataset (arXiv:2412.04726).

## Loading with 🤗 Datasets

```python
from datasets import load_dataset

ds = load_dataset("surrey-nlp/BESSTIE-CW-26")
print(ds)
print(ds["validation"][0])
```

## Summary

This dataset contains English user-generated text annotated for:

- **Sentiment** (binary: 0 = negative, 1 = positive)
- **Sarcasm** (binary: 0 = non-sarcastic, 1 = sarcastic)

The data originates from:

- **Google** (locale-based reviews)
- **Reddit** (subreddit posts and comments)

Texts are categorised into three English varieties:

- **en-AU**: Australian English
- **en-IN**: Indian English
- **en-UK**: British English

### Data Fields

Each row contains:

- `text` (string): the raw text
- `variety` (string): one of `en-AU`, `en-IN`, `en-UK`
- `source` (string): e.g. `Google` or `Reddit`
- `Sentiment` (int/float): 0 or 1
- `Sarcasm` (int/float): 0 or 1

### Split sizes

**Format:** Train / Validation / Test

**Split ratio:** 60% / 5% / 35%

| Locale | Sentiment: 0 | Sentiment: 1 | Sarcasm: 0 | Sarcasm: 1 | Total |
|:--:|:--:|:--:|:--:|:--:|:--:|
| en-AU | 633 / 50 / 347 | 512 / 45 / 320 | 808 / 67 / 471 | 337 / 28 / 196 | 1145 / 95 / 667 |
| en-IN | 689 / 64 / 430 | 710 / 53 / 386 | 1304 / 109 / 760 | 95 / 8 / 56 | 1399 / 117 / 816 |
| en-UK | 585 / 46 / 340 | 618 / 55 / 360 | 1111 / 93 / 647 | 92 / 8 / 53 | 1203 / 101 / 700 |
| **Total (by split)** | **1907 / 160 / 1117** | **1840 / 153 / 1066** | **3223 / 269 / 1878** | **524 / 44 / 305** | **3747 / 313 / 2183** |
| **Grand Total** | **3184** | **3059** | **5370** | **873** | **6243** |

## Citation

Please cite the original BESSTIE paper (arXiv:2412.04726) if using this dataset.
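
The per-variety label counts in the split-size table can be recomputed from the rows themselves. The following is a minimal sketch, assuming each row is a dict with the `variety`, `Sentiment`, and `Sarcasm` fields described under Data Fields. The helper name `label_counts` and the rows in `sample_rows` are hypothetical and invented for illustration; they are not part of the dataset or of the 🤗 Datasets library.

```python
from collections import Counter


def label_counts(rows, label="Sarcasm"):
    """Count the values of one label column per English variety.

    The card lists labels as int/float, so each value is cast to int
    before counting (this assumes no missing or NaN labels).
    """
    counts = {}
    for row in rows:
        counts.setdefault(row["variety"], Counter())[int(row[label])] += 1
    return counts


# Hypothetical rows, invented for illustration only (not taken from the dataset).
sample_rows = [
    {"text": "Great service, would visit again.", "variety": "en-AU",
     "source": "Google", "Sentiment": 1.0, "Sarcasm": 0.0},
    {"text": "Oh sure, a two-hour wait is exactly what I wanted.", "variety": "en-AU",
     "source": "Reddit", "Sentiment": 0, "Sarcasm": 1},
    {"text": "The food was cold and the staff ignored us.", "variety": "en-IN",
     "source": "Google", "Sentiment": 0, "Sarcasm": 0},
    {"text": "Lovely, another train cancelled. Brilliant.", "variety": "en-UK",
     "source": "Reddit", "Sentiment": 0.0, "Sarcasm": 1.0},
    {"text": "Friendly staff and a fair price.", "variety": "en-UK",
     "source": "Google", "Sentiment": 1, "Sarcasm": 0},
]

result = label_counts(sample_rows)
for variety, counter in sorted(result.items()):
    print(variety, dict(counter))
```

Because a split loaded with `load_dataset` yields one dict per row when iterated, a call such as `label_counts(ds["train"], "Sentiment")` should give counts that can be compared with the Train figures in the table.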
Provided by:
surrey-nlp