BRIGHTER
BRIGHTER Dataset Overview
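Dataset Introduction
BRIGHTER (BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets) is a multilingual emotion recognition dataset covering 28 languages. It aims to close the gap in human-annotated data for textual emotion recognition.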
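Dataset Structure
The dataset supports three main tasks:
- Emotion classification (Track A): detect the presence or absence of each of six emotions (anger, disgust, fear, joy, sadness, surprise).
- Emotion intensity classification (Track B): predict the intensity of each emotion on a scale from 0 (none) to 3 (high).
- Multilingual emotion classification (Track C): evaluation in a cross-lingual transfer setting.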
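Data Format
- Data is stored as CSV files, one file per language.
- Columns: id, text, anger, disgust, fear, joy, sadness, surprise.
- Binary task: each emotion label is 1 (emotion present) or 0 (emotion absent).
- Intensity task: each emotion label ranges from 0 (none) to 3 (high).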
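The column layout above can be read with the standard library alone. The following is a minimal sketch, not an official loader: the two sample rows are invented for illustration, and `load_rows` and its `max_label` parameter are hypothetical names introduced here.

```python
import csv
import io

EMOTIONS = ["anger", "disgust", "fear", "joy", "sadness", "surprise"]

# Invented two-row sample in the documented column layout (not real dataset rows).
SAMPLE_CSV = """id,text,anger,disgust,fear,joy,sadness,surprise
ex_001,"I can't believe they did that!",1,0,0,0,0,1
ex_002,"What a lovely morning.",0,0,0,1,0,0
"""


def load_rows(handle, max_label=1):
    """Parse rows and check that every emotion label is an integer in [0, max_label]."""
    rows = []
    for record in csv.DictReader(handle):
        labels = {}
        for emotion in EMOTIONS:
            value = int(record[emotion])
            if not 0 <= value <= max_label:
                raise ValueError(f"{record['id']}: {emotion}={value} is out of range")
            labels[emotion] = value
        rows.append({"id": record["id"], "text": record["text"], "labels": labels})
    return rows


# Binary task: labels must be 0 or 1. Pass max_label=3 for the intensity task.
rows = load_rows(io.StringIO(SAMPLE_CSV))
print(rows[0]["labels"])
```

For a file on disk, the same function accepts an open handle, for example `open(path, newline="", encoding="utf-8")`, where `path` points to one of the per-language CSV files.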
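Evaluation Metrics
- Binary classification: macro F1 score (averaged across emotions).
- Intensity classification: Pearson correlation between predicted and gold intensity values.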
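Both metrics can be computed in a few lines of plain Python. The sketch below is illustrative rather than the official scorer: the function names are hypothetical, and two conventions are assumptions made here, namely that F1 is 0 when an emotion has no positive gold or predicted labels, and that Pearson correlation is undefined (NaN) when either series is constant.

```python
import math


def f1_binary(gold, pred):
    """F1 for one emotion from parallel lists of 0/1 labels."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # Assumed convention: score 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


def macro_f1(gold_by_emotion, pred_by_emotion):
    """Unweighted mean of the per-emotion F1 scores."""
    scores = [f1_binary(gold_by_emotion[e], pred_by_emotion[e]) for e in gold_by_emotion]
    return sum(scores) / len(scores)


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / math.sqrt(var_x * var_y)


# Toy labels for two emotions, invented for illustration.
gold = {"anger": [1, 0, 1, 0], "joy": [0, 1, 1, 0]}
pred = {"anger": [1, 0, 0, 0], "joy": [0, 1, 1, 1]}
print(macro_f1(gold, pred))
print(pearson([0, 1, 2, 3], [0, 1, 3, 2]))
```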
Supported Models
Fine-tuned Transformer models
- XLM-R Large (facebook/xlm-roberta-large)
- mBERT (google-bert/bert-base-multilingual-cased)
- RemBERT (google/rembert)
- InfoXLM (microsoft/infoxlm-large)
- mDeBERTa (microsoft/mdeberta-v3-base)
- LaBSE (sentence-transformers/LaBSE)
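When an encoder such as those listed above is fine-tuned for the binary task, a common setup is one output logit per emotion, followed by a sigmoid and a fixed threshold. The sketch below shows only that label encoding and decoding step in plain Python. It is an assumption about a typical multi-label setup, not a description of how these models were actually trained, and `encode_labels`, `decode_logits`, and the 0.5 threshold are hypothetical choices.

```python
import math

EMOTIONS = ["anger", "disgust", "fear", "joy", "sadness", "surprise"]


def encode_labels(row):
    """Turn a CSV-style row (emotion -> 0/1) into a multi-hot float vector."""
    return [float(row[emotion]) for emotion in EMOTIONS]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def decode_logits(logits, threshold=0.5):
    """Map one logit per emotion to 0/1 predictions (threshold is an assumed default)."""
    return {e: int(sigmoid(z) >= threshold) for e, z in zip(EMOTIONS, logits)}


# Invented row and logits, for illustration only.
target = encode_labels({"anger": 1, "disgust": 0, "fear": 0, "joy": 0, "sadness": 0, "surprise": 1})
prediction = decode_logits([2.0, -1.0, 0.3, 3.0, -4.0, -0.1])
print(target)
print(prediction)
```

Each emotion is thresholded independently, so a text can receive several emotions or none, which matches the presence/absence labels in the data format.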
Large language models (zero-shot / few-shot)
- LLaMA 3.3 70B (meta-llama/Llama-3.3-70B-Instruct)
- Mixtral 8x7B (mistralai/Mixtral-8x7B-Instruct-v0.1)
- DeepSeek R1 70B (deepseek-ai/DeepSeek-R1-Distill-Llama-70B)
- Qwen 2.5 72B (Qwen/Qwen2.5-72B-Instruct)
- Dolly v2 12B (databricks/dolly-v2-12b)
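Zero-shot and few-shot evaluation with instruction-tuned models comes down to building a prompt and mapping the free-text answer back to the six label columns. The sketch below illustrates that round trip. The prompt wording, the `build_prompt` and `parse_answer` helpers, and the comma-separated answer format are all assumptions made for illustration, not the prompts used by the dataset authors, and no model is called.

```python
EMOTIONS = ["anger", "disgust", "fear", "joy", "sadness", "surprise"]


def build_prompt(text, examples=()):
    """Build a zero-shot prompt, or a few-shot prompt if (text, emotions) examples are given."""
    lines = [
        "Which of these emotions does the text express: " + ", ".join(EMOTIONS) + "?",
        "Answer with a comma-separated list of emotions, or 'none'.",
        "",
    ]
    for example_text, example_emotions in examples:
        lines.append(f"Text: {example_text}")
        lines.append("Emotions: " + (", ".join(example_emotions) or "none"))
        lines.append("")
    lines.append(f"Text: {text}")
    lines.append("Emotions:")
    return "\n".join(lines)


def parse_answer(answer):
    """Map a comma-separated model answer to a 0/1 label per emotion."""
    mentioned = {token.strip().lower().strip(".") for token in answer.split(",")}
    return {emotion: int(emotion in mentioned) for emotion in EMOTIONS}


# Invented few-shot example and model answer, for illustration only.
prompt = build_prompt("They cancelled the trip again.", examples=[("What a lovely morning.", ["joy"])])
labels = parse_answer("Anger, sadness.")
print(prompt)
print(labels)
```

Anything in the answer that is not one of the six emotion names, including "none", is ignored, so an unparseable answer falls back to all zeros.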
Citation
@misc{muhammad2025brighterbridginggaphumanannotated,
  title={BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages},
  author={Shamsuddeen Hassan Muhammad and Nedjma Ousidhoum and Idris Abdulmumin and Jan Philip Wahle and Terry Ruas and Meriem Beloucif and Christine de Kock and Nirmal Surange and Daniela Teodorescu and Ibrahim Said Ahmad and David Ifeoluwa Adelani and Alham Fikri Aji and Felermino D. M. A. Ali and Ilseyar Alimova and Vladimir Araujo and Nikolay Babakov and Naomi Baes and Ana-Maria Bucur and Andiswa Bukula and Guanqun Cao and Rodrigo Tufino Cardenas and Rendi Chevi and Chiamaka Ijeoma Chukwuneke and Alexandra Ciobotaru and Daryna Dementieva and Murja Sani Gadanya and Robert Geislinger and Bela Gipp and Oumaima Hourrane and Oana Ignat and Falalu Ibrahim Lawan and Rooweither Mabuya and Rahmad Mahendra and Vukosi Marivate and Andrew Piper and Alexander Panchenko and Charles Henrique Porto Ferreira and Vitaly Protasov and Samuel Rutunda and Manish Shrivastava and Aura Cristina Udrea and Lilian Diana Awuor Wanzare and Sophie Wu and Florian Valentin Wunderlich and Hanif Muhammad Zhafran and Tianhui Zhang and Yi Zhou and Saif M. Mohammad},
  year={2025},
  eprint={2502.11926},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2502.11926},
}




