cardiffnlp/super_tweeteval

Name: cardiffnlp/super_tweeteval
Creator: cardiffnlp
Published: 2024-07-30 04:04:17
License: no description available

Source: Hugging Face. Updated 2024-07-30; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/cardiffnlp/super_tweeteval

Overview:

SuperTweetEval is a unified benchmark of 12 heterogeneous natural language processing tasks: topic classification, named entity recognition, question answering, question generation, intimacy analysis, tweet similarity, meaning shift detection, hate speech detection, emoji classification, sentiment classification, named entity disambiguation, and emotion classification. Each task has its own training, validation, and test splits, and all tasks share a unified dataset structure with fields such as text, label, and date. Evaluation metrics include macro-F1, answer F1, METEOR, and Spearman correlation, among others. The model cards list pre-trained models for the individual tasks.

Provider:

cardiffnlp

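Summary of original information

SuperTweetEval dataset overview

Basic dataset information

Name: SuperTweetEval
Language: English (en)
License: unknown
Multilinguality: monolingual
Size: fewer than 50K
Task categories: text classification, token classification, question answering, other
Task IDs: topic classification, named entity recognition, abstractive question answering
Tags: super_tweet_eval, tweet_eval, natural language understanding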

Dataset structure

Data file configuration

config_name: tempo_wic
- train: data/tempo_wic/train.jsonl
- test: data/tempo_wic/test.jsonl
- validation: data/tempo_wic/validation.jsonl
config_name: tweet_emoji
- train: data/tweet_emoji/train.jsonl
- test: data/tweet_emoji/test.jsonl
- validation: data/tweet_emoji/validation.jsonl
config_name: tweet_emotion
- train: data/tweet_emotion/train.jsonl
- test: data/tweet_emotion/test.jsonl
- validation: data/tweet_emotion/validation.jsonl
config_name: tweet_hate
- train: data/tweet_hate/train.jsonl
- test: data/tweet_hate/test.jsonl
- validation: data/tweet_hate/validation.jsonl
config_name: tweet_intimacy
- train: data/tweet_intimacy/train.jsonl
- test: data/tweet_intimacy/test.jsonl
- validation: data/tweet_intimacy/validation.jsonl
config_name: tweet_ner7
- train: data/tweet_ner7/train.jsonl
- test: data/tweet_ner7/test.jsonl
- validation: data/tweet_ner7/validation.jsonl
config_name: tweet_nerd
- train: data/tweet_nerd/train.jsonl
- test: data/tweet_nerd/test.jsonl
- validation: data/tweet_nerd/validation.jsonl
config_name: tweet_qa
- train: data/tweet_qa/train.jsonl
- test: data/tweet_qa/test.jsonl
- validation: data/tweet_qa/validation.jsonl
config_name: tweet_qg
- train: data/tweet_qg/train.jsonl
- test: data/tweet_qg/test.jsonl
- validation: data/tweet_qg/validation.jsonl
config_name: tweet_sentiment
- train: data/tweet_sentiment/train.jsonl
- test: data/tweet_sentiment/test.jsonl
- validation: data/tweet_sentiment/validation.jsonl
config_name: tweet_similarity
- train: data/tweet_similarity/train.jsonl
- test: data/tweet_similarity/test.jsonl
- validation: data/tweet_similarity/validation.jsonl
config_name: tweet_topic
- train: data/tweet_topic/train.jsonl
- test: data/tweet_topic/test.jsonl
- validation: data/tweet_topic/validation.jsonl
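
Every configuration above follows the same path pattern, data/<config_name>/<split>.jsonl. The following is a minimal sketch of how those paths could be built and a split read from disk. It assumes a local copy of the repository's data directory and one JSON object per line (the usual JSONL convention); the helper names `split_path` and `read_jsonl` are illustrative and are not part of the dataset.

```python
import json
from pathlib import Path

# Config names as listed in the data file configuration above.
CONFIGS = [
    "tempo_wic", "tweet_emoji", "tweet_emotion", "tweet_hate",
    "tweet_intimacy", "tweet_ner7", "tweet_nerd", "tweet_qa",
    "tweet_qg", "tweet_sentiment", "tweet_similarity", "tweet_topic",
]
SPLITS = ("train", "validation", "test")


def split_path(config_name: str, split: str) -> str:
    """Return the relative path of one split file, e.g. data/tweet_qa/train.jsonl."""
    if config_name not in CONFIGS:
        raise ValueError(f"unknown config: {config_name!r}")
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    return f"data/{config_name}/{split}.jsonl"


def read_jsonl(root, config_name: str, split: str) -> list:
    """Read one split from a local copy rooted at `root` (a hypothetical location)."""
    path = Path(root) / split_path(config_name, split)
    with path.open(encoding="utf-8") as handle:
        # Skip blank lines; every other line is assumed to hold one JSON object.
        return [json.loads(line) for line in handle if line.strip()]
```

Keeping the path logic in one function means a typo in a config or split name fails early with a clear error instead of a missing-file error later.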

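Task details

Tasks and corresponding datasets

Task	Dataset	Description	Number of instances
Topic classification	TweetTopic	multi-label classification	4,585 / 573 / 1,679
Named entity recognition	TweetNER7	sequence labeling	4,616 / 576 / 2,807
Question answering	TweetQA	generation	9,489 / 1,086 / 1,203
Question generation	TweetQG	generation	9,489 / 1,086 / 1,203
Intimacy analysis	TweetIntimacy	regression on a single text	1,191 / 396 / 396
Tweet similarity	TweetSIM	regression on two texts	450 / 100 / 450
Meaning shift detection	TempoWIC	binary classification on two texts	1,427 / 395 / 1,472
Hate speech detection	TweetHate	multi-class classification	5,019 / 716 / 1,433
Emoji classification	TweetEmoji100	multi-class classification	50,000 / 5,000 / 50,000
Sentiment classification	TweetSentiment	ABSA on a 5-point scale	26,632 / 4,000 / 12,379
Named entity disambiguation	TweetNERD	binary classification	20,164 / 4,100 / 20,075
Emotion classification	TweetEmotion	multi-label classification	6,838 / 886 / 3,259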

Dataset fields

Unified description of data fields

tweet_topic
- text: string
- gold_label_list: list of strings
- date: string
tweet_ner7
- text: string
- text_tokenized: list of strings
- gold_label_sequence: list of strings
- date: string
- entities: list of dictionaries of the form {"entity": "string", "type": "string"}
tweet_qa
- text: string
- gold_label_str: string
- context: string
tweet_qg
- text: string
- gold_label_str: string
- context: string
tweet_intimacy
- text: string
- gold_score: float
tweet_similarity
- text_1: string
- text_2: string
- gold_score: float
tempo_wic
- gold_label_binary: integer
- target: string
- text_1: string
- text_tokenized_1: list of strings
- token_idx_1: integer
- date_1: string
- text_2: string
- text_tokenized_2: list of strings
- token_idx_2: integer
- date_2: string
tweet_hate
- gold_label: integer
- text: string
tweet_emoji
- gold_label: integer
- text: string
- date: string
tweet_sentiment
- gold_label: integer
- text: string
- target: string
tweet_nerd
- gold_label_binary: integer
- target: string
- text: string
- definition: string
- text_start: integer
- text_end: integer
- date: string
tweet_emotion
- text: string
- gold_label_list: list of strings
Evaluation metrics and models

Evaluation metrics

Dataset	Evaluation metric	Gold labels
TweetTopic	macro-F1	arts_&_culture, business_&_entrepreneurs, celebrity_&_pop_culture, diaries_&_daily_life, family, fashion_&_style, film_tv_&_video, fitness_&_health, food_&_dining, gaming, learning_&_educational, music, news_&_social_concern, other_hobbies, relationships, science_&_technology, sports, travel_&_adventure, youth_&_student_life
TweetNER7	macro-F1	B-corporation, B-creative_work, B-event, B-group, B-location, B-person, B-product, I-corporation, I-creative_work, I-event, I-group, I-location, I-person, I-product, O
TweetQA	answer-F1	-
TweetQG	METEOR	-
TweetIntimacy	Spearman correlation	[1 - 5]
TweetSIM	Spearman correlation	[0 - 5]
TempoWIC	accuracy	no, yes
TweetHate	combined-F1 (micro-F1 for hate/not-hate & macro-F1 for hate speech subclasses)	hate_gender, hate_race, hate_sexuality, hate_religion, hate_origin, hate_disability, hate_age, not_hate
TweetEmoji100	accuracy at top 5	Full emoji list: ./data/tweet_emoji/map.txt
TweetSentiment	1 - MAE^M (MAE^M: macro-averaged mean absolute error)	strongly negative, negative, negative or neutral, positive, strongly positive
TweetNERD	accuracy	no, yes
TweetEmotion	macro-F1	anger, anticipation, disgust, fear, joy, love, optimism, pessimism, sadness, surprise, trust
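
Three of the metrics in the table are less common than plain accuracy, so a small reference sketch may help. It uses textbook definitions in pure Python and is not the benchmark's official evaluation code, which may differ in details such as the fixed label set or tie handling. The macro-F1 shown covers the single-label case; for multi-label tasks such as TweetTopic the same per-class F1 would be computed on a binary indicator for each label. All function names are illustrative.

```python
from collections import defaultdict


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over the classes seen in gold or pred."""
    labels = set(gold) | set(pred)
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def one_minus_macro_mae(gold, pred):
    """1 - MAE^M: absolute error averaged within each gold class, then across classes."""
    per_class = defaultdict(list)
    for g, p in zip(gold, pred):
        per_class[g].append(abs(g - p))
    class_means = [sum(errs) / len(errs) for errs in per_class.values()]
    return 1.0 - sum(class_means) / len(class_means)


def accuracy_at_k(gold, ranked_preds, k=5):
    """Share of examples whose gold label is among the first k ranked predictions."""
    hits = sum(1 for g, ranking in zip(gold, ranked_preds) if g in ranking[:k])
    return hits / len(gold)
```

Averaging the absolute error per gold class first is what makes MAE^M "macro": a rare class such as "strongly negative" then counts as much as a frequent one.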

Model cards

Dataset	Model
TweetTopic	twitter-roberta-base-topic-latest (base), twitter-roberta-large-topic-latest (large)
TweetNER7	twitter-roberta-base-ner7-latest (base), TBA
TweetQA	flan-t5-small-tweet-qa (small), flan-t5-base-tweet-qa (base)
TweetQG	flan-t5-small-tweet-qg (small), flan-t5-base-tweet-qg (base)
TweetIntimacy	twitter-roberta-base-intimacy-latest (base), twitter-roberta-large-intimacy-latest (large)
TweetSIM	twitter-roberta-base-similarity-latest (base), twitter-roberta-large-similarity-latest (large)
TempoWIC	twitter-roberta-base-tempo-wic-latest (base), twitter-roberta-large-tempo-wic-latest (large)
TweetHate	twitter-roberta-base-hate-latest-st (base), twitter-roberta-large-hate-latest (large)
TweetEmoji100	twitter-roberta-base-emoji-latest (base), twitter-roberta-large-emoji-latest (large)
TweetSentiment	twitter-roberta-base-topic-sentiment-latest (base), twitter-roberta-large-topic-sentiment-latest (large)
TweetNERD	twitter-roberta-base-nerd-latest (base), twitter-roberta-large-nerd-latest (large)
TweetEmotion	twitter-roberta-base-emotion-latest (base), twitter-roberta-large-emotion-latest (large)

Citation information

Main reference paper

@inproceedings{antypas2023supertweeteval,
  title={SuperTweetEval: A Challenging, Unified and Heterogeneous Benchmark for Social Media NLP Research},
  author={Dimosthenis Antypas and Asahi Ushio and Francesco Barbieri and Leonardo Neves and Kiamehr Rezaee and Luis Espinosa-Anke and Jiaxin Pei and Jose Camacho-Collados},
  booktitle={Findings of the Association for Computational Linguistics: EMNLP 2023},
  year={2023}
}

Individual dataset citations

TweetTopic

@inproceedings{antypas-etal-2022-twitter,
  title = "{T}witter Topic Classification",
  author = "Antypas, Dimosthenis and Ushio, Asahi and Camacho-Collados, Jose and Silva, Vitor and Neves, Leonardo and Barbieri, Francesco",
  booktitle = "Proceedings of the 29th International Conference on Computational Linguistics",
  month = oct,
  year = "2022",
  address = "Gyeongju, Republic of Korea",
  publisher = "International Committee on Computational Linguistics",

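Collected summary

Dataset introduction

Construction

SuperTweetEval consists of 12 heterogeneous natural language processing tasks, ranging from topic classification to sentiment analysis. It is built on expert-generated annotations, which supports the quality and reliability of the data. Each task comes with its own training, validation, and test sets so that different models can be evaluated and compared. The construction process aims for diverse and representative data, giving researchers a comprehensive benchmark.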

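Usage

To use SuperTweetEval, select the configuration for the task of interest and load its training, validation, and test data. The dataset comes with evaluation scripts and model cards that help users get started with model evaluation. It can be accessed and downloaded through the Hugging Face platform, and the provided pre-trained models can be fine-tuned, or new models trained from scratch.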
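
As a hedged illustration of selecting a configuration, the sketch below wraps `load_dataset` from the Hugging Face `datasets` library. It assumes that library is installed and that the Hub is reachable; the repository ID is taken from the download link on this page, and the wrapper name `load_task` is illustrative.

```python
REPO_ID = "cardiffnlp/super_tweeteval"

# One configuration per task, as listed in the data file configuration.
CONFIGS = {
    "tempo_wic", "tweet_emoji", "tweet_emotion", "tweet_hate",
    "tweet_intimacy", "tweet_ner7", "tweet_nerd", "tweet_qa",
    "tweet_qg", "tweet_sentiment", "tweet_similarity", "tweet_topic",
}


def load_task(config_name):
    """Load all splits of one task; needs the `datasets` package and network access."""
    if config_name not in CONFIGS:
        raise ValueError(
            f"unknown config {config_name!r}; expected one of {sorted(CONFIGS)}"
        )
    # Imported lazily so the config check works without the library installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name)
```

Calling `load_task("tweet_topic")` should return an object keyed by split name, so `load_task("tweet_topic")["train"]` would give the training split, with records carrying the fields described for tweet_topic.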

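Background and challenges

Background overview

SuperTweetEval was created by the NLP research group at Cardiff University to provide a unified benchmark for social media text processing. It covers 12 heterogeneous natural language processing tasks, including topic classification, named entity recognition, and question answering and generation. Its core research question is how to perform multi-task learning efficiently on social media text so as to improve model generalization and performance. Its release gave researchers in social media text analysis a standardized evaluation platform.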

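Current challenges

The main challenges around SuperTweetEval are as follows. First, the diversity and noise of social media text complicate preprocessing and feature extraction. Second, multi-task learning has to deal with imbalance and conflict between tasks. Third, keeping annotations consistent and accurate during dataset construction is itself difficult. These challenges affect the quality of the dataset and place higher demands on model training and evaluation.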

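Common scenarios

Classic use cases

In natural language processing, SuperTweetEval is known for its diverse set of tasks, from topic classification to sentiment analysis. Classic use cases include multi-label topic classification with TweetTopic, to identify the main topics discussed in a tweet; named entity recognition with TweetNER7, to extract key entities from tweets; and sentiment analysis with TweetSentiment, to assess the sentiment a tweet expresses. Together these tasks form a broad framework for understanding social media text.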

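Academic problems addressed

SuperTweetEval addresses several research problems. First, by offering a unified multi-task benchmark, it addresses the heterogeneity of social media text processing, so that models for different tasks can be compared and optimized on the same platform. Second, its annotations help with data scarcity in tasks such as sentiment analysis and named entity recognition. Third, its varied task setup encourages work across tasks, for example combining sentiment analysis with topic classification to study deeper text understanding.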

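Practical applications

In practice, SuperTweetEval has a wide range of potential applications. Social media platforms can use the TweetTopic and TweetSentiment tasks to classify and analyze user-generated content automatically, in support of content recommendation and community management. Companies can use the TweetNER7 task to extract brands, products, and other key information from social media for market analysis and brand monitoring. News organizations can use the TweetHate task to monitor and analyze hate speech on social media in real time, in support of accurate and socially responsible reporting.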

Recent research on the dataset