TweetEval

Name: TweetEval
Creator: OpenDataLab
Published: 2026-05-17 07:30:10
License: Not specified

OpenDataLab, updated 2026-05-17, indexed 2024-05-09

Download link:

https://opendatalab.org.cn/OpenDataLab/TweetEval

Resource overview:

The experimental landscape for social media natural language processing (NLP) is highly fragmented. New shared tasks and datasets are proposed every year, ranging from classic sentiment analysis to irony detection and emoji prediction. As a result, the current state of the art remains unclear: there is neither a standardized evaluation protocol nor a set of strong baselines trained on such domain-specific data. In this paper, we propose a new evaluation framework, TweetEval, consisting of seven heterogeneous, Twitter-specific classification tasks. We also provide a strong set of baselines as a starting point and compare different language modeling pre-training strategies. Our initial experiments show the effectiveness of starting from existing pre-trained general-purpose language models and continuing to train them on Twitter corpora.

Provider:

OpenDataLab

Created:

2022-05-23

Collected summary

Dataset introduction