dianalogan/Marketing-Budget-and-Actual-Sales-Dataset

Name: dianalogan/Marketing-Budget-and-Actual-Sales-Dataset
Creator: dianalogan
Published: 2022-10-21 10:12:40
License: No description available

Source: Hugging Face. Updated: 2022-10-21. Indexed: 2024-03-04.

Download link:

https://hf-mirror.com/datasets/dianalogan/Marketing-Budget-and-Actual-Sales-Dataset

Resource overview:

TweetEval consists of seven heterogeneous Twitter tasks, all framed as multi-class tweet classification: irony, hate, offensive language, stance, emoji, emotion, and sentiment. All tasks are unified into a single benchmark in which each dataset uses the same format and has fixed training, validation, and test splits. The text is in English and comes from Twitter.

Provider:

dianalogan

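Summary of original information

Dataset overview

Dataset name

Name: TweetEval

Dataset contents

Task type: multi-class tweet classification
Task list:
- Irony detection
- Hate speech detection
- Offensive language identification
- Stance detection
- Emoji prediction
- Emotion recognition
- Sentiment analysis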

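Dataset structure

Language: English
License: Apache-2.0
Multilinguality: monolingual
Source datasets: other generated datasets
Task categories: text, linear regression
Task IDs:
- Intent classification
- Multi-class classification
- Sentiment classification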

Training and evaluation metrics

Config: emotion
- Task: text classification
- Task ID: multi-class classification
- Splits:
  - Training split: train
  - Evaluation split: test
- Column mapping:
  - Text: text
  - Label: target
- Evaluation metrics:
  - Accuracy
  - F1 macro
  - F1 micro
  - F1 weighted
  - Precision macro
  - Precision micro
  - Precision weighted
  - Recall macro
  - Recall micro
  - Recall weighted
Config: hate
- Task: text classification
- Task ID: binary classification
- Splits:
  - Training split: train
  - Evaluation split: test
- Column mapping:
  - Text: text
  - Label: target
- Evaluation metrics:
  - Accuracy
  - F1 binary
  - Precision macro
  - Precision micro
  - Precision weighted
  - Recall macro
  - Recall micro
  - Recall weighted
Config: irony
- Task: text classification
- Task ID: binary classification
- Splits:
  - Training split: train
  - Evaluation split: test
- Column mapping:
  - Text: text
  - Label: target
- Evaluation metrics:
  - Accuracy
  - F1 binary
  - Precision macro
  - Precision micro
  - Precision weighted
  - Recall macro
  - Recall micro
  - Recall weighted
Config: offensive
- Task: text classification
- Task ID: binary classification
- Splits:
  - Training split: train
  - Evaluation split: test
- Column mapping:
  - Text: text
  - Label: target
- Evaluation metrics:
  - Accuracy
  - F1 binary
  - Precision macro
  - Precision micro
  - Precision weighted
  - Recall macro
  - Recall micro
  - Recall weighted
Config: sentiment
- Task: text classification
- Task ID: multi-class classification
- Splits:
  - Training split: train
  - Evaluation split: test
- Column mapping:
  - Text: text
  - Label: target
- Evaluation metrics:
  - Accuracy
  - F1 macro
  - F1 micro
  - F1 weighted
  - Precision macro
  - Precision micro
  - Precision weighted
  - Recall macro
  - Recall micro
  - Recall weighted
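
The F1 variants named in the metric lists above differ only in how per-class results are combined. The following is a minimal pure-Python sketch of that difference, not the evaluation code used by the benchmark; the function names (`class_counts`, `f1`, `evaluate`) and the toy labels are illustrative assumptions.

```python
from collections import Counter


def class_counts(y_true, y_pred):
    """Return {label: (tp, fp, fn)} for every label seen in either list."""
    counts = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        counts[label] = (tp, fp, fn)
    return counts


def f1(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); taken as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def evaluate(y_true, y_pred, positive_label=1):
    """Compute accuracy and the macro, micro, weighted, and binary F1 scores."""
    counts = class_counts(y_true, y_pred)
    support = Counter(y_true)  # number of true examples per class
    per_class = {label: f1(*c) for label, c in counts.items()}
    # Micro averaging pools TP/FP/FN over all classes before computing F1.
    pooled = [sum(c[i] for c in counts.values()) for i in range(3)]
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        # Macro: unweighted mean of the per-class F1 scores.
        "f1_macro": sum(per_class.values()) / len(per_class),
        "f1_micro": f1(*pooled),
        # Weighted: per-class F1 scores weighted by class support.
        "f1_weighted": sum(per_class[l] * support[l] for l in per_class) / len(y_true),
        # Binary: F1 of the positive class only (label 1 by assumption).
        "f1_binary": per_class.get(positive_label, 0.0),
    }


# Toy three-class example (made-up labels, not taken from the dataset).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
scores = evaluate(y_true, y_pred)
```

For single-label classification, micro-averaged F1 equals accuracy, which is why the two values coincide in `scores`.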

Data splits

Name	Train	Validation	Test
emoji	45000	5000	50000
emotion	3257	374	1421
hate	9000	1000	2970
irony	2862	955	784
offensive	11916	1324	860
sentiment	45615	2000	12284
stance_abortion	587	66	280
stance_atheism	461	52	220
stance_climate	355	40	169
stance_feminist	597	67	285
stance_hillary	620	69	295
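
The split sizes in the table can be compared across configurations by converting them to shares of each configuration's total. This is a small sketch using only the numbers from the table; the names `SPLITS`, `split_shares`, `totals`, and `stance_train` are illustrative.

```python
# (train, validation, test) sizes copied from the table above.
SPLITS = {
    "emoji": (45000, 5000, 50000),
    "emotion": (3257, 374, 1421),
    "hate": (9000, 1000, 2970),
    "irony": (2862, 955, 784),
    "offensive": (11916, 1324, 860),
    "sentiment": (45615, 2000, 12284),
    "stance_abortion": (587, 66, 280),
    "stance_atheism": (461, 52, 220),
    "stance_climate": (355, 40, 169),
    "stance_feminist": (597, 67, 285),
    "stance_hillary": (620, 69, 295),
}


def split_shares(train, validation, test):
    """Return each split's share of the configuration total, rounded to 3 places."""
    total = train + validation + test
    return tuple(round(n / total, 3) for n in (train, validation, test))


# Total number of examples per configuration.
totals = {name: sum(sizes) for name, sizes in SPLITS.items()}

# Training examples summed over the five stance topics.
stance_train = sum(
    sizes[0] for name, sizes in SPLITS.items() if name.startswith("stance_")
)

for name, sizes in SPLITS.items():
    print(name, totals[name], split_shares(*sizes))
```

The shares make the differences visible: the emoji test split is larger than its training split, while the stance topics each have only a few hundred training examples.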

Data fields

emoji:
- text: string feature containing the tweet text.
- label: integer class label, with the following mapping:
  - 0: ❤
  - 1: 😍
  - 2: 😂
  - 3: 💕
  - 4: 🔥
  - 5: 😊
  - 6: 😎
  - 7: ✨
  - 8: 💙
  - 9: 😘
  - 10: 📷
  - 11: 🇺🇸
  - 12: ☀
  - 13: 💜
  - 14: 😉
  - 15: 💯
  - 16: 😁
  - 17: 🎄
  - 18: 📸
  - 19: 😜
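emotion:
- text: string feature containing the tweet text.
- label: integer class label, with the following mapping:
  - 0: anger
  - 1: joy
  - 2: optimism
  - 3: sadness
hate:
- text: string feature containing the tweet text.
- label: integer class label, with the following mapping:
  - 0: non-hate
  - 1: hate
irony:
- text: string feature containing the tweet text.
- label: integer class label, with the following mapping:
  - 0: non-irony
  - 1: irony
offensive:
- text: string feature containing the tweet text.
- label: integer class label, with the following mapping:
  - 0: non-offensive
  - 1: offensive
sentiment:
- text: string feature containing the tweet text.
- label: integer class label, with the following mapping:
  - 0: negative
  - 1: neutral
  - 2: positive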
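stance_abortion:
- text: string feature containing the tweet text.
- label: integer class label, with the following mapping:
  - 0: none
  - 1: against
  - 2: favor
stance_atheism:
- text: string feature containing the tweet text.
- label: integer class label, with the following mapping:
  - 0: none
  - 1: against
  - 2: favor
stance_climate:
- text: string feature containing the tweet text.
- label: integer class label, with the following mapping:
  - 0: none
  - 1: against
  - 2: favor
stance_feminist:
- text: string feature containing the tweet text.
- label: integer class label, with the following mapping:
  - 0: none
  - 1: against
  - 2: favor
stance_hillary:
- text: string feature containing the tweet text.
- label: integer class label, with the following mapping:
  - 0: none
  - 1: against
  - 2: favor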
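
Because every configuration stores labels as integers, a lookup table is a convenient way to turn predictions back into readable names. The sketch below hard-codes the mappings listed above, with English label names as translations of that list; `LABEL_NAMES` and `decode` are hypothetical helper names, not part of any library.

```python
# Label names indexed by integer id, following the mappings listed above.
LABEL_NAMES = {
    "emoji": ["❤", "😍", "😂", "💕", "🔥", "😊", "😎", "✨", "💙", "😘",
              "📷", "🇺🇸", "☀", "💜", "😉", "💯", "😁", "🎄", "📸", "😜"],
    "emotion": ["anger", "joy", "optimism", "sadness"],
    "hate": ["non-hate", "hate"],
    "irony": ["non-irony", "irony"],
    "offensive": ["non-offensive", "offensive"],
    "sentiment": ["negative", "neutral", "positive"],
}

# All five stance topics share the same three labels.
for topic in ("abortion", "atheism", "climate", "feminist", "hillary"):
    LABEL_NAMES[f"stance_{topic}"] = ["none", "against", "favor"]


def decode(config, label_id):
    """Map an integer label to its name for the given configuration."""
    names = LABEL_NAMES[config]
    if not 0 <= label_id < len(names):
        raise ValueError(f"label {label_id} is out of range for config {config!r}")
    return names[label_id]


print(decode("sentiment", 2))
print(decode("stance_climate", 1))
```

Raising on an out-of-range id, rather than returning a placeholder, keeps a mismatch between a model's output size and the configuration from passing silently.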

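License information

Overall: each subset has its own license, and all are subject to Twitter's Terms of Service and API Terms of Service.
Per subset:
- emoji: undefined
- emotion (EmoInt): undefined
- hate (HateEval): permission required
- irony: undefined
- Offensive: undefined
- Sentiment: Creative Commons Attribution 3.0 Unported License
- Stance: undefined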

Citation information

@inproceedings{barbieri2020tweeteval, title={{TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification}}, author={Barbieri, Francesco and Camacho-Collados, Jose and Espinosa-Anke, Luis and Neves, Leonardo}, booktitle={Proceedings of Findings of EMNLP}, year={2020} }

Subset citations

Emotion recognition:

@inproceedings{mohammad2018semeval, title={Semeval-2018 task 1: Affect in tweets}, author={Mohammad, Saif and Bravo-Marquez, Felipe and Salameh, Mohammad and Kiritchenko, Svetlana}, booktitle={Proceedings of the 12th international workshop on semantic evaluation}, pages={1--17}, year={2018} }

Emoji prediction:

@inproceedings{barbieri2018semeval, title={Semeval 2018 task 2: Multilingual emoji prediction}, author={Barbieri, Francesco and Camacho-Collados, Jose and Ronzano, Francesco and Espinosa-Anke, Luis and Ballesteros, Miguel and Basile, Valerio and Patti, Viviana and Saggion, Horacio}, booktitle={Proceedings of The 12th International Workshop on Semantic Evaluation}, pages={24--33}, year={2018} }

Irony detection:

@inproceedings{van2018semeval, title={Semeval-2018 task 3: Irony detection in english tweets}, author={Van Hee, Cynthia and Lefever, Els and Hoste, V{\'e}ronique}, booktitle={Proceedings of The 12th International Workshop on Semantic Evaluation}, pages={39--50}, year={2018} }

Hate speech detection:

@inproceedings{basile-etal-2019-semeval, title = "{S}em{E}val-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in {T}witter", author = "Basile, Valerio and Bosco, Cristina and Fersini, Elisabetta and Nozza, Debora and Patti, Viviana and Rangel Pardo, Francisco Manuel and Rosso, Paolo and Sanguinetti, Manuela", booktitle = "Proceedings of the 13th International Workshop on Semantic Evaluation", year = "2019", address = "Minneapolis, Minnesota, USA", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/S19-2007", doi = "10.18653/v1/S19-2007", pages = "54--63" }

Offensive language identification:

@inproceedings{zampieri2019semeval, title={SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media (OffensEval)}, author={Zampieri, Marcos and Malmasi, Shervin and Nakov, Preslav and Rosenthal, Sara and Farra, Noura and Kumar, Ritesh}, booktitle={Proceedings of the 13th International Workshop on Semantic Evaluation}, pages={75--86}, year={2019} }

Sentiment analysis:

@inproceedings{rosenthal2017semeval, title={SemEval-2017 task 4: Sentiment analysis in Twitter}, author={Rosenthal, Sara and Farra, Noura and Nakov, Preslav}, booktitle={Proceedings of the 11th international workshop on semantic evaluation (SemEval-2017)}, pages={502--518}, year={2017} }

Stance detection:

@inproceedings{mohammad2016semeval, title={Semeval-2016 task 6: Detecting stance in tweets}, author={Mohammad, Saif and Kiritchenko, Svetlana and Sobhani, Parinaz and Zhu, Xiaodan and Cherry, Colin}, booktitle={Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)}, pages={31--41}, year={2016} }
