snyamson/covid-tweet-sentiment-analyzer-roberta-latest-data

Name: snyamson/covid-tweet-sentiment-analyzer-roberta-latest-data
Creator: snyamson
Published: 2023-10-29 09:49:31
License: not specified

Source: Hugging Face
Updated: 2023-10-29
Indexed: 2024-03-04

Download link:

https://hf-mirror.com/datasets/snyamson/covid-tweet-sentiment-analyzer-roberta-latest-data

Summary:

This dataset, covid-tweet-sentiment-analyzer-roberta-latest-data, consists of a training set of 7999 samples and a validation set of 2000 samples. Each sample has three features: input_ids (the tokenized text as numeric token ids), attention_mask (the attention mask), and labels (the sentiment label). Labels are encoded as 0 for negative, 1 for neutral, and 2 for positive.

Provider:

snyamson

Original information summary

Dataset overview

Dataset configuration

Default configuration:
- Training set: path data/train-*
- Validation set: path data/val-*
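
The default configuration can be loaded with the Hugging Face `datasets` library. This is a minimal sketch, assuming `datasets` is installed and the repository is still publicly reachable. The helper name `load_covid_tweets` and the dictionary `DEFAULT_CONFIG_PATTERNS` are hypothetical, and since the listing gives only file patterns and not the exact split names, the helper returns whatever splits the Hub serves instead of assuming them.

```python
# Repository id taken from the listing.
REPO_ID = "snyamson/covid-tweet-sentiment-analyzer-roberta-latest-data"

# File patterns of the default configuration, as given in the listing.
# The dictionary keys are descriptive labels, not confirmed split names.
DEFAULT_CONFIG_PATTERNS = {
    "training": "data/train-*",
    "validation": "data/val-*",
}


def load_covid_tweets(repo_id: str = REPO_ID):
    """Download the default configuration and return the loaded splits.

    Hypothetical helper: requires `pip install datasets` and network access.
    """
    # Imported lazily so this module can be loaded without the library.
    from datasets import load_dataset

    return load_dataset(repo_id)
```

Calling `print(load_covid_tweets())` should list each split with its column names and row count, which is a quick way to confirm the split names before indexing into the result.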

Dataset information

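Features:
- input_ids: sequence of int32; the tokenized, numeric representation of the text.
- attention_mask: sequence of int8; indicates which parts of the input sequence the model should attend to or ignore.
- labels: int64; the model's target value: negative (0), neutral (1), or positive (2).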
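
To make the three columns concrete, here is a small sketch that decodes a label and counts the non-padding tokens of one record. The record is made up for illustration and is not taken from the dataset, and the names `ID2LABEL` and `describe_example` are hypothetical. It also assumes the common convention that 1 in attention_mask marks a real token and 0 marks padding, which the listing does not state explicitly.

```python
# Label ids as documented in the listing: 0 = negative, 1 = neutral, 2 = positive.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def describe_example(example: dict) -> dict:
    """Summarize one record holding input_ids, attention_mask and labels."""
    input_ids = example["input_ids"]
    attention_mask = example["attention_mask"]
    if len(input_ids) != len(attention_mask):
        raise ValueError("input_ids and attention_mask must have equal length")
    # Assumed convention: mask value 1 = real token, 0 = padding.
    real_tokens = [tok for tok, keep in zip(input_ids, attention_mask) if keep == 1]
    return {
        "sentiment": ID2LABEL[example["labels"]],
        "sequence_length": len(input_ids),
        "real_token_count": len(real_tokens),
    }


# Made-up record for illustration only.
toy = {
    "input_ids": [0, 713, 16, 205, 2, 1, 1, 1],
    "attention_mask": [1, 1, 1, 1, 1, 0, 0, 0],
    "labels": 2,
}
print(describe_example(toy))
```
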
Data splits:
- Training set: 10366704 bytes, 7999 samples.
- Validation set: 2592000 bytes, 2000 samples.
Data size:
- Download size: 575509 bytes
- Dataset size: 12958704 bytes
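
The reported figures can be cross-checked with a little arithmetic. The sketch below uses only numbers from the listing; the variable names are illustrative.

```python
# Figures copied from the listing.
SPLITS = {
    "training": {"num_bytes": 10_366_704, "num_examples": 7_999},
    "validation": {"num_bytes": 2_592_000, "num_examples": 2_000},
}
DOWNLOAD_SIZE = 575_509
DATASET_SIZE = 12_958_704

total_bytes = sum(s["num_bytes"] for s in SPLITS.values())
total_examples = sum(s["num_examples"] for s in SPLITS.values())

# The two split sizes add up to the reported dataset size.
assert total_bytes == DATASET_SIZE

for name, s in SPLITS.items():
    per_example = s["num_bytes"] / s["num_examples"]
    share = s["num_examples"] / total_examples
    print(f"{name}: {per_example:.0f} bytes per example, {share:.1%} of all examples")

# Ratio of the reported dataset size to the download size.
print(f"size ratio: {DATASET_SIZE / DOWNLOAD_SIZE:.1f}x")
```

Both splits come out at exactly 1296 bytes per sample, which is consistent with every sample being padded to the same length, although the listing does not say so. The split is roughly 80/20 between training and validation.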