Weni/Zeroshot_Train-20K_nenhuma_tweet-format
Source: Hugging Face | Updated: 2023-09-28 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/Weni/Zeroshot_Train-20K_nenhuma_tweet-format
Resource description:
---
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
dataset_info:
  features:
  - name: source_text
    dtype: string
  - name: target_text
    dtype: string
  splits:
  - name: train
    num_bytes: 4411602
    num_examples: 20000
  download_size: 1748719
  dataset_size: 4411602
task_categories:
- zero-shot-classification
language:
- pt
size_categories:
- 10K<n<100K
---
# Dataset Card for "Zeroshot_Train-20K_nenhuma_tweet-format"
This is a training dataset for the Zeroshot models.
It contains 20,000 examples in a prompt format, written in Brazilian Portuguese, and is intended exclusively for training with the class 'nenhuma'.
Prompt:
```
"Classifique o tweet entre 'classe1', 'classe2', 'classe3', 'classe4', 'nenhuma' \\n\\nTweet: frase \\n\\nLabel: 'other'
```
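A minimal sketch of how a prompt in this format could be assembled. The helper name `build_prompt` and the example class names are hypothetical, and the sketch assumes that the escaped `\\n\\n` in the template stands for two real newline characters and that the label is appended after `Label: `. Neither detail is confirmed by the card.

```python
def build_prompt(classes, tweet):
    """Format a classification prompt following the card's template.

    Assumption: the escaped newlines in the template are real newlines,
    and the text ends right after "Label: " so the label can be appended.
    """
    # Quote each class name the same way the template does: 'classe1', 'classe2', ...
    options = ", ".join(f"'{name}'" for name in classes)
    return f"Classifique o tweet entre {options} \n\nTweet: {tweet} \n\nLabel: "


# Hypothetical class names, used only for illustration.
prompt = build_prompt(["elogio", "reclamação", "nenhuma"], "Bom dia, pessoal!")
print(prompt)
```

Whether the label belongs in `source_text` or only in `target_text` is best checked against a few real rows before reusing this helper.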
The dataset is divided as follows:
```
- 6,000 examples: prompt with class options, without the target class (nenhuma)
- 7,000 examples: prompt with class options, with the target class included as an option; the target class is not the correct answer
- 7,000 examples: prompt with class options, with the target class included as an option; the target class is the correct answer
```
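The three groups above can be sketched as a small classifier over `(source_text, target_text)` pairs. This is one reading of the split, not the authors' procedure: it assumes the class options appear in single quotes before `Tweet:` in `source_text`, and that `target_text` holds the label, possibly wrapped in single quotes. The function and bucket names are hypothetical.

```python
import re
from collections import Counter

TARGET = "nenhuma"  # the target class named in the card


def prompt_options(source_text):
    """Return the quoted class names listed before the 'Tweet:' part of the prompt."""
    header = source_text.split("Tweet:", 1)[0]
    return re.findall(r"'([^']*)'", header)


def bucket(source_text, target_text):
    """Assign an example to one of the three groups described in the card."""
    if TARGET not in prompt_options(source_text):
        return "target_not_offered"
    # Assumption: the label may or may not be wrapped in single quotes.
    label = target_text.strip().strip("'")
    if label == TARGET:
        return "target_correct"
    return "target_offered_but_incorrect"


# Synthetic examples, made up for illustration only.
examples = [
    ("Classifique o tweet entre 'a', 'b' \n\nTweet: oi \n\nLabel: ", "a"),
    ("Classifique o tweet entre 'a', 'nenhuma' \n\nTweet: oi \n\nLabel: ", "a"),
    ("Classifique o tweet entre 'a', 'nenhuma' \n\nTweet: oi \n\nLabel: ", "nenhuma"),
]
counts = Counter(bucket(source, target) for source, target in examples)
print(counts)
```

If this reading matches the real data, running `bucket` over the full train split should give counts of 6,000, 7,000 and 7,000, which sum to the 20,000 examples stated in the metadata.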
## How to load and use this dataset:
```
from datasets import load_dataset

# Download the dataset from the Hugging Face Hub; it defines a single "train" split
dataset = load_dataset("Weni/Zeroshot_Train-20K_nenhuma_tweet-format")
print(dataset)
```
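Once the data is loaded, a quick sanity check on the two columns can be useful. The sketch below uses a hypothetical helper, `describe_rows`, that accepts any iterable of rows with `source_text` and `target_text` keys. It is demonstrated on made-up rows so that it runs without network access.

```python
def describe_rows(rows):
    """Count rows and average the character lengths of both text columns."""
    n = 0
    source_chars = 0
    target_chars = 0
    for row in rows:
        n += 1
        source_chars += len(row["source_text"])
        target_chars += len(row["target_text"])
    if n == 0:
        return {"num_examples": 0, "avg_source_chars": 0.0, "avg_target_chars": 0.0}
    return {
        "num_examples": n,
        "avg_source_chars": source_chars / n,
        "avg_target_chars": target_chars / n,
    }


# Made-up rows standing in for the real split.
sample_rows = [
    {"source_text": "abcd", "target_text": "ab"},
    {"source_text": "ab", "target_text": "ab"},
]
stats = describe_rows(sample_rows)
print(stats)
```

With the real data, `describe_rows(dataset["train"])` should report 20,000 examples if the download matches the metadata above.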
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
Provider:
Weni
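Summary of the original information
Dataset card "Zeroshot_Train-20K_nenhuma_tweet-format"
Dataset overview
This is a training dataset for zero-shot models. It contains 20,000 examples in a prompt format, written in Brazilian Portuguese, and is intended specifically for training models with the nenhuma class.
Dataset structure
Features
- source_text: string, the source text.
- target_text: string, the target text.
Splits
- train: training split, 4,411,602 bytes, 20,000 examples.
Dataset size
- Download size: 1,748,719 bytes
- Dataset size: 4,411,602 bytes
Task categories
- Zero-shot classification
Language
- Brazilian Portuguese
Size category
- 10K<n<100K
Dataset division
- 6,000 examples: the prompt contains class options but not the target class (nenhuma).
- 7,000 examples: the prompt contains class options and the target class, but the target class is not the correct answer.
- 7,000 examples: the prompt contains class options and the target class, and the target class is the correct answer.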
Loading and usage
```python
from datasets import load_dataset

dataset = load_dataset("Weni/Zeroshot_Train-20K_nenhuma_tweet-format")
dataset
```



