Domino-ai/amazon_polarity_10_pct
Hugging Face · Updated 2023-10-25 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/Domino-ai/amazon_polarity_10_pct
Resource description:
---
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
dataset_info:
  features:
  - name: label
    dtype:
      class_label:
        names:
          '0': negative
          '1': positive
  - name: title
    dtype: string
  - name: content
    dtype: string
  splits:
  - name: train
    num_bytes: 163359702
    num_examples: 360000
  - name: test
    num_bytes: 18182813
    num_examples: 40000
  download_size: 120691417
  dataset_size: 181542515
---
# Amazon Polarity 10pct
This is a direct subset of the original [Amazon Polarity](https://huggingface.co/datasets/amazon_polarity) dataset, downsampled to 10% with a random shuffle.
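The split sizes in the metadata follow directly from the 10% rate. A minimal arithmetic sketch, assuming the original Amazon Polarity split sizes of 3,600,000 training and 400,000 test reviews (taken from the original dataset, not stated in this card); `subset_sizes` is a hypothetical helper:

```python
# Original Amazon Polarity split sizes (assumed from the original dataset)
ORIGINAL_SPLITS = {"train": 3_600_000, "test": 400_000}
PERCENT = 10  # keep 10% of every split


def subset_sizes(splits: dict[str, int], percent: int) -> dict[str, int]:
    """Rows to keep per split; integer arithmetic avoids float rounding."""
    return {name: n * percent // 100 for name, n in splits.items()}


print(subset_sizes(ORIGINAL_SPLITS, PERCENT))  # {'train': 360000, 'test': 40000}
```

These counts match the `take(360_000)` and `take(40_000)` calls in the Source Data code, and the 9:1 train/test ratio of the original is preserved.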
### Dataset Summary
A smaller version for quicker testing on Amazon Polarity. See https://huggingface.co/datasets/amazon_polarity for details and attributions.
### Source Data
```python
from datasets import ClassLabel, Dataset, DatasetDict, load_dataset

# Stream the full dataset rather than downloading all of it
ds_full = load_dataset("amazon_polarity", streaming=True)

# Shuffle each split with a fixed seed and keep 10% of it (360,000 train / 40,000 test rows)
ds_train_10_pct = Dataset.from_list(list(ds_full["train"].shuffle(seed=42).take(360_000)))
ds_test_10_pct = Dataset.from_list(list(ds_full["test"].shuffle(seed=42).take(40_000)))
ds_10_pct = DatasetDict({"train": ds_train_10_pct, "test": ds_test_10_pct})

# Need to recreate the class labels: Dataset.from_list infers plain types,
# so "label" comes back as an integer column without its ClassLabel names
class_label = ClassLabel(num_classes=2, names=["negative", "positive"])
# Map 0/1 to label names, then cast the string column back to ClassLabel
ds_10_pct = ds_10_pct.map(lambda row: {"title": row["title"], "content": row["content"], "label": "negative" if not row["label"] else "positive"})
ds_10_pct = ds_10_pct.cast_column("label", class_label)
```
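The sampling above runs on a streaming `IterableDataset`. Per the `datasets` documentation, `IterableDataset.shuffle` shuffles the shard order and then draws from a fixed-size buffer (`buffer_size=1000` by default) instead of permuting the whole split, so the subset is a seeded but approximate random sample. The simplified pure-Python sketch below illustrates buffer shuffling; `buffered_shuffle` is a hypothetical helper, not the library's implementation, and it ignores shard shuffling.

```python
import random
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def buffered_shuffle(items: Iterable[T], buffer_size: int, seed: int) -> Iterator[T]:
    """Approximate shuffle: fill a fixed-size buffer, then emit random buffered items."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    buffer: list[T] = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # still filling the buffer
            continue
        # Buffer is full: emit a random buffered item and put the new item in its slot
        i = rng.randrange(buffer_size)
        yield buffer[i]
        buffer[i] = item
    # Input exhausted: emit whatever is left in random order
    rng.shuffle(buffer)
    yield from buffer


# Take the first 10 items of a buffered shuffle over 0..99 with a buffer of 10
sample = list(islice(buffered_shuffle(range(100), buffer_size=10, seed=42), 10))
print(sample)  # every value is below 20
```

Because the j-th emitted row always comes from the first `buffer_size + j` input rows, `take(360_000)` with the default buffer can only return rows from the first 361,000 rows of the (shard-shuffled) training stream.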
Provided by: Domino-ai
Summary of original information
Amazon Polarity 10pct
Dataset overview
This dataset is a direct subset of the original Amazon Polarity dataset, randomly sampled at 10%.
Dataset configuration
- Config name: default
- Data files:
  - Train split: data/train-*
  - Test split: data/test-*
Dataset information
- Features:
  - label: the label, a class label with two classes: negative and positive.
  - title: the title, a string.
  - content: the content, a string.
- Splits:
  - Train split:
    - Bytes: 163359702
    - Examples: 360000
  - Test split:
    - Bytes: 18182813
    - Examples: 40000
- Download size: 120691417
- Dataset size: 181542515
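The figures above are internally consistent: the dataset size equals the sum of the two splits' byte counts, and the test split holds 10% of all examples. A plain-Python sketch (no `datasets` dependency) that encodes the card's values and checks this; the constant names are hypothetical, and the `int2str`/`str2int` helpers are hypothetical stand-ins named after the `ClassLabel` methods in `datasets`:

```python
# Values copied from the dataset card metadata
LABEL_NAMES = ["negative", "positive"]  # class_label names, indexed by id
SPLITS = {
    "train": {"num_bytes": 163_359_702, "num_examples": 360_000},
    "test": {"num_bytes": 18_182_813, "num_examples": 40_000},
}
DATASET_SIZE = 181_542_515


def int2str(label_id: int) -> str:
    """Map a class id (0 or 1) to its name."""
    if not 0 <= label_id < len(LABEL_NAMES):
        raise ValueError(f"unknown label id: {label_id}")
    return LABEL_NAMES[label_id]


def str2int(name: str) -> int:
    """Map a class name back to its id (raises ValueError if unknown)."""
    return LABEL_NAMES.index(name)


total_bytes = sum(split["num_bytes"] for split in SPLITS.values())
total_examples = sum(split["num_examples"] for split in SPLITS.values())

print(total_bytes == DATASET_SIZE)  # True
print(SPLITS["test"]["num_examples"] / total_examples)  # 0.1
print(int2str(1), str2int("negative"))  # positive 0
```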



