mbzuai-ugrip-statement-tuning/sentiments
Source: Hugging Face | Updated: 2024-08-01 | Indexed: 2024-06-12
Download link:
https://hf-mirror.com/datasets/mbzuai-ugrip-statement-tuning/sentiments
Resource description:
---
dataset_info:
- config_name: arabic
  features:
  - name: statement
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 468004
    num_examples: 2435
  download_size: 187657
  dataset_size: 468004
- config_name: chinese
  features:
  - name: statement
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 30420137
    num_examples: 160399
  download_size: 14156530
  dataset_size: 30420137
- config_name: default
  features:
  - name: statement
    dtype: string
  - name: label
    dtype: int64
  - name: language
    dtype: string
  splits:
  - name: english
    num_bytes: 393918
    num_examples: 2485
  - name: japanese
    num_bytes: 53826202
    num_examples: 160356
  - name: chinese
    num_bytes: 32184526
    num_examples: 160399
  - name: spanish
    num_bytes: 325604
    num_examples: 2439
  - name: arabic
    num_bytes: 492354
    num_examples: 2435
  - name: malay
    num_bytes: 1024644
    num_examples: 6263
  - name: french
    num_bytes: 367743
    num_examples: 2475
  - name: hindi
    num_bytes: 307080
    num_examples: 2454
  - name: german
    num_bytes: 298242
    num_examples: 2408
  - name: indonesian
    num_bytes: 3518289
    num_examples: 14591
  - name: portuguese
    num_bytes: 314949
    num_examples: 2450
  - name: italian
    num_bytes: 342823
    num_examples: 2450
  download_size: 38641710
  dataset_size: 93396374
- config_name: english
  features:
  - name: statement
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 366583
    num_examples: 2485
  download_size: 169002
  dataset_size: 366583
- config_name: french
  features:
  - name: statement
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 342993
    num_examples: 2475
  download_size: 141033
  dataset_size: 342993
- config_name: german
  features:
  - name: statement
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 274162
    num_examples: 2408
  download_size: 126387
  dataset_size: 274162
- config_name: hindi
  features:
  - name: statement
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 284994
    num_examples: 2454
  download_size: 129800
  dataset_size: 284994
- config_name: indonesian
  features:
  - name: statement
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 3314015
    num_examples: 14591
  download_size: 1364537
  dataset_size: 3314015
- config_name: italian
  features:
  - name: statement
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 315873
    num_examples: 2450
  download_size: 139217
  dataset_size: 315873
- config_name: japanese
  features:
  - name: statement
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 51901930
    num_examples: 160356
  download_size: 21448616
  dataset_size: 51901930
- config_name: malay
  features:
  - name: statement
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 968277
    num_examples: 6263
  download_size: 434926
  dataset_size: 968277
- config_name: portuguese
  features:
  - name: statement
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 280649
    num_examples: 2450
  download_size: 115824
  dataset_size: 280649
- config_name: spanish
  features:
  - name: statement
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 298775
    num_examples: 2439
  download_size: 132791
  dataset_size: 298775
configs:
- config_name: arabic
  data_files:
  - split: train
    path: arabic/train-*
- config_name: chinese
  data_files:
  - split: train
    path: chinese/train-*
- config_name: default
  data_files:
  - split: english
    path: data/english-*
  - split: japanese
    path: data/japanese-*
  - split: chinese
    path: data/chinese-*
  - split: spanish
    path: data/spanish-*
  - split: arabic
    path: data/arabic-*
  - split: malay
    path: data/malay-*
  - split: french
    path: data/french-*
  - split: hindi
    path: data/hindi-*
  - split: german
    path: data/german-*
  - split: indonesian
    path: data/indonesian-*
  - split: portuguese
    path: data/portuguese-*
  - split: italian
    path: data/italian-*
- config_name: english
  data_files:
  - split: train
    path: english/train-*
- config_name: french
  data_files:
  - split: train
    path: french/train-*
- config_name: german
  data_files:
  - split: train
    path: german/train-*
- config_name: hindi
  data_files:
  - split: train
    path: hindi/train-*
- config_name: indonesian
  data_files:
  - split: train
    path: indonesian/train-*
- config_name: italian
  data_files:
  - split: train
    path: italian/train-*
- config_name: japanese
  data_files:
  - split: train
    path: japanese/train-*
- config_name: malay
  data_files:
  - split: train
    path: malay/train-*
- config_name: portuguese
  data_files:
  - split: train
    path: portuguese/train-*
- config_name: spanish
  data_files:
  - split: train
    path: spanish/train-*
---
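The `default` config in the metadata above lists one split per language along with an overall `dataset_size`. As a quick sanity check, the sketch below re-adds the per-split `num_bytes` and `num_examples` values copied from that metadata. The names `DEFAULT_SPLITS`, `REPORTED_DATASET_SIZE`, and `totals` are illustrative only and are not part of the dataset.

```python
# Per-language splits of the `default` config, copied from the card metadata:
# split name -> (num_bytes, num_examples).
DEFAULT_SPLITS = {
    "english": (393918, 2485),
    "japanese": (53826202, 160356),
    "chinese": (32184526, 160399),
    "spanish": (325604, 2439),
    "arabic": (492354, 2435),
    "malay": (1024644, 6263),
    "french": (367743, 2475),
    "hindi": (307080, 2454),
    "german": (298242, 2408),
    "indonesian": (3518289, 14591),
    "portuguese": (314949, 2450),
    "italian": (342823, 2450),
}

# `dataset_size` reported for the `default` config.
REPORTED_DATASET_SIZE = 93396374


def totals(splits):
    """Return (total bytes, total examples) over all splits."""
    total_bytes = sum(num_bytes for num_bytes, _ in splits.values())
    total_examples = sum(num_examples for _, num_examples in splits.values())
    return total_bytes, total_examples


total_bytes, total_examples = totals(DEFAULT_SPLITS)
print("bytes match reported dataset_size:", total_bytes == REPORTED_DATASET_SIZE)
print("total examples across languages:", total_examples)
```

The same check can be repeated for any single-language config, where the one `train` split should account for the whole `dataset_size`.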
Provided by:
mbzuai-ugrip-statement-tuning
Summary of original information
Dataset overview
Dataset configurations and features
| Config name | Features |
|---|---|
| arabic | - label: int64<br>- statement: string |
| chinese | - label: int64<br>- statement: string |
| english | - label: int64<br>- statement: string |
| french | - label: int64<br>- statement: string |
| german | - label: int64<br>- statement: string |
| hindi | - label: int64<br>- statement: string |
| indonesian | - label: int64<br>- statement: string |
| italian | - label: int64<br>- statement: string |
| japanese | - label: int64<br>- statement: string |
| malay | - label: int64<br>- statement: string |
| portuguese | - label: int64<br>- statement: string |
| spanish | - label: int64<br>- statement: string |
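Every config in the table exposes the same two columns, so a single loader can serve all languages. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and the repository is reachable under the ID shown in the download link. `load_config` and `label_counts` are hypothetical helper names, the `train` split name is taken from the card metadata and may differ between revisions, and the sample rows are made up purely to mimic the schema.

```python
from collections import Counter

# Repository ID as it appears in the download link.
REPO_ID = "mbzuai-ugrip-statement-tuning/sentiments"


def load_config(config_name, split="train"):
    """Load one language config; requires network access and the `datasets` package."""
    from datasets import load_dataset  # lazy import so the helper below works offline

    return load_dataset(REPO_ID, config_name, split=split)


def label_counts(rows):
    """Count how often each integer label occurs in an iterable of row dicts."""
    return Counter(row["label"] for row in rows)


# Offline illustration: placeholder rows with the `statement`/`label` schema.
sample_rows = [
    {"statement": "placeholder text A", "label": 1},
    {"statement": "placeholder text B", "label": 0},
    {"statement": "placeholder text C", "label": 1},
]
counts = label_counts(sample_rows)
print(counts)
```

With network access, `label_counts(load_config("arabic"))` would apply the same count to the real rows.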
Dataset split details
| Config name | Split | Examples | Bytes |
|---|---|---|---|
| arabic | train | 1839 | 358632 |
| arabic | validation | 324 | 62654 |
| arabic | test | 870 | 166497 |
| chinese | train | 120000 | 23025711 |
| chinese | validation | 3000 | 569293 |
| chinese | test | 3000 | 577236 |
| english | train | 1839 | 276335 |
| english | validation | 324 | 48436 |
| english | test | 870 | 116941 |
| french | train | 1839 | 258677 |
| french | validation | 324 | 45315 |
| french | test | 870 | 120171 |
| german | train | 1839 | 214034 |
| german | validation | 324 | 36242 |
| german | test | 870 | 100436 |
| hindi | train | 1839 | 216434 |
| hindi | validation | 324 | 39477 |
| hindi | test | 870 | 97556 |
| indonesian | train | 11000 | 2524197 |
| indonesian | validation | 1260 | 285690 |
| indonesian | test | 500 | 91265 |
| italian | train | 1839 | 241918 |
| italian | validation | 324 | 42928 |
| italian | test | 870 | 116789 |
| japanese | train | 120000 | 39161728 |
| japanese | validation | 3000 | 958197 |
| japanese | test | 3000 | 970880 |
| malay | train | 4687 | 737793 |
| malay | validation | 1005 | 159846 |
| malay | test | 1005 | 159963 |
| portuguese | train | 1839 | 215656 |
| portuguese | validation | 324 | 36895 |
| portuguese | test | 870 | 101051 |
| spanish | train | 1839 | 230538 |
| spanish | validation | 324 | 41024 |
| spanish | test | 870 | 110146 |
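The split table shows that the train/validation/test proportions are not the same for every config. The sketch below computes the share of examples in each split for a few rows copied from the table. `SPLIT_COUNTS` and `split_shares` are illustrative names, and only four configs are included to keep the example short.

```python
# (train, validation, test) example counts copied from the split table.
SPLIT_COUNTS = {
    "arabic": (1839, 324, 870),
    "chinese": (120000, 3000, 3000),
    "indonesian": (11000, 1260, 500),
    "malay": (4687, 1005, 1005),
}


def split_shares(counts):
    """Return each split's fraction of the config's total examples."""
    total = sum(counts)
    return tuple(count / total for count in counts)


shares = {name: split_shares(counts) for name, counts in SPLIT_COUNTS.items()}
for name, (train, validation, test) in shares.items():
    print(f"{name}: train={train:.1%} validation={validation:.1%} test={test:.1%}")
```

Configs such as `english`, `french`, or `spanish` share the `arabic` counts (1839 / 324 / 870), so their shares come out identical.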
Download and dataset sizes
| Config name | Download size (bytes) | Dataset size (bytes) |
|---|---|---|
| arabic | 1218664 | 587783 |
| chinese | 15156273 | 24172240 |
| english | 273548 | 441712 |
| french | 229385 | 424163 |
| german | 212604 | 350712 |
| hindi | 215808 | 353467 |
| indonesian | 1582505 | 2901152 |
| italian | 231410 | 401635 |
| japanese | 22812470 | 41090805 |
| malay | 632269 | 1057602 |
| portuguese | 191656 | 353602 |
| spanish | 224658 | 381708 |
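The dataset sizes in this table can be cross-checked against the per-split byte counts in the split table, and the download size can be compared with the dataset size. The sketch below does both for three configs, using values copied from the two tables. `SIZES`, `SPLIT_BYTES`, and `download_ratio` are illustrative names.

```python
# (download size, dataset size) in bytes, copied from the size table.
SIZES = {
    "arabic": (1218664, 587783),
    "chinese": (15156273, 24172240),
    "japanese": (22812470, 41090805),
}

# (train, validation, test) byte counts copied from the split table.
SPLIT_BYTES = {
    "arabic": (358632, 62654, 166497),
    "chinese": (23025711, 569293, 577236),
    "japanese": (39161728, 958197, 970880),
}


def download_ratio(download_size, dataset_size):
    """Download size relative to dataset size (below 1.0 means smaller to download)."""
    return download_size / dataset_size


ratios = {}
for name, (download_size, dataset_size) in SIZES.items():
    splits_match = sum(SPLIT_BYTES[name]) == dataset_size
    ratios[name] = download_ratio(download_size, dataset_size)
    print(f"{name}: split bytes match dataset size: {splits_match}, "
          f"download/dataset ratio: {ratios[name]:.3f}")
```

For these three configs the split byte counts add up exactly to the listed dataset size. In the table as given, `arabic` is the only one of the three whose download size exceeds its dataset size.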



