
mbzuai-ugrip-statement-tuning/sentiments

Source: Hugging Face; updated 2024-08-01; indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/mbzuai-ugrip-statement-tuning/sentiments
Resource description (dataset card metadata):
---
dataset_info:
- config_name: arabic
  features:
  - name: statement
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 468004
    num_examples: 2435
  download_size: 187657
  dataset_size: 468004
- config_name: chinese
  features:
  - name: statement
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 30420137
    num_examples: 160399
  download_size: 14156530
  dataset_size: 30420137
- config_name: default
  features:
  - name: statement
    dtype: string
  - name: label
    dtype: int64
  - name: language
    dtype: string
  splits:
  - name: english
    num_bytes: 393918
    num_examples: 2485
  - name: japanese
    num_bytes: 53826202
    num_examples: 160356
  - name: chinese
    num_bytes: 32184526
    num_examples: 160399
  - name: spanish
    num_bytes: 325604
    num_examples: 2439
  - name: arabic
    num_bytes: 492354
    num_examples: 2435
  - name: malay
    num_bytes: 1024644
    num_examples: 6263
  - name: french
    num_bytes: 367743
    num_examples: 2475
  - name: hindi
    num_bytes: 307080
    num_examples: 2454
  - name: german
    num_bytes: 298242
    num_examples: 2408
  - name: indonesian
    num_bytes: 3518289
    num_examples: 14591
  - name: portuguese
    num_bytes: 314949
    num_examples: 2450
  - name: italian
    num_bytes: 342823
    num_examples: 2450
  download_size: 38641710
  dataset_size: 93396374
- config_name: english
  features:
  - name: statement
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 366583
    num_examples: 2485
  download_size: 169002
  dataset_size: 366583
- config_name: french
  features:
  - name: statement
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 342993
    num_examples: 2475
  download_size: 141033
  dataset_size: 342993
- config_name: german
  features:
  - name: statement
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 274162
    num_examples: 2408
  download_size: 126387
  dataset_size: 274162
- config_name: hindi
  features:
  - name: statement
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 284994
    num_examples: 2454
  download_size: 129800
  dataset_size: 284994
- config_name: indonesian
  features:
  - name: statement
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 3314015
    num_examples: 14591
  download_size: 1364537
  dataset_size: 3314015
- config_name: italian
  features:
  - name: statement
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 315873
    num_examples: 2450
  download_size: 139217
  dataset_size: 315873
- config_name: japanese
  features:
  - name: statement
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 51901930
    num_examples: 160356
  download_size: 21448616
  dataset_size: 51901930
- config_name: malay
  features:
  - name: statement
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 968277
    num_examples: 6263
  download_size: 434926
  dataset_size: 968277
- config_name: portuguese
  features:
  - name: statement
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 280649
    num_examples: 2450
  download_size: 115824
  dataset_size: 280649
- config_name: spanish
  features:
  - name: statement
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 298775
    num_examples: 2439
  download_size: 132791
  dataset_size: 298775
configs:
- config_name: arabic
  data_files:
  - split: train
    path: arabic/train-*
- config_name: chinese
  data_files:
  - split: train
    path: chinese/train-*
- config_name: default
  data_files:
  - split: english
    path: data/english-*
  - split: japanese
    path: data/japanese-*
  - split: chinese
    path: data/chinese-*
  - split: spanish
    path: data/spanish-*
  - split: arabic
    path: data/arabic-*
  - split: malay
    path: data/malay-*
  - split: french
    path: data/french-*
  - split: hindi
    path: data/hindi-*
  - split: german
    path: data/german-*
  - split: indonesian
    path: data/indonesian-*
  - split: portuguese
    path: data/portuguese-*
  - split: italian
    path: data/italian-*
- config_name: english
  data_files:
  - split: train
    path: english/train-*
- config_name: french
  data_files:
  - split: train
    path: french/train-*
- config_name: german
  data_files:
  - split: train
    path: german/train-*
- config_name: hindi
  data_files:
  - split: train
    path: hindi/train-*
- config_name: indonesian
  data_files:
  - split: train
    path: indonesian/train-*
- config_name: italian
  data_files:
  - split: train
    path: italian/train-*
- config_name: japanese
  data_files:
  - split: train
    path: japanese/train-*
- config_name: malay
  data_files:
  - split: train
    path: malay/train-*
- config_name: portuguese
  data_files:
  - split: train
    path: portuguese/train-*
- config_name: spanish
  data_files:
  - split: train
    path: spanish/train-*
---
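The `configs` metadata offers two ways to select a single language. Each per-language config (for example `english`) exposes one `train` split. The `default` config instead exposes one split per language and adds a `language` column.

The following is a minimal sketch of that mapping, using the Hugging Face `datasets` library's `load_dataset(path, name, split=...)`. The helper names `config_and_split` and `load_language` are hypothetical and are not part of the dataset or the library. Only the repository ID and the config and split names come from the metadata above.

```python
from typing import Tuple

REPO_ID = "mbzuai-ugrip-statement-tuning/sentiments"

# Language names as listed in the `configs` section of the card metadata.
LANGUAGES = (
    "arabic", "chinese", "english", "french", "german", "hindi",
    "indonesian", "italian", "japanese", "malay", "portuguese", "spanish",
)


def config_and_split(language: str, use_default: bool = False) -> Tuple[str, str]:
    """Hypothetical helper: return the (config_name, split) pair for one language.

    Per-language configs expose a single "train" split; the "default"
    config exposes one split named after each language.
    """
    if language not in LANGUAGES:
        raise ValueError(f"unknown language: {language!r}")
    if use_default:
        return "default", language
    return language, "train"


def load_language(language: str, use_default: bool = False):
    """Hypothetical helper: download one language from the Hugging Face Hub."""
    # Imported lazily so config_and_split() works without `datasets` installed.
    from datasets import load_dataset

    config, split = config_and_split(language, use_default)
    return load_dataset(REPO_ID, config, split=split)
```

According to the metadata, `load_language("english")` should yield 2,485 rows with `statement` and `label` columns. `load_language("english", use_default=True)` should yield the same count plus a `language` column.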
Provider:
mbzuai-ugrip-statement-tuning
Summary of the original dataset card

Dataset overview

Configurations and features

Config       Features
arabic       label (int64), statement (string)
chinese      label (int64), statement (string)
english      label (int64), statement (string)
french       label (int64), statement (string)
german       label (int64), statement (string)
hindi        label (int64), statement (string)
indonesian   label (int64), statement (string)
italian      label (int64), statement (string)
japanese     label (int64), statement (string)
malay        label (int64), statement (string)
portuguese   label (int64), statement (string)
spanish      label (int64), statement (string)

Split details

Config       Split        Examples   Bytes
arabic       train            1839     358632
arabic       validation        324      62654
arabic       test              870     166497
chinese      train          120000   23025711
chinese      validation       3000     569293
chinese      test             3000     577236
english      train            1839     276335
english      validation        324      48436
english      test              870     116941
french       train            1839     258677
french       validation        324      45315
french       test              870     120171
german       train            1839     214034
german       validation        324      36242
german       test              870     100436
hindi        train            1839     216434
hindi        validation        324      39477
hindi        test              870      97556
indonesian   train           11000    2524197
indonesian   validation       1260     285690
indonesian   test              500      91265
italian      train            1839     241918
italian      validation        324      42928
italian      test              870     116789
japanese     train          120000   39161728
japanese     validation       3000     958197
japanese     test             3000     970880
malay        train            4687     737793
malay        validation       1005     159846
malay        test             1005     159963
portuguese   train            1839     215656
portuguese   validation        324      36895
portuguese   test              870     101051
spanish      train            1839     230538
spanish      validation        324      41024
spanish      test              870     110146
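The split table can be summarized as a total example count per config and each split's share of that total.

The following is a minimal sketch. The counts are copied by hand from the table above, and the names `SPLIT_EXAMPLES` and `split_fractions` are hypothetical.

```python
from typing import Dict

# Example counts per split, copied from the split-details table above.
SPLIT_EXAMPLES: Dict[str, Dict[str, int]] = {
    "arabic": {"train": 1839, "validation": 324, "test": 870},
    "chinese": {"train": 120000, "validation": 3000, "test": 3000},
    "english": {"train": 1839, "validation": 324, "test": 870},
    "french": {"train": 1839, "validation": 324, "test": 870},
    "german": {"train": 1839, "validation": 324, "test": 870},
    "hindi": {"train": 1839, "validation": 324, "test": 870},
    "indonesian": {"train": 11000, "validation": 1260, "test": 500},
    "italian": {"train": 1839, "validation": 324, "test": 870},
    "japanese": {"train": 120000, "validation": 3000, "test": 3000},
    "malay": {"train": 4687, "validation": 1005, "test": 1005},
    "portuguese": {"train": 1839, "validation": 324, "test": 870},
    "spanish": {"train": 1839, "validation": 324, "test": 870},
}


def split_fractions(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each split's share of the config's total example count."""
    total = sum(counts.values())
    return {split: n / total for split, n in counts.items()}


if __name__ == "__main__":
    for config, counts in SPLIT_EXAMPLES.items():
        total = sum(counts.values())
        shares = ", ".join(f"{s} {f:.1%}" for s, f in split_fractions(counts).items())
        print(f"{config:<11} total {total:>6}  ({shares})")
```

These totals differ from the `num_examples` values in the card metadata. For example, arabic has 3,033 examples here but 2,435 in the metadata. The summary and the metadata therefore appear to describe different versions of the data.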

Download and dataset sizes

Config       Download size (bytes)   Dataset size (bytes)
arabic                     1218664                 587783
chinese                   15156273               24172240
english                     273548                 441712
french                      229385                 424163
german                      212604                 350712
hindi                       215808                 353467
indonesian                 1582505                2901152
italian                     231410                 401635
japanese                  22812470               41090805
malay                       632269                1057602
portuguese                  191656                 353602
spanish                     224658                 381708
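Converting the byte counts above to MiB makes it easier to estimate disk and bandwidth needs before downloading.

The following is a minimal sketch. The figures are copied from the table rather than fetched from the Hub, and `SIZES` and `to_mib` are hypothetical names.

```python
from typing import Dict, Tuple

# (download_size, dataset_size) in bytes, copied from the table above.
SIZES: Dict[str, Tuple[int, int]] = {
    "arabic": (1218664, 587783),
    "chinese": (15156273, 24172240),
    "english": (273548, 441712),
    "french": (229385, 424163),
    "german": (212604, 350712),
    "hindi": (215808, 353467),
    "indonesian": (1582505, 2901152),
    "italian": (231410, 401635),
    "japanese": (22812470, 41090805),
    "malay": (632269, 1057602),
    "portuguese": (191656, 353602),
    "spanish": (224658, 381708),
}


def to_mib(num_bytes: int) -> float:
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return num_bytes / (1024 * 1024)


if __name__ == "__main__":
    # List configs from largest to smallest dataset size.
    ranked = sorted(SIZES.items(), key=lambda kv: kv[1][1], reverse=True)
    for config, (download, prepared) in ranked:
        print(f"{config:<11} download {to_mib(download):7.2f} MiB  "
              f"dataset {to_mib(prepared):7.2f} MiB")
    total_download = sum(download for download, _ in SIZES.values())
    print(f"all configs: {to_mib(total_download):.2f} MiB to download")
```

The japanese and chinese configs together account for most of the total, so they dominate any full download.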