awinml/test-MultiFin
Source: Hugging Face · Updated 2024-05-01 · Indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/awinml/test-MultiFin
Resource description:
---
dataset_info:
- config_name: da_highlevel
  features:
  - name: text
    dtype: string
  - name: label
    dtype: string
  - name: id
    dtype: string
  splits:
  - name: train
    num_bytes: 88789.7435458787
    num_examples: 891
  - name: validation
    num_bytes: 22445.89303482587
    num_examples: 223
  - name: test
    num_bytes: 27936.783582089553
    num_examples: 279
  download_size: 71192
  dataset_size: 139172.42016279412
- config_name: el_highlevel
  features:
  - name: text
    dtype: string
  - name: label
    dtype: string
  - name: id
    dtype: string
  splits:
  - name: train
    num_bytes: 17040.455832037325
    num_examples: 171
  - name: validation
    num_bytes: 4328.13184079602
    num_examples: 43
  - name: test
    num_bytes: 5407.119402985075
    num_examples: 54
  download_size: 22653
  dataset_size: 26775.70707581842
- config_name: en_highlevel
  features:
  - name: text
    dtype: string
  - name: label
    dtype: string
  - name: id
    dtype: string
  splits:
  - name: train
    num_bytes: 174091.67449455676
    num_examples: 1747
  - name: validation
    num_bytes: 43985.89800995025
    num_examples: 437
  - name: test
    num_bytes: 54671.985074626864
    num_examples: 546
  download_size: 110216
  dataset_size: 272749.5575791339
- config_name: en_lowlevel
  features:
  - name: text
    dtype: string
  - name: labels
    sequence: string
  - name: id
    dtype: string
  splits:
  - name: train
    num_bytes: 172316.0
    num_examples: 1747
  - name: validation
    num_bytes: 43479.0
    num_examples: 437
  download_size: 90710
  dataset_size: 215795.0
- config_name: es_highlevel
  features:
  - name: text
    dtype: string
  - name: label
    dtype: string
  - name: id
    dtype: string
  splits:
  - name: train
    num_bytes: 73144.41275272162
    num_examples: 734
  - name: validation
    num_bytes: 18520.378109452737
    num_examples: 184
  - name: test
    num_bytes: 23030.32338308458
    num_examples: 230
  download_size: 56032
  dataset_size: 114695.11424525894
- config_name: es_lowlevel
  features:
  - name: text
    dtype: string
  - name: labels
    sequence: string
  - name: id
    dtype: string
  splits:
  - name: train
    num_bytes: 0.0
    num_examples: 0
  - name: validation
    num_bytes: 0.0
    num_examples: 0
  download_size: 2132
  dataset_size: 0.0
- config_name: fi_highlevel
  features:
  - name: text
    dtype: string
  - name: label
    dtype: string
  - name: id
    dtype: string
  splits:
  - name: train
    num_bytes: 16143.589735614309
    num_examples: 162
  - name: validation
    num_bytes: 4026.169154228856
    num_examples: 40
  - name: test
    num_bytes: 5106.723880597015
    num_examples: 51
  download_size: 13810
  dataset_size: 25276.48277044018
- config_name: he_highlevel
  features:
  - name: text
    dtype: string
  - name: label
    dtype: string
  - name: id
    dtype: string
  splits:
  - name: train
    num_bytes: 14848.116485225506
    num_examples: 149
  - name: validation
    num_bytes: 3724.2064676616915
    num_examples: 37
  - name: test
    num_bytes: 4706.1965174129355
    num_examples: 47
  download_size: 19087
  dataset_size: 23278.519470300133
- config_name: hu_highlevel
  features:
  - name: text
    dtype: string
  - name: label
    dtype: string
  - name: id
    dtype: string
  splits:
  - name: train
    num_bytes: 13452.991446345257
    num_examples: 135
  - name: validation
    num_bytes: 3422.2437810945275
    num_examples: 34
  - name: test
    num_bytes: 4205.537313432836
    num_examples: 42
  download_size: 20042
  dataset_size: 21080.772540872622
- config_name: is_highlevel
  features:
  - name: text
    dtype: string
  - name: label
    dtype: string
  - name: id
    dtype: string
  splits:
  - name: train
    num_bytes: 3786.767962674961
    num_examples: 38
  - name: validation
    num_bytes: 905.8880597014926
    num_examples: 9
  - name: test
    num_bytes: 1101.4502487562188
    num_examples: 11
  download_size: 7523
  dataset_size: 5794.106271132672
- config_name: it_highlevel
  features:
  - name: text
    dtype: string
  - name: label
    dtype: string
  - name: id
    dtype: string
  splits:
  - name: train
    num_bytes: 5381.196578538103
    num_examples: 54
  - name: validation
    num_bytes: 1409.1592039800994
    num_examples: 14
  - name: test
    num_bytes: 1702.2412935323382
    num_examples: 17
  download_size: 10079
  dataset_size: 8492.59707605054
- config_name: ja_highlevel
  features:
  - name: text
    dtype: string
  - name: label
    dtype: string
  - name: id
    dtype: string
  splits:
  - name: train
    num_bytes: 14648.812908242613
    num_examples: 147
  - name: validation
    num_bytes: 3724.2064676616915
    num_examples: 37
  - name: test
    num_bytes: 4606.064676616916
    num_examples: 46
  download_size: 24222
  dataset_size: 22979.08405252122
- config_name: no_highlevel
  features:
  - name: text
    dtype: string
  - name: label
    dtype: string
  - name: id
    dtype: string
  splits:
  - name: train
    num_bytes: 10064.830637636082
    num_examples: 101
  - name: validation
    num_bytes: 2516.355721393035
    num_examples: 25
  - name: test
    num_bytes: 3104.087064676617
    num_examples: 31
  download_size: 13022
  dataset_size: 15685.273423705734
- config_name: pl_highlevel
  features:
  - name: text
    dtype: string
  - name: label
    dtype: string
  - name: id
    dtype: string
  splits:
  - name: train
    num_bytes: 54310.22472783826
    num_examples: 545
  - name: validation
    num_bytes: 13688.97512437811
    num_examples: 136
  - name: test
    num_bytes: 17022.412935323384
    num_examples: 170
  download_size: 55564
  dataset_size: 85021.61278753975
- config_name: ru_highlevel
  features:
  - name: text
    dtype: string
  - name: label
    dtype: string
  - name: id
    dtype: string
  splits:
  - name: train
    num_bytes: 9466.919906687403
    num_examples: 95
  - name: validation
    num_bytes: 2415.7014925373132
    num_examples: 24
  - name: test
    num_bytes: 3003.955223880597
    num_examples: 30
  download_size: 24800
  dataset_size: 14886.576623105313
- config_name: sv_highlevel
  features:
  - name: text
    dtype: string
  - name: label
    dtype: string
  - name: id
    dtype: string
  splits:
  - name: train
    num_bytes: 2491.2947122861588
    num_examples: 25
  - name: validation
    num_bytes: 603.9253731343283
    num_examples: 6
  - name: test
    num_bytes: 700.9228855721393
    num_examples: 7
  download_size: 8301
  dataset_size: 3796.1429709926265
- config_name: tr_highlevel
  features:
  - name: text
    dtype: string
  - name: label
    dtype: string
  - name: id
    dtype: string
  splits:
  - name: train
    num_bytes: 143099.96827371695
    num_examples: 1436
  - name: validation
    num_bytes: 36134.86815920398
    num_examples: 359
  - name: test
    num_bytes: 44959.19651741294
    num_examples: 449
  download_size: 104477
  dataset_size: 224194.03295033387
configs:
- config_name: da_highlevel
  data_files:
  - split: train
    path: da_highlevel/train-*
  - split: validation
    path: da_highlevel/validation-*
  - split: test
    path: da_highlevel/test-*
- config_name: el_highlevel
  data_files:
  - split: train
    path: el_highlevel/train-*
  - split: validation
    path: el_highlevel/validation-*
  - split: test
    path: el_highlevel/test-*
- config_name: en_highlevel
  data_files:
  - split: train
    path: en_highlevel/train-*
  - split: validation
    path: en_highlevel/validation-*
  - split: test
    path: en_highlevel/test-*
- config_name: en_lowlevel
  data_files:
  - split: train
    path: en_lowlevel/train-*
  - split: validation
    path: en_lowlevel/validation-*
- config_name: es_highlevel
  data_files:
  - split: train
    path: es_highlevel/train-*
  - split: validation
    path: es_highlevel/validation-*
  - split: test
    path: es_highlevel/test-*
- config_name: es_lowlevel
  data_files:
  - split: train
    path: es_lowlevel/train-*
  - split: validation
    path: es_lowlevel/validation-*
- config_name: fi_highlevel
  data_files:
  - split: train
    path: fi_highlevel/train-*
  - split: validation
    path: fi_highlevel/validation-*
  - split: test
    path: fi_highlevel/test-*
- config_name: he_highlevel
  data_files:
  - split: train
    path: he_highlevel/train-*
  - split: validation
    path: he_highlevel/validation-*
  - split: test
    path: he_highlevel/test-*
- config_name: hu_highlevel
  data_files:
  - split: train
    path: hu_highlevel/train-*
  - split: validation
    path: hu_highlevel/validation-*
  - split: test
    path: hu_highlevel/test-*
- config_name: is_highlevel
  data_files:
  - split: train
    path: is_highlevel/train-*
  - split: validation
    path: is_highlevel/validation-*
  - split: test
    path: is_highlevel/test-*
- config_name: it_highlevel
  data_files:
  - split: train
    path: it_highlevel/train-*
  - split: validation
    path: it_highlevel/validation-*
  - split: test
    path: it_highlevel/test-*
- config_name: ja_highlevel
  data_files:
  - split: train
    path: ja_highlevel/train-*
  - split: validation
    path: ja_highlevel/validation-*
  - split: test
    path: ja_highlevel/test-*
- config_name: no_highlevel
  data_files:
  - split: train
    path: no_highlevel/train-*
  - split: validation
    path: no_highlevel/validation-*
  - split: test
    path: no_highlevel/test-*
- config_name: pl_highlevel
  data_files:
  - split: train
    path: pl_highlevel/train-*
  - split: validation
    path: pl_highlevel/validation-*
  - split: test
    path: pl_highlevel/test-*
- config_name: ru_highlevel
  data_files:
  - split: train
    path: ru_highlevel/train-*
  - split: validation
    path: ru_highlevel/validation-*
  - split: test
    path: ru_highlevel/test-*
- config_name: sv_highlevel
  data_files:
  - split: train
    path: sv_highlevel/train-*
  - split: validation
    path: sv_highlevel/validation-*
  - split: test
    path: sv_highlevel/test-*
- config_name: tr_highlevel
  data_files:
  - split: train
    path: tr_highlevel/train-*
  - split: validation
    path: tr_highlevel/validation-*
  - split: test
    path: tr_highlevel/test-*
---
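The metadata above lists one configuration per language and label level, so a single configuration can probably be loaded by name. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and that the repository id `awinml/test-MultiFin` resolves on the Hub; the helper names `config_name` and `load_multifin` are hypothetical and not part of the dataset card.

```python
# Language codes that appear as "<lang>_highlevel" configs in the metadata.
HIGHLEVEL_LANGS = [
    "da", "el", "en", "es", "fi", "he", "hu", "is",
    "it", "ja", "no", "pl", "ru", "sv", "tr",
]
# Only English and Spanish are listed with a "<lang>_lowlevel" config.
LOWLEVEL_LANGS = ["en", "es"]


def config_name(lang: str, level: str = "highlevel") -> str:
    """Build a config name such as 'da_highlevel' (hypothetical helper)."""
    allowed = {"highlevel": HIGHLEVEL_LANGS, "lowlevel": LOWLEVEL_LANGS}
    if level not in allowed or lang not in allowed[level]:
        raise ValueError(f"no config listed for lang={lang!r}, level={level!r}")
    return f"{lang}_{level}"


def load_multifin(lang: str, level: str = "highlevel"):
    """Load one configuration; needs network access and the datasets library."""
    # Imported lazily so config_name() works without the library installed.
    from datasets import load_dataset

    return load_dataset("awinml/test-MultiFin", config_name(lang, level))
```

Calling `load_multifin("da")` should return the splits of `da_highlevel`. Note that the lowlevel configurations list only `train` and `validation` splits, and `es_lowlevel` is listed with zero examples, so code that expects a `test` split or non-empty data may need a guard.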
Provider:
awinml
Summary of original information
Dataset overview
1. Dataset configuration information
- da_highlevel
  - Features: text (string), label (string), id (string)
  - Splits: train: 891 examples, 88789.7435458787 bytes; validation: 223 examples, 22445.89303482587 bytes; test: 279 examples, 27936.783582089553 bytes
  - Download size: 71192 bytes
  - Dataset size: 139172.42016279412 bytes
- el_highlevel
  - Features: text (string), label (string), id (string)
  - Splits: train: 171 examples, 17040.455832037325 bytes; validation: 43 examples, 4328.13184079602 bytes; test: 54 examples, 5407.119402985075 bytes
  - Download size: 22653 bytes
  - Dataset size: 26775.70707581842 bytes
- en_highlevel
  - Features: text (string), label (string), id (string)
  - Splits: train: 1747 examples, 174091.67449455676 bytes; validation: 437 examples, 43985.89800995025 bytes; test: 546 examples, 54671.985074626864 bytes
  - Download size: 110216 bytes
  - Dataset size: 272749.5575791339 bytes
- en_lowlevel
  - Features: text (string), labels (sequence of strings), id (string)
  - Splits: train: 1747 examples, 172316.0 bytes; validation: 437 examples, 43479.0 bytes
  - Download size: 90710 bytes
  - Dataset size: 215795.0 bytes
- es_highlevel
  - Features: text (string), label (string), id (string)
  - Splits: train: 734 examples, 73144.41275272162 bytes; validation: 184 examples, 18520.378109452737 bytes; test: 230 examples, 23030.32338308458 bytes
  - Download size: 56032 bytes
  - Dataset size: 114695.11424525894 bytes
- es_lowlevel
  - Features: text (string), labels (sequence of strings), id (string)
  - Splits: train: 0 examples, 0.0 bytes; validation: 0 examples, 0.0 bytes
  - Download size: 2132 bytes
  - Dataset size: 0.0 bytes
- fi_highlevel
  - Features: text (string), label (string), id (string)
  - Splits: train: 162 examples, 16143.589735614309 bytes; validation: 40 examples, 4026.169154228856 bytes; test: 51 examples, 5106.723880597015 bytes
  - Download size: 13810 bytes
  - Dataset size: 25276.48277044018 bytes
- he_highlevel
  - Features: text (string), label (string), id (string)
  - Splits: train: 149 examples, 14848.116485225506 bytes; validation: 37 examples, 3724.2064676616915 bytes; test: 47 examples, 4706.1965174129355 bytes
  - Download size: 19087 bytes
  - Dataset size: 23278.519470300133 bytes
- hu_highlevel
  - Features: text (string), label (string), id (string)
  - Splits: train: 135 examples, 13452.991446345257 bytes; validation: 34 examples, 3422.2437810945275 bytes; test: 42 examples, 4205.537313432836 bytes
  - Download size: 20042 bytes
  - Dataset size: 21080.772540872622 bytes
- is_highlevel
  - Features: text (string), label (string), id (string)
  - Splits: train: 38 examples, 3786.767962674961 bytes; validation: 9 examples, 905.8880597014926 bytes; test: 11 examples, 1101.4502487562188 bytes
  - Download size: 7523 bytes
  - Dataset size: 5794.106271132672 bytes
- it_highlevel
  - Features: text (string), label (string), id (string)
  - Splits: train: 54 examples, 5381.196578538103 bytes; validation: 14 examples, 1409.1592039800994 bytes; test: 17 examples, 1702.2412935323382 bytes
  - Download size: 10079 bytes
  - Dataset size: 8492.59707605054 bytes
- ja_highlevel
  - Features: text (string), label (string), id (string)
  - Splits: train: 147 examples, 14648.812908242613 bytes; validation: 37 examples, 3724.2064676616915 bytes; test: 46 examples, 4606.064676616916 bytes
  - Download size: 24222 bytes
  - Dataset size: 22979.08405252122 bytes
- no_highlevel
  - Features: text (string), label (string), id (string)
  - Splits: train: 101 examples, 10064.830637636082 bytes; validation: 25 examples, 2516.355721393035 bytes; test: 31 examples, 3104.087064676617 bytes
  - Download size: 13022 bytes
  - Dataset size: 15685.273423705734 bytes
- pl_highlevel
  - Features: text (string), label (string), id (string)
  - Splits: train: 545 examples, 54310.22472783826 bytes; validation: 136 examples, 13688.97512437811 bytes; test: 170 examples, 17022.412935323384 bytes
  - Download size: 55564 bytes
  - Dataset size: 85021.61278753975 bytes
- ru_highlevel
  - Features: text (string), label (string), id (string)
  - Splits: train: 95 examples, 9466.919906687403 bytes; validation: 24 examples, 2415.7014925373132 bytes; test: 30 examples, 3003.955223880597 bytes
  - Download size: 24800 bytes
  - Dataset size: 14886.576623105313 bytes
- sv_highlevel
  - Features: text (string), label (string), id (string)
  - Splits: train: 25 examples, 2491.2947122861588 bytes; validation: 6 examples, 603.9253731343283 bytes; test: 7 examples, 700.9228855721393 bytes
  - Download size: 8301 bytes
  - Dataset size: 3796.1429709926265 bytes
- tr_highlevel
  - Features: text (string), label (string), id (string)
  - Splits: train: 1436 examples, 143099.96827371695 bytes; validation: 359 examples, 36134.86815920398 bytes; test: 449 examples, 44959.19651741294 bytes
  - Download size: 104477 bytes
  - Dataset size: 224194.03295033387 bytes
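The per-split byte counts and the dataset size reported for each configuration can be cross-checked against each other. The sketch below uses the `da_highlevel` figures from the metadata and assumes that the dataset size is meant to be the sum of the split sizes, which the numbers appear to support; the function names are hypothetical.

```python
import math

# Figures copied from the da_highlevel entry of the metadata.
split_bytes = {
    "train": 88789.7435458787,
    "validation": 22445.89303482587,
    "test": 27936.783582089553,
}
split_examples = {"train": 891, "validation": 223, "test": 279}
reported_dataset_size = 139172.42016279412


def sizes_consistent(per_split, dataset_size, rel_tol=1e-9):
    """Return True if the split sizes add up to the reported dataset size."""
    return math.isclose(sum(per_split.values()), dataset_size, rel_tol=rel_tol)


def split_fractions(per_split_examples):
    """Return each split's share of the total number of examples."""
    total = sum(per_split_examples.values())
    return {name: count / total for name, count in per_split_examples.items()}


print(sizes_consistent(split_bytes, reported_dataset_size))
print(split_fractions(split_examples))
```

The same check can be repeated for any other configuration by swapping in its figures. For configurations with zero examples, such as `es_lowlevel`, `split_fractions` would divide by zero, so a guard would be needed there.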
2. Dataset file paths
- The data files of each configuration follow this path pattern:
config_name/split-*
- For example:
da_highlevel/train-*
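To illustrate how the `config_name/split-*` pattern maps a file path to a split, here is a small sketch using Python's standard `fnmatch` module. The shard file name in the example is hypothetical, and the Hub's own glob resolution may differ in detail from `fnmatch`.

```python
from fnmatch import fnmatchcase


def shard_pattern(config_name: str, split: str) -> str:
    """Build the glob pattern for one split of one configuration."""
    return f"{config_name}/{split}-*"


def split_of(path: str, config_name: str,
             splits=("train", "validation", "test")):
    """Return the split whose pattern matches the path, or None."""
    for split in splits:
        if fnmatchcase(path, shard_pattern(config_name, split)):
            return split
    return None


# Hypothetical shard name, used only to exercise the pattern.
example_path = "da_highlevel/train-00000-of-00001.parquet"
print(split_of(example_path, "da_highlevel"))
```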



