rl-low-resource/rumantsch-varieties-sentences
Source: Hugging Face. Updated 2026-03-11. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/rl-low-resource/rumantsch-varieties-sentences
Resource description:
---
dataset_info:
- config_name: puter
  features:
  - name: original
    dtype: string
  - name: translation
    dtype: string
  - name: legacy_variety
    dtype: string
  splits:
  - name: train
    num_bytes: 33956760
    num_examples: 49836
  - name: test
    num_bytes: 456256
    num_examples: 997
  download_size: 20011417
  dataset_size: 34413016
- config_name: puter_sampled
  features:
  - name: original
    dtype: string
  - name: translation
    dtype: string
  - name: legacy_variety
    dtype: string
  splits:
  - name: train
    num_bytes: 424019
    num_examples: 2000
  - name: test
    num_bytes: 42694
    num_examples: 188
  download_size: 298986
  dataset_size: 466713
- config_name: rumgr
  features:
  - name: original
    dtype: string
  - name: translation
    dtype: string
  - name: legacy_variety
    dtype: string
  splits:
  - name: train
    num_bytes: 1171405866
    num_examples: 422767
  - name: test
    num_bytes: 453788
    num_examples: 997
  download_size: 672799331
  dataset_size: 1171859654
- config_name: rumgr_sampled
  features:
  - name: original
    dtype: string
  - name: translation
    dtype: string
  - name: legacy_variety
    dtype: string
  splits:
  - name: train
    num_bytes: 241485
    num_examples: 946
  - name: test
    num_bytes: 40639
    num_examples: 184
  download_size: 173022
  dataset_size: 282124
- config_name: surmiran
  features:
  - name: original
    dtype: string
  - name: translation
    dtype: string
  - name: legacy_variety
    dtype: string
  splits:
  - name: train
    num_bytes: 60490625
    num_examples: 40998
  - name: test
    num_bytes: 462032
    num_examples: 997
  download_size: 34996535
  dataset_size: 60952657
- config_name: surmiran_sampled
  features:
  - name: original
    dtype: string
  - name: translation
    dtype: string
  - name: legacy_variety
    dtype: string
  splits:
  - name: train
    num_bytes: 426177
    num_examples: 2000
  - name: test
    num_bytes: 41610
    num_examples: 185
  download_size: 296172
  dataset_size: 467787
- config_name: sursilv
  features:
  - name: original
    dtype: string
  - name: translation
    dtype: string
  - name: legacy_variety
    dtype: string
  splits:
  - name: train
    num_bytes: 244667619
    num_examples: 131439
  - name: test
    num_bytes: 456816
    num_examples: 997
  download_size: 146674130
  dataset_size: 245124435
- config_name: sursilv_sampled
  features:
  - name: original
    dtype: string
  - name: translation
    dtype: string
  - name: legacy_variety
    dtype: string
  splits:
  - name: train
    num_bytes: 438760
    num_examples: 2000
  - name: test
    num_bytes: 41672
    num_examples: 184
  download_size: 306678
  dataset_size: 480432
- config_name: sutsilv
  features:
  - name: original
    dtype: string
  - name: translation
    dtype: string
  - name: legacy_variety
    dtype: string
  splits:
  - name: train
    num_bytes: 31581009
    num_examples: 46958
  - name: test
    num_bytes: 461309
    num_examples: 997
  download_size: 18409258
  dataset_size: 32042318
- config_name: sutsilv_sampled
  features:
  - name: original
    dtype: string
  - name: translation
    dtype: string
  - name: legacy_variety
    dtype: string
  splits:
  - name: train
    num_bytes: 427037
    num_examples: 2000
  - name: test
    num_bytes: 40732
    num_examples: 181
  download_size: 293040
  dataset_size: 467769
- config_name: vallader
  features:
  - name: original
    dtype: string
  - name: translation
    dtype: string
  - name: legacy_variety
    dtype: string
  splits:
  - name: train
    num_bytes: 106086872
    num_examples: 73910
  - name: test
    num_bytes: 458949
    num_examples: 997
  download_size: 61965926
  dataset_size: 106545821
- config_name: vallader_sampled
  features:
  - name: original
    dtype: string
  - name: translation
    dtype: string
  - name: legacy_variety
    dtype: string
  splits:
  - name: train
    num_bytes: 436946
    num_examples: 2000
  - name: test
    num_bytes: 40089
    num_examples: 178
  download_size: 301632
  dataset_size: 477035
configs:
- config_name: puter
  data_files:
  - split: train
    path: puter/train-*
  - split: test
    path: puter/test-*
- config_name: puter_sampled
  data_files:
  - split: train
    path: puter_sampled/train-*
  - split: test
    path: puter_sampled/test-*
- config_name: rumgr
  data_files:
  - split: train
    path: rumgr/train-*
  - split: test
    path: rumgr/test-*
- config_name: rumgr_sampled
  data_files:
  - split: train
    path: rumgr_sampled/train-*
  - split: test
    path: rumgr_sampled/test-*
- config_name: surmiran
  data_files:
  - split: train
    path: surmiran/train-*
  - split: test
    path: surmiran/test-*
- config_name: surmiran_sampled
  data_files:
  - split: train
    path: surmiran_sampled/train-*
  - split: test
    path: surmiran_sampled/test-*
- config_name: sursilv
  data_files:
  - split: train
    path: sursilv/train-*
  - split: test
    path: sursilv/test-*
- config_name: sursilv_sampled
  data_files:
  - split: train
    path: sursilv_sampled/train-*
  - split: test
    path: sursilv_sampled/test-*
- config_name: sutsilv
  data_files:
  - split: train
    path: sutsilv/train-*
  - split: test
    path: sutsilv/test-*
- config_name: sutsilv_sampled
  data_files:
  - split: train
    path: sutsilv_sampled/train-*
  - split: test
    path: sutsilv_sampled/test-*
- config_name: vallader
  data_files:
  - split: train
    path: vallader/train-*
  - split: test
    path: vallader/test-*
- config_name: vallader_sampled
  data_files:
  - split: train
    path: vallader_sampled/train-*
  - split: test
    path: vallader_sampled/test-*
---
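The metadata above defines twelve configurations: six variety names (`puter`, `rumgr`, `surmiran`, `sursilv`, `sutsilv`, `vallader`) and a `_sampled` counterpart of each. A minimal sketch of selecting one of them, assuming the Hugging Face `datasets` library is installed and the repository is reachable on the Hub. The helper names `config_name` and `load_variety` are hypothetical and not part of the dataset:

```python
VARIETIES = ("puter", "rumgr", "surmiran", "sursilv", "sutsilv", "vallader")


def config_name(variety: str, sampled: bool = False) -> str:
    """Build a configuration name as listed in the dataset metadata."""
    if variety not in VARIETIES:
        raise ValueError(f"unknown variety: {variety!r}")
    return f"{variety}_sampled" if sampled else variety


def load_variety(variety: str, sampled: bool = False, split: str = "train"):
    """Load one split of one configuration (needs network access)."""
    # Imported lazily so that config_name() works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(
        "rl-low-resource/rumantsch-varieties-sentences",
        config_name(variety, sampled),
        split=split,
    )
```

For example, `load_variety("puter", sampled=True, split="test")` would request the test split of `puter_sampled`, which the metadata lists at 188 examples.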
Dataset information:
Every configuration has the same three string fields: original (source text), translation (translated text), and legacy_variety (legacy variety). All sizes are in bytes.
- puter: train 49836 examples (33956760 bytes); test 997 examples (456256 bytes); download size 20011417; dataset size 34413016
- puter_sampled: train 2000 examples (424019 bytes); test 188 examples (42694 bytes); download size 298986; dataset size 466713
- rumgr: train 422767 examples (1171405866 bytes); test 997 examples (453788 bytes); download size 672799331; dataset size 1171859654
- rumgr_sampled: train 946 examples (241485 bytes); test 184 examples (40639 bytes); download size 173022; dataset size 282124
- surmiran: train 40998 examples (60490625 bytes); test 997 examples (462032 bytes); download size 34996535; dataset size 60952657
- surmiran_sampled: train 2000 examples (426177 bytes); test 185 examples (41610 bytes); download size 296172; dataset size 467787
- sursilv: train 131439 examples (244667619 bytes); test 997 examples (456816 bytes); download size 146674130; dataset size 245124435
- sursilv_sampled: train 2000 examples (438760 bytes); test 184 examples (41672 bytes); download size 306678; dataset size 480432
- sutsilv: train 46958 examples (31581009 bytes); test 997 examples (461309 bytes); download size 18409258; dataset size 32042318
- sutsilv_sampled: train 2000 examples (427037 bytes); test 181 examples (40732 bytes); download size 293040; dataset size 467769
- vallader: train 73910 examples (106086872 bytes); test 997 examples (458949 bytes); download size 61965926; dataset size 106545821
- vallader_sampled: train 2000 examples (436946 bytes); test 178 examples (40089 bytes); download size 301632; dataset size 477035
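As a quick sanity check on the figures listed above, each configuration's dataset size should equal the sum of its train and test byte counts. A small sketch using the numbers copied from the listing for three of the configurations. The dictionary layout and the function name `split_bytes_total` are illustrative choices, not something the dataset provides:

```python
# (train_bytes, test_bytes, dataset_size) copied from the listing above.
SIZES = {
    "puter": (33956760, 456256, 34413016),
    "rumgr": (1171405866, 453788, 1171859654),
    "rumgr_sampled": (241485, 40639, 282124),
}


def split_bytes_total(train_bytes: int, test_bytes: int) -> int:
    """Sum the per-split byte counts of one configuration."""
    return train_bytes + test_bytes


for name, (train_bytes, test_bytes, dataset_size) in SIZES.items():
    total = split_bytes_total(train_bytes, test_bytes)
    status = "consistent" if total == dataset_size else "MISMATCH"
    print(f"{name}: {total} bytes ({status})")
```

All three entries report as consistent, and the same relation holds for the other nine configurations in the listing.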
Dataset configurations:
Each configuration reads its data files from its own directory: the train split from <config_name>/train-* and the test split from <config_name>/test-*.
- puter: train puter/train-*, test puter/test-*
- puter_sampled: train puter_sampled/train-*, test puter_sampled/test-*
- rumgr: train rumgr/train-*, test rumgr/test-*
- rumgr_sampled: train rumgr_sampled/train-*, test rumgr_sampled/test-*
- surmiran: train surmiran/train-*, test surmiran/test-*
- surmiran_sampled: train surmiran_sampled/train-*, test surmiran_sampled/test-*
- sursilv: train sursilv/train-*, test sursilv/test-*
- sursilv_sampled: train sursilv_sampled/train-*, test sursilv_sampled/test-*
- sutsilv: train sutsilv/train-*, test sutsilv/test-*
- sutsilv_sampled: train sutsilv_sampled/train-*, test sutsilv_sampled/test-*
- vallader: train vallader/train-*, test vallader/test-*
- vallader_sampled: train vallader_sampled/train-*, test vallader_sampled/test-*
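The paths above are glob patterns rather than single file names, so each split can span several shard files. A minimal sketch of how such a pattern selects files, using Python's standard `fnmatch` module. The shard file names below are hypothetical examples, and the Hub's own pattern resolution may differ in detail:

```python
from fnmatch import fnmatch

# Hypothetical repository listing; real shard names may differ.
FILES = [
    "puter/train-00000-of-00001.parquet",
    "puter/test-00000-of-00001.parquet",
    "puter_sampled/train-00000-of-00001.parquet",
]


def files_for(config: str, split: str, files: list[str]) -> list[str]:
    """Return the files matching the '<config>/<split>-*' pattern."""
    pattern = f"{config}/{split}-*"
    return [path for path in files if fnmatch(path, pattern)]


print(files_for("puter", "train", FILES))
```

Because the directory name is part of the pattern, `puter/train-*` does not pick up files under `puter_sampled/`, which keeps the full and sampled configurations separate.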
Provider:
rl-low-resource



