wecover/OPUS_OpenSubtitles
Source: Hugging Face. Updated 2024-01-31. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/wecover/OPUS_OpenSubtitles
Resource summary:
---
configs:
- config_name: default
  data_files:
  - split: train
    path: '*/*/train.parquet'
  - split: valid
    path: '*/*/valid.parquet'
  - split: test
    path: '*/*/test.parquet'
- config_name: af
  data_files:
  - split: train
    path: '*/*af*/train.parquet'
  - split: test
    path: '*/*af*/test.parquet'
  - split: valid
    path: '*/*af*/valid.parquet'
- config_name: ar
  data_files:
  - split: train
    path: '*/*ar*/train.parquet'
  - split: test
    path: '*/*ar*/test.parquet'
  - split: valid
    path: '*/*ar*/valid.parquet'
- config_name: bg
  data_files:
  - split: train
    path: '*/*bg*/train.parquet'
  - split: test
    path: '*/*bg*/test.parquet'
  - split: valid
    path: '*/*bg*/valid.parquet'
- config_name: bn
  data_files:
  - split: train
    path: '*/*bn*/train.parquet'
  - split: test
    path: '*/*bn*/test.parquet'
  - split: valid
    path: '*/*bn*/valid.parquet'
- config_name: bs
  data_files:
  - split: train
    path: '*/*bs*/train.parquet'
  - split: test
    path: '*/*bs*/test.parquet'
  - split: valid
    path: '*/*bs*/valid.parquet'
- config_name: cs
  data_files:
  - split: train
    path: '*/*cs*/train.parquet'
  - split: test
    path: '*/*cs*/test.parquet'
  - split: valid
    path: '*/*cs*/valid.parquet'
- config_name: da
  data_files:
  - split: train
    path: '*/*da*/train.parquet'
  - split: test
    path: '*/*da*/test.parquet'
  - split: valid
    path: '*/*da*/valid.parquet'
- config_name: de
  data_files:
  - split: train
    path: '*/*de*/train.parquet'
  - split: test
    path: '*/*de*/test.parquet'
  - split: valid
    path: '*/*de*/valid.parquet'
- config_name: el
  data_files:
  - split: train
    path: '*/*el*/train.parquet'
  - split: test
    path: '*/*el*/test.parquet'
  - split: valid
    path: '*/*el*/valid.parquet'
- config_name: en
  data_files:
  - split: train
    path: '*/*en*/train.parquet'
  - split: test
    path: '*/*en*/test.parquet'
  - split: valid
    path: '*/*en*/valid.parquet'
- config_name: eo
  data_files:
  - split: train
    path: '*/*eo*/train.parquet'
  - split: test
    path: '*/*eo*/test.parquet'
  - split: valid
    path: '*/*eo*/valid.parquet'
- config_name: es
  data_files:
  - split: train
    path: '*/*es*/train.parquet'
  - split: test
    path: '*/*es*/test.parquet'
  - split: valid
    path: '*/*es*/valid.parquet'
- config_name: et
  data_files:
  - split: train
    path: '*/*et*/train.parquet'
  - split: test
    path: '*/*et*/test.parquet'
  - split: valid
    path: '*/*et*/valid.parquet'
- config_name: fa
  data_files:
  - split: train
    path: '*/*fa*/train.parquet'
  - split: test
    path: '*/*fa*/test.parquet'
  - split: valid
    path: '*/*fa*/valid.parquet'
- config_name: fi
  data_files:
  - split: train
    path: '*/*fi*/train.parquet'
  - split: test
    path: '*/*fi*/test.parquet'
  - split: valid
    path: '*/*fi*/valid.parquet'
- config_name: fr
  data_files:
  - split: train
    path: '*/*fr*/train.parquet'
  - split: test
    path: '*/*fr*/test.parquet'
  - split: valid
    path: '*/*fr*/valid.parquet'
- config_name: he
  data_files:
  - split: train
    path: '*/*he*/train.parquet'
  - split: test
    path: '*/*he*/test.parquet'
  - split: valid
    path: '*/*he*/valid.parquet'
- config_name: hi
  data_files:
  - split: train
    path: '*/*hi*/train.parquet'
  - split: test
    path: '*/*hi*/test.parquet'
  - split: valid
    path: '*/*hi*/valid.parquet'
- config_name: hr
  data_files:
  - split: train
    path: '*/*hr*/train.parquet'
  - split: test
    path: '*/*hr*/test.parquet'
  - split: valid
    path: '*/*hr*/valid.parquet'
- config_name: hu
  data_files:
  - split: train
    path: '*/*hu*/train.parquet'
  - split: test
    path: '*/*hu*/test.parquet'
  - split: valid
    path: '*/*hu*/valid.parquet'
- config_name: id
  data_files:
  - split: train
    path: '*/*id*/train.parquet'
  - split: test
    path: '*/*id*/test.parquet'
  - split: valid
    path: '*/*id*/valid.parquet'
- config_name: it
  data_files:
  - split: train
    path: '*/*it*/train.parquet'
  - split: test
    path: '*/*it*/test.parquet'
  - split: valid
    path: '*/*it*/valid.parquet'
- config_name: ja
  data_files:
  - split: train
    path: '*/*ja*/train.parquet'
  - split: test
    path: '*/*ja*/test.parquet'
  - split: valid
    path: '*/*ja*/valid.parquet'
- config_name: lt
  data_files:
  - split: train
    path: '*/*lt*/train.parquet'
  - split: test
    path: '*/*lt*/test.parquet'
  - split: valid
    path: '*/*lt*/valid.parquet'
- config_name: mk
  data_files:
  - split: train
    path: '*/*mk*/train.parquet'
  - split: test
    path: '*/*mk*/test.parquet'
  - split: valid
    path: '*/*mk*/valid.parquet'
- config_name: ml
  data_files:
  - split: train
    path: '*/*ml*/train.parquet'
  - split: test
    path: '*/*ml*/test.parquet'
  - split: valid
    path: '*/*ml*/valid.parquet'
- config_name: ms
  data_files:
  - split: train
    path: '*/*ms*/train.parquet'
  - split: test
    path: '*/*ms*/test.parquet'
  - split: valid
    path: '*/*ms*/valid.parquet'
- config_name: nl
  data_files:
  - split: train
    path: '*/*nl*/train.parquet'
  - split: test
    path: '*/*nl*/test.parquet'
  - split: valid
    path: '*/*nl*/valid.parquet'
- config_name: no
  data_files:
  - split: train
    path: '*/*no*/train.parquet'
  - split: test
    path: '*/*no*/test.parquet'
  - split: valid
    path: '*/*no*/valid.parquet'
- config_name: pl
  data_files:
  - split: train
    path: '*/*pl*/train.parquet'
  - split: test
    path: '*/*pl*/test.parquet'
  - split: valid
    path: '*/*pl*/valid.parquet'
- config_name: pt
  data_files:
  - split: train
    path: '*/*pt*/train.parquet'
  - split: test
    path: '*/*pt*/test.parquet'
  - split: valid
    path: '*/*pt*/valid.parquet'
- config_name: ro
  data_files:
  - split: train
    path: '*/*ro*/train.parquet'
  - split: test
    path: '*/*ro*/test.parquet'
  - split: valid
    path: '*/*ro*/valid.parquet'
- config_name: ru
  data_files:
  - split: train
    path: '*/*ru*/train.parquet'
  - split: test
    path: '*/*ru*/test.parquet'
  - split: valid
    path: '*/*ru*/valid.parquet'
- config_name: si
  data_files:
  - split: train
    path: '*/*si*/train.parquet'
  - split: test
    path: '*/*si*/test.parquet'
  - split: valid
    path: '*/*si*/valid.parquet'
- config_name: sk
  data_files:
  - split: train
    path: '*/*sk*/train.parquet'
  - split: test
    path: '*/*sk*/test.parquet'
  - split: valid
    path: '*/*sk*/valid.parquet'
- config_name: sl
  data_files:
  - split: train
    path: '*/*sl*/train.parquet'
  - split: test
    path: '*/*sl*/test.parquet'
  - split: valid
    path: '*/*sl*/valid.parquet'
- config_name: sq
  data_files:
  - split: train
    path: '*/*sq*/train.parquet'
  - split: test
    path: '*/*sq*/test.parquet'
  - split: valid
    path: '*/*sq*/valid.parquet'
- config_name: sr
  data_files:
  - split: train
    path: '*/*sr*/train.parquet'
  - split: test
    path: '*/*sr*/test.parquet'
  - split: valid
    path: '*/*sr*/valid.parquet'
- config_name: sv
  data_files:
  - split: train
    path: '*/*sv*/train.parquet'
  - split: test
    path: '*/*sv*/test.parquet'
  - split: valid
    path: '*/*sv*/valid.parquet'
- config_name: ta
  data_files:
  - split: train
    path: '*/*ta*/train.parquet'
  - split: test
    path: '*/*ta*/test.parquet'
  - split: valid
    path: '*/*ta*/valid.parquet'
- config_name: th
  data_files:
  - split: train
    path: '*/*th*/train.parquet'
  - split: test
    path: '*/*th*/test.parquet'
  - split: valid
    path: '*/*th*/valid.parquet'
- config_name: tr
  data_files:
  - split: train
    path: '*/*tr*/train.parquet'
  - split: test
    path: '*/*tr*/test.parquet'
  - split: valid
    path: '*/*tr*/valid.parquet'
- config_name: uk
  data_files:
  - split: train
    path: '*/*uk*/train.parquet'
  - split: test
    path: '*/*uk*/test.parquet'
  - split: valid
    path: '*/*uk*/valid.parquet'
- config_name: vi
  data_files:
  - split: train
    path: '*/*vi*/train.parquet'
  - split: test
    path: '*/*vi*/test.parquet'
  - split: valid
    path: '*/*vi*/valid.parquet'
- config_name: br
  data_files:
  - split: train
    path: '*/*br*/train.parquet'
  - split: test
    path: '*/*br*/test.parquet'
  - split: valid
    path: '*/*br*/valid.parquet'
- config_name: ca
  data_files:
  - split: train
    path: '*/*ca*/train.parquet'
  - split: test
    path: '*/*ca*/test.parquet'
  - split: valid
    path: '*/*ca*/valid.parquet'
- config_name: eu
  data_files:
  - split: train
    path: '*/*eu*/train.parquet'
  - split: test
    path: '*/*eu*/test.parquet'
  - split: valid
    path: '*/*eu*/valid.parquet'
- config_name: gl
  data_files:
  - split: train
    path: '*/*gl*/train.parquet'
  - split: test
    path: '*/*gl*/test.parquet'
  - split: valid
    path: '*/*gl*/valid.parquet'
- config_name: hy
  data_files:
  - split: train
    path: '*/*hy*/train.parquet'
  - split: test
    path: '*/*hy*/test.parquet'
  - split: valid
    path: '*/*hy*/valid.parquet'
- config_name: is
  data_files:
  - split: train
    path: '*/*is*/train.parquet'
  - split: test
    path: '*/*is*/test.parquet'
  - split: valid
    path: '*/*is*/valid.parquet'
- config_name: ka
  data_files:
  - split: train
    path: '*/*ka*/train.parquet'
  - split: test
    path: '*/*ka*/test.parquet'
  - split: valid
    path: '*/*ka*/valid.parquet'
- config_name: kk
  data_files:
  - split: train
    path: '*/*kk*/train.parquet'
  - split: test
    path: '*/*kk*/test.parquet'
  - split: valid
    path: '*/*kk*/valid.parquet'
- config_name: ko
  data_files:
  - split: train
    path: '*/*ko*/train.parquet'
  - split: test
    path: '*/*ko*/test.parquet'
  - split: valid
    path: '*/*ko*/valid.parquet'
- config_name: te
  data_files:
  - split: train
    path: '*/*te*/train.parquet'
  - split: test
    path: '*/*te*/test.parquet'
  - split: valid
    path: '*/*te*/valid.parquet'
- config_name: tl
  data_files:
  - split: train
    path: '*/*tl*/train.parquet'
  - split: test
    path: '*/*tl*/test.parquet'
  - split: valid
    path: '*/*tl*/valid.parquet'
- config_name: ur
  data_files:
  - split: train
    path: '*/*ur*/train.parquet'
  - split: test
    path: '*/*ur*/test.parquet'
  - split: valid
    path: '*/*ur*/valid.parquet'
---
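The `path` entries in the configuration above are glob patterns, so each language config selects every second-level directory whose name contains its two-letter code. The sketch below illustrates that selection with Python's standard `fnmatch` module, matching one path segment at a time so that `*` never crosses a `/`. The helper `matches` and the file layout in `files` are hypothetical and exist only for illustration; the real repository layout is not shown on this page, and the Hugging Face loader uses its own glob implementation rather than this one.

```python
from fnmatch import fnmatchcase


def matches(pattern: str, path: str) -> bool:
    """Match a path against a glob pattern segment by segment.

    Splitting on '/' first keeps '*' from matching across directory
    boundaries, which plain fnmatch would otherwise allow.
    """
    pattern_parts = pattern.split("/")
    path_parts = path.split("/")
    if len(pattern_parts) != len(path_parts):
        return False
    return all(
        fnmatchcase(part, pat) for part, pat in zip(path_parts, pattern_parts)
    )


# Hypothetical layout (an assumption, not taken from the repository).
files = [
    "OpenSubtitles/af-en/train.parquet",
    "OpenSubtitles/af-en/test.parquet",
    "OpenSubtitles/de-en/train.parquet",
    "OpenSubtitles/de-fr/valid.parquet",
]

# Pattern of the `af` config, train split.
af_train = [f for f in files if matches("*/*af*/train.parquet", f)]

# Pattern of the `default` config, train split.
all_train = [f for f in files if matches("*/*/train.parquet", f)]

print(af_train)
print(all_train)
```

Because the pattern is a substring match on the directory name, a code such as `is` or `en` would also select any directory whose name merely contains those letters; whether that matters depends on the actual directory names.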
Provider:
wecover
Summary of original information
Dataset configuration
Default configuration
- Training set: */*/train.parquet
- Validation set: */*/valid.parquet
- Test set: */*/test.parquet
Language-specific configurations
- Afrikaans (af)
  - Training set: */*af*/train.parquet
  - Validation set: */*af*/valid.parquet
  - Test set: */*af*/test.parquet
- Arabic (ar)
  - Training set: */*ar*/train.parquet
  - Validation set: */*ar*/valid.parquet
  - Test set: */*ar*/test.parquet
- Bulgarian (bg)
  - Training set: */*bg*/train.parquet
  - Validation set: */*bg*/valid.parquet
  - Test set: */*bg*/test.parquet
- Bengali (bn)
  - Training set: */*bn*/train.parquet
  - Validation set: */*bn*/valid.parquet
  - Test set: */*bn*/test.parquet
- Bosnian (bs)
  - Training set: */*bs*/train.parquet
  - Validation set: */*bs*/valid.parquet
  - Test set: */*bs*/test.parquet
- Czech (cs)
  - Training set: */*cs*/train.parquet
  - Validation set: */*cs*/valid.parquet
  - Test set: */*cs*/test.parquet
- Danish (da)
  - Training set: */*da*/train.parquet
  - Validation set: */*da*/valid.parquet
  - Test set: */*da*/test.parquet
- German (de)
  - Training set: */*de*/train.parquet
  - Validation set: */*de*/valid.parquet
  - Test set: */*de*/test.parquet
- Greek (el)
  - Training set: */*el*/train.parquet
  - Validation set: */*el*/valid.parquet
  - Test set: */*el*/test.parquet
- English (en)
  - Training set: */*en*/train.parquet
  - Validation set: */*en*/valid.parquet
  - Test set: */*en*/test.parquet
- Esperanto (eo)
  - Training set: */*eo*/train.parquet
  - Validation set: */*eo*/valid.parquet
  - Test set: */*eo*/test.parquet
- Spanish (es)
  - Training set: */*es*/train.parquet
  - Validation set: */*es*/valid.parquet
  - Test set: */*es*/test.parquet
- Estonian (et)
  - Training set: */*et*/train.parquet
  - Validation set: */*et*/valid.parquet
  - Test set: */*et*/test.parquet
- Persian (fa)
  - Training set: */*fa*/train.parquet
  - Validation set: */*fa*/valid.parquet
  - Test set: */*fa*/test.parquet
- Finnish (fi)
  - Training set: */*fi*/train.parquet
  - Validation set: */*fi*/valid.parquet
  - Test set: */*fi*/test.parquet
- French (fr)
  - Training set: */*fr*/train.parquet
  - Validation set: */*fr*/valid.parquet
  - Test set: */*fr*/test.parquet
- Hebrew (he)
  - Training set: */*he*/train.parquet
  - Validation set: */*he*/valid.parquet
  - Test set: */*he*/test.parquet
- Hindi (hi)
  - Training set: */*hi*/train.parquet
  - Validation set: */*hi*/valid.parquet
  - Test set: */*hi*/test.parquet
- Croatian (hr)
  - Training set: */*hr*/train.parquet
  - Validation set: */*hr*/valid.parquet
  - Test set: */*hr*/test.parquet
- Hungarian (hu)
  - Training set: */*hu*/train.parquet
  - Validation set: */*hu*/valid.parquet
  - Test set: */*hu*/test.parquet
- Indonesian (id)
  - Training set: */*id*/train.parquet
  - Validation set: */*id*/valid.parquet
  - Test set: */*id*/test.parquet
- Italian (it)
  - Training set: */*it*/train.parquet
  - Validation set: */*it*/valid.parquet
  - Test set: */*it*/test.parquet
- Japanese (ja)
  - Training set: */*ja*/train.parquet
  - Validation set: */*ja*/valid.parquet
  - Test set: */*ja*/test.parquet
- Lithuanian (lt)
  - Training set: */*lt*/train.parquet
  - Validation set: */*lt*/valid.parquet
  - Test set: */*lt*/test.parquet
- Macedonian (mk)
  - Training set: */*mk*/train.parquet
  - Validation set: */*mk*/valid.parquet
  - Test set: */*mk*/test.parquet
- Malayalam (ml)
  - Training set: */*ml*/train.parquet
  - Validation set: */*ml*/valid.parquet
  - Test set: */*ml*/test.parquet
- Malay (ms)
  - Training set: */*ms*/train.parquet
  - Validation set: */*ms*/valid.parquet
  - Test set: */*ms*/test.parquet
- Dutch (nl)
  - Training set: */*nl*/train.parquet
  - Validation set: */*nl*/valid.parquet
  - Test set: */*nl*/test.parquet
- Norwegian (no)
  - Training set: */*no*/train.parquet
  - Validation set: */*no*/valid.parquet
  - Test set: */*no*/test.parquet
- Polish (pl)
  - Training set: */*pl*/train.parquet
  - Validation set: */*pl*/valid.parquet
  - Test set: */*pl*/test.parquet
- Portuguese (pt)
  - Training set: */*pt*/train.parquet
  - Validation set: */*pt*/valid.parquet
  - Test set: */*pt*/test.parquet
- Romanian (ro)
  - Training set: */*ro*/train.parquet
  - Validation set: */*ro*/valid.parquet
  - Test set: */*ro*/test.parquet
- Russian (ru)
  - Training set: */*ru*/train.parquet
  - Validation set: */*ru*/valid.parquet
  - Test set: */*ru*/test.parquet
- Sinhala (si)
  - Training set: */*si*/train.parquet
  - Validation set: */*si*/valid.parquet
  - Test set: */*si*/test.parquet
- Slovak (sk)
  - Training set: */*sk*/train.parquet
  - Validation set: */*sk*/valid.parquet
  - Test set: */*sk*/test.parquet
- Slovenian (sl)
  - Training set: */*sl*/train.parquet
  - Validation set: */*sl*/valid.parquet
  - Test set: */*sl*/test.parquet
- Albanian (sq)
  - Training set: */*sq*/train.parquet
  - Validation set: */*sq*/valid.parquet
  - Test set: */*sq*/test.parquet
- Serbian (sr)
  - Training set: */*sr*/train.parquet
  - Validation set: */*sr*/valid.parquet
  - Test set: */*sr*/test.parquet
- Swedish (sv)
  - Training set: */*sv*/train.parquet
  - Validation set: */*sv*/valid.parquet
  - Test set: */*sv*/test.parquet
- Tamil (ta)
  - Training set: */*ta*/train.parquet
  - Validation set: */*ta*/valid.parquet
  - Test set: */*ta*/test.parquet
- Thai (th)
  - Training set: */*th*/train.parquet
  - Validation set: */*th*/valid.parquet
  - Test set: */*th*/test.parquet
- Turkish (tr)
  - Training set: */*tr*/train.parquet
  - Validation set: */*tr*/valid.parquet
  - Test set: */*tr*/test.parquet
- Ukrainian (uk)
  - Training set: */*uk*/train.parquet
  - Validation set: */*uk*/valid.parquet
  - Test set: */*uk*/test.parquet
- Vietnamese (vi)
  - Training set: */*vi*/train.parquet
  - Validation set: */*vi*/valid.parquet
  - Test set: */*vi*/test.parquet
- Breton (br)
  - Training set: */*br*/train.parquet
  - Validation set: */*br*/valid.parquet
  - Test set: */*br*/test.parquet
- Catalan (ca)
  - Training set: */*ca*/train.parquet
  - Validation set: */*ca*/valid.parquet
  - Test set: */*ca*/test.parquet
- Basque (eu)
  - Training set: */*eu*/train.parquet
  - Validation set: */*eu*/valid.parquet
  - Test set: */*eu*/test.parquet
- Galician (gl)
  - Training set: */*gl*/train.parquet
  - Validation set: */*gl*/valid.parquet
  - Test set: */*gl*/test.parquet
- Armenian (hy)
  - Training set: */*hy*/train.parquet
  - Validation set: */*hy*/valid.parquet
  - Test set: */*hy*/test.parquet
- Icelandic (is)
  - Training set: */*is*/train.parquet
  - Validation set: */*is*/valid.parquet
  - Test set: */*is*/test.parquet
- Georgian (ka)
  - Training set: */*ka*/train.parquet
  - Validation set: */*ka*/valid.parquet
  - Test set: */*ka*/test.parquet
- Kazakh (kk)
  - Training set: */*kk*/train.parquet
  - Validation set: */*kk*/valid.parquet
  - Test set: */*kk*/test.parquet
- Korean (ko)
  - Training set: */*ko*/train.parquet
  - Validation set: */*ko*/valid.parquet
  - Test set: */*ko*/test.parquet
- Telugu (te)
  - Training set: */*te*/train.parquet
  - Validation set: */*te*/valid.parquet
  - Test set: */*te*/test.parquet
- Tagalog (tl)
  - Training set: */*tl*/train.parquet
  - Validation set: */*tl*/valid.parquet
  - Test set: */*tl*/test.parquet
- Urdu (ur)
  - Training set: */*ur*/train.parquet
  - Validation set: */*ur*/valid.parquet
  - Test set: */*ur*/test.parquet
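
Each configuration listed above can be requested by name when loading the dataset. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and the repository is reachable under the ID `wecover/OPUS_OpenSubtitles` shown at the top of this page. The wrapper `load_language` and the constants `CONFIG_NAMES` and `SPLITS` are hypothetical helpers introduced here, not part of the dataset or the library; they only check the requested names against the card before any download starts.

```python
# Configuration names copied from the `configs` list of this dataset card.
CONFIG_NAMES = (
    "default",
    "af", "ar", "bg", "bn", "bs", "cs", "da", "de", "el", "en", "eo", "es",
    "et", "fa", "fi", "fr", "he", "hi", "hr", "hu", "id", "it", "ja", "lt",
    "mk", "ml", "ms", "nl", "no", "pl", "pt", "ro", "ru", "si", "sk", "sl",
    "sq", "sr", "sv", "ta", "th", "tr", "uk", "vi", "br", "ca", "eu", "gl",
    "hy", "is", "ka", "kk", "ko", "te", "tl", "ur",
)

# Split names as declared in the card (note: "valid", not "validation").
SPLITS = ("train", "valid", "test")


def load_language(config_name: str = "default", split: str = "train"):
    """Load one split of one configuration (hypothetical convenience wrapper)."""
    if config_name not in CONFIG_NAMES:
        raise ValueError(f"unknown config: {config_name!r}")
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    # Imported lazily so the name checks above work without the library.
    from datasets import load_dataset

    return load_dataset("wecover/OPUS_OpenSubtitles", config_name, split=split)


# Example call (downloads data, so it is left commented out):
# afrikaans_valid = load_language("af", "valid")
```

The `default` configuration matches every directory and may be large, so starting with a single language configuration is likely the lighter way to inspect the columns.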



