
wecover/OPUS_WikiMatrix

Source: Hugging Face. Updated 2024-01-31. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/wecover/OPUS_WikiMatrix
Resource description:
---
configs:
- config_name: default
  data_files:
  - split: train
    path: '*/*/train.parquet'
  - split: valid
    path: '*/*/valid.parquet'
  - split: test
    path: '*/*/test.parquet'
- config_name: ar
  data_files:
  - split: train
    path: '*/*ar*/train.parquet'
  - split: test
    path: '*/*ar*/test.parquet'
  - split: valid
    path: '*/*ar*/valid.parquet'
- config_name: az
  data_files:
  - split: train
    path: '*/*az*/train.parquet'
  - split: test
    path: '*/*az*/test.parquet'
  - split: valid
    path: '*/*az*/valid.parquet'
- config_name: be
  data_files:
  - split: train
    path: '*/*be*/train.parquet'
  - split: test
    path: '*/*be*/test.parquet'
  - split: valid
    path: '*/*be*/valid.parquet'
- config_name: bg
  data_files:
  - split: train
    path: '*/*bg*/train.parquet'
  - split: test
    path: '*/*bg*/test.parquet'
  - split: valid
    path: '*/*bg*/valid.parquet'
- config_name: bn
  data_files:
  - split: train
    path: '*/*bn*/train.parquet'
  - split: test
    path: '*/*bn*/test.parquet'
  - split: valid
    path: '*/*bn*/valid.parquet'
- config_name: br
  data_files:
  - split: train
    path: '*/*br*/train.parquet'
  - split: test
    path: '*/*br*/test.parquet'
  - split: valid
    path: '*/*br*/valid.parquet'
- config_name: bs
  data_files:
  - split: train
    path: '*/*bs*/train.parquet'
  - split: test
    path: '*/*bs*/test.parquet'
  - split: valid
    path: '*/*bs*/valid.parquet'
- config_name: ca
  data_files:
  - split: train
    path: '*/*ca*/train.parquet'
  - split: test
    path: '*/*ca*/test.parquet'
  - split: valid
    path: '*/*ca*/valid.parquet'
- config_name: cs
  data_files:
  - split: train
    path: '*/*cs*/train.parquet'
  - split: test
    path: '*/*cs*/test.parquet'
  - split: valid
    path: '*/*cs*/valid.parquet'
- config_name: da
  data_files:
  - split: train
    path: '*/*da*/train.parquet'
  - split: test
    path: '*/*da*/test.parquet'
  - split: valid
    path: '*/*da*/valid.parquet'
- config_name: de
  data_files:
  - split: train
    path: '*/*de*/train.parquet'
  - split: test
    path: '*/*de*/test.parquet'
  - split: valid
    path: '*/*de*/valid.parquet'
- config_name: el
  data_files:
  - split: train
    path: '*/*el*/train.parquet'
  - split: test
    path: '*/*el*/test.parquet'
  - split: valid
    path: '*/*el*/valid.parquet'
- config_name: en
  data_files:
  - split: train
    path: '*/*en*/train.parquet'
  - split: test
    path: '*/*en*/test.parquet'
  - split: valid
    path: '*/*en*/valid.parquet'
- config_name: eo
  data_files:
  - split: train
    path: '*/*eo*/train.parquet'
  - split: test
    path: '*/*eo*/test.parquet'
  - split: valid
    path: '*/*eo*/valid.parquet'
- config_name: es
  data_files:
  - split: train
    path: '*/*es*/train.parquet'
  - split: test
    path: '*/*es*/test.parquet'
  - split: valid
    path: '*/*es*/valid.parquet'
- config_name: et
  data_files:
  - split: train
    path: '*/*et*/train.parquet'
  - split: test
    path: '*/*et*/test.parquet'
  - split: valid
    path: '*/*et*/valid.parquet'
- config_name: eu
  data_files:
  - split: train
    path: '*/*eu*/train.parquet'
  - split: test
    path: '*/*eu*/test.parquet'
  - split: valid
    path: '*/*eu*/valid.parquet'
- config_name: fa
  data_files:
  - split: train
    path: '*/*fa*/train.parquet'
  - split: test
    path: '*/*fa*/test.parquet'
  - split: valid
    path: '*/*fa*/valid.parquet'
- config_name: fi
  data_files:
  - split: train
    path: '*/*fi*/train.parquet'
  - split: test
    path: '*/*fi*/test.parquet'
  - split: valid
    path: '*/*fi*/valid.parquet'
- config_name: fr
  data_files:
  - split: train
    path: '*/*fr*/train.parquet'
  - split: test
    path: '*/*fr*/test.parquet'
  - split: valid
    path: '*/*fr*/valid.parquet'
- config_name: gl
  data_files:
  - split: train
    path: '*/*gl*/train.parquet'
  - split: test
    path: '*/*gl*/test.parquet'
  - split: valid
    path: '*/*gl*/valid.parquet'
- config_name: he
  data_files:
  - split: train
    path: '*/*he*/train.parquet'
  - split: test
    path: '*/*he*/test.parquet'
  - split: valid
    path: '*/*he*/valid.parquet'
- config_name: hi
  data_files:
  - split: train
    path: '*/*hi*/train.parquet'
  - split: test
    path: '*/*hi*/test.parquet'
  - split: valid
    path: '*/*hi*/valid.parquet'
- config_name: hr
  data_files:
  - split: train
    path: '*/*hr*/train.parquet'
  - split: test
    path: '*/*hr*/test.parquet'
  - split: valid
    path: '*/*hr*/valid.parquet'
- config_name: hu
  data_files:
  - split: train
    path: '*/*hu*/train.parquet'
  - split: test
    path: '*/*hu*/test.parquet'
  - split: valid
    path: '*/*hu*/valid.parquet'
- config_name: id
  data_files:
  - split: train
    path: '*/*id*/train.parquet'
  - split: test
    path: '*/*id*/test.parquet'
  - split: valid
    path: '*/*id*/valid.parquet'
- config_name: is
  data_files:
  - split: train
    path: '*/*is*/train.parquet'
  - split: test
    path: '*/*is*/test.parquet'
  - split: valid
    path: '*/*is*/valid.parquet'
- config_name: it
  data_files:
  - split: train
    path: '*/*it*/train.parquet'
  - split: test
    path: '*/*it*/test.parquet'
  - split: valid
    path: '*/*it*/valid.parquet'
- config_name: ja
  data_files:
  - split: train
    path: '*/*ja*/train.parquet'
  - split: test
    path: '*/*ja*/test.parquet'
  - split: valid
    path: '*/*ja*/valid.parquet'
- config_name: kk
  data_files:
  - split: train
    path: '*/*kk*/train.parquet'
  - split: test
    path: '*/*kk*/test.parquet'
  - split: valid
    path: '*/*kk*/valid.parquet'
- config_name: ko
  data_files:
  - split: train
    path: '*/*ko*/train.parquet'
  - split: test
    path: '*/*ko*/test.parquet'
  - split: valid
    path: '*/*ko*/valid.parquet'
- config_name: lt
  data_files:
  - split: train
    path: '*/*lt*/train.parquet'
  - split: test
    path: '*/*lt*/test.parquet'
  - split: valid
    path: '*/*lt*/valid.parquet'
- config_name: mk
  data_files:
  - split: train
    path: '*/*mk*/train.parquet'
  - split: test
    path: '*/*mk*/test.parquet'
  - split: valid
    path: '*/*mk*/valid.parquet'
- config_name: ml
  data_files:
  - split: train
    path: '*/*ml*/train.parquet'
  - split: test
    path: '*/*ml*/test.parquet'
  - split: valid
    path: '*/*ml*/valid.parquet'
- config_name: mr
  data_files:
  - split: train
    path: '*/*mr*/train.parquet'
  - split: test
    path: '*/*mr*/test.parquet'
  - split: valid
    path: '*/*mr*/valid.parquet'
- config_name: ne
  data_files:
  - split: train
    path: '*/*ne*/train.parquet'
  - split: test
    path: '*/*ne*/test.parquet'
  - split: valid
    path: '*/*ne*/valid.parquet'
- config_name: nl
  data_files:
  - split: train
    path: '*/*nl*/train.parquet'
  - split: test
    path: '*/*nl*/test.parquet'
  - split: valid
    path: '*/*nl*/valid.parquet'
- config_name: no
  data_files:
  - split: train
    path: '*/*no*/train.parquet'
  - split: test
    path: '*/*no*/test.parquet'
  - split: valid
    path: '*/*no*/valid.parquet'
- config_name: pl
  data_files:
  - split: train
    path: '*/*pl*/train.parquet'
  - split: test
    path: '*/*pl*/test.parquet'
  - split: valid
    path: '*/*pl*/valid.parquet'
- config_name: pt
  data_files:
  - split: train
    path: '*/*pt*/train.parquet'
  - split: test
    path: '*/*pt*/test.parquet'
  - split: valid
    path: '*/*pt*/valid.parquet'
- config_name: ro
  data_files:
  - split: train
    path: '*/*ro*/train.parquet'
  - split: test
    path: '*/*ro*/test.parquet'
  - split: valid
    path: '*/*ro*/valid.parquet'
- config_name: ru
  data_files:
  - split: train
    path: '*/*ru*/train.parquet'
  - split: test
    path: '*/*ru*/test.parquet'
  - split: valid
    path: '*/*ru*/valid.parquet'
- config_name: si
  data_files:
  - split: train
    path: '*/*si*/train.parquet'
  - split: test
    path: '*/*si*/test.parquet'
  - split: valid
    path: '*/*si*/valid.parquet'
- config_name: sk
  data_files:
  - split: train
    path: '*/*sk*/train.parquet'
  - split: test
    path: '*/*sk*/test.parquet'
  - split: valid
    path: '*/*sk*/valid.parquet'
- config_name: sl
  data_files:
  - split: train
    path: '*/*sl*/train.parquet'
  - split: test
    path: '*/*sl*/test.parquet'
  - split: valid
    path: '*/*sl*/valid.parquet'
- config_name: sq
  data_files:
  - split: train
    path: '*/*sq*/train.parquet'
  - split: test
    path: '*/*sq*/test.parquet'
  - split: valid
    path: '*/*sq*/valid.parquet'
- config_name: sr
  data_files:
  - split: train
    path: '*/*sr*/train.parquet'
  - split: test
    path: '*/*sr*/test.parquet'
  - split: valid
    path: '*/*sr*/valid.parquet'
- config_name: sv
  data_files:
  - split: train
    path: '*/*sv*/train.parquet'
  - split: test
    path: '*/*sv*/test.parquet'
  - split: valid
    path: '*/*sv*/valid.parquet'
- config_name: sw
  data_files:
  - split: train
    path: '*/*sw*/train.parquet'
  - split: test
    path: '*/*sw*/test.parquet'
  - split: valid
    path: '*/*sw*/valid.parquet'
- config_name: ta
  data_files:
  - split: train
    path: '*/*ta*/train.parquet'
  - split: test
    path: '*/*ta*/test.parquet'
  - split: valid
    path: '*/*ta*/valid.parquet'
- config_name: te
  data_files:
  - split: train
    path: '*/*te*/train.parquet'
  - split: test
    path: '*/*te*/test.parquet'
  - split: valid
    path: '*/*te*/valid.parquet'
- config_name: tl
  data_files:
  - split: train
    path: '*/*tl*/train.parquet'
  - split: test
    path: '*/*tl*/test.parquet'
  - split: valid
    path: '*/*tl*/valid.parquet'
- config_name: tr
  data_files:
  - split: train
    path: '*/*tr*/train.parquet'
  - split: test
    path: '*/*tr*/test.parquet'
  - split: valid
    path: '*/*tr*/valid.parquet'
- config_name: uk
  data_files:
  - split: train
    path: '*/*uk*/train.parquet'
  - split: test
    path: '*/*uk*/test.parquet'
  - split: valid
    path: '*/*uk*/valid.parquet'
- config_name: vi
  data_files:
  - split: train
    path: '*/*vi*/train.parquet'
  - split: test
    path: '*/*vi*/test.parquet'
  - split: valid
    path: '*/*vi*/valid.parquet'
- config_name: as
  data_files:
  - split: train
    path: '*/*as*/train.parquet'
  - split: test
    path: '*/*as*/test.parquet'
  - split: valid
    path: '*/*as*/valid.parquet'
- config_name: fy
  data_files:
  - split: train
    path: '*/*fy*/train.parquet'
  - split: test
    path: '*/*fy*/test.parquet'
  - split: valid
    path: '*/*fy*/valid.parquet'
- config_name: ka
  data_files:
  - split: train
    path: '*/*ka*/train.parquet'
  - split: test
    path: '*/*ka*/test.parquet'
  - split: valid
    path: '*/*ka*/valid.parquet'
- config_name: la
  data_files:
  - split: train
    path: '*/*la*/train.parquet'
  - split: test
    path: '*/*la*/test.parquet'
  - split: valid
    path: '*/*la*/valid.parquet'
- config_name: hy
  data_files:
  - split: train
    path: '*/*hy*/train.parquet'
  - split: test
    path: '*/*hy*/test.parquet'
  - split: valid
    path: '*/*hy*/valid.parquet'
- config_name: jv
  data_files:
  - split: train
    path: '*/*jv*/train.parquet'
  - split: test
    path: '*/*jv*/test.parquet'
  - split: valid
    path: '*/*jv*/valid.parquet'
- config_name: mg
  data_files:
  - split: train
    path: '*/*mg*/train.parquet'
  - split: test
    path: '*/*mg*/test.parquet'
  - split: valid
    path: '*/*mg*/valid.parquet'
- config_name: ug
  data_files:
  - split: train
    path: '*/*ug*/train.parquet'
  - split: test
    path: '*/*ug*/test.parquet'
  - split: valid
    path: '*/*ug*/valid.parquet'
---
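The front matter above repeats one pattern per config, so it could plausibly be generated rather than written by hand. The following is a minimal sketch of such a generator, not the provider's actual tooling; the helper names `config_block` and `render_configs` are hypothetical, and only three of the config names are rendered to keep the output short.

```python
def config_block(name: str) -> str:
    """Return the YAML lines for one config entry (hypothetical helper)."""
    # The default config matches every directory; a language config
    # matches directories whose name contains the language code.
    middle = "*" if name == "default" else f"*{name}*"
    # The card lists train/valid/test for default and
    # train/test/valid for the language configs; mirror that order.
    if name == "default":
        splits = ["train", "valid", "test"]
    else:
        splits = ["train", "test", "valid"]
    lines = [f"- config_name: {name}", "  data_files:"]
    for split in splits:
        lines.append(f"  - split: {split}")
        lines.append(f"    path: '*/{middle}/{split}.parquet'")
    return "\n".join(lines)


def render_configs(names: list[str]) -> str:
    """Wrap the config entries in YAML front-matter delimiters."""
    body = "\n".join(config_block(n) for n in names)
    return f"---\nconfigs:\n{body}\n---"


front_matter = render_configs(["default", "ar", "az"])
print(front_matter)
```

One caveat if this is adapted: parsers that follow YAML 1.1 may read an unquoted `no` (the Norwegian config name) as a boolean, so quoting config names could be a safer choice. The card itself leaves them unquoted.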
Provider:
wecover
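Summary of original information

Dataset overview

The dataset contains multiple language configurations. Each configuration has three splits (train, validation, and test), and the data files are in Parquet format. The language configurations and their data file paths are listed below:

Language configurations and data file paths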

  • Default configuration (default)

    • Train: */*/train.parquet
    • Validation: */*/valid.parquet
    • Test: */*/test.parquet
  • Arabic (ar)

    • Train: */*ar*/train.parquet
    • Validation: */*ar*/valid.parquet
    • Test: */*ar*/test.parquet
  • Azerbaijani (az)

    • Train: */*az*/train.parquet
    • Validation: */*az*/valid.parquet
    • Test: */*az*/test.parquet
  • Belarusian (be)

    • Train: */*be*/train.parquet
    • Validation: */*be*/valid.parquet
    • Test: */*be*/test.parquet
  • Bulgarian (bg)

    • Train: */*bg*/train.parquet
    • Validation: */*bg*/valid.parquet
    • Test: */*bg*/test.parquet
  • Bengali (bn)

    • Train: */*bn*/train.parquet
    • Validation: */*bn*/valid.parquet
    • Test: */*bn*/test.parquet
  • Breton (br)

    • Train: */*br*/train.parquet
    • Validation: */*br*/valid.parquet
    • Test: */*br*/test.parquet
  • Bosnian (bs)

    • Train: */*bs*/train.parquet
    • Validation: */*bs*/valid.parquet
    • Test: */*bs*/test.parquet
  • Catalan (ca)

    • Train: */*ca*/train.parquet
    • Validation: */*ca*/valid.parquet
    • Test: */*ca*/test.parquet
  • Czech (cs)

    • Train: */*cs*/train.parquet
    • Validation: */*cs*/valid.parquet
    • Test: */*cs*/test.parquet
  • Danish (da)

    • Train: */*da*/train.parquet
    • Validation: */*da*/valid.parquet
    • Test: */*da*/test.parquet
  • German (de)

    • Train: */*de*/train.parquet
    • Validation: */*de*/valid.parquet
    • Test: */*de*/test.parquet
  • Greek (el)

    • Train: */*el*/train.parquet
    • Validation: */*el*/valid.parquet
    • Test: */*el*/test.parquet
  • English (en)

    • Train: */*en*/train.parquet
    • Validation: */*en*/valid.parquet
    • Test: */*en*/test.parquet
  • Esperanto (eo)

    • Train: */*eo*/train.parquet
    • Validation: */*eo*/valid.parquet
    • Test: */*eo*/test.parquet
  • Spanish (es)

    • Train: */*es*/train.parquet
    • Validation: */*es*/valid.parquet
    • Test: */*es*/test.parquet
  • Estonian (et)

    • Train: */*et*/train.parquet
    • Validation: */*et*/valid.parquet
    • Test: */*et*/test.parquet
  • Basque (eu)

    • Train: */*eu*/train.parquet
    • Validation: */*eu*/valid.parquet
    • Test: */*eu*/test.parquet
  • Persian (fa)

    • Train: */*fa*/train.parquet
    • Validation: */*fa*/valid.parquet
    • Test: */*fa*/test.parquet
  • Finnish (fi)

    • Train: */*fi*/train.parquet
    • Validation: */*fi*/valid.parquet
    • Test: */*fi*/test.parquet
  • French (fr)

    • Train: */*fr*/train.parquet
    • Validation: */*fr*/valid.parquet
    • Test: */*fr*/test.parquet
  • Galician (gl)

    • Train: */*gl*/train.parquet
    • Validation: */*gl*/valid.parquet
    • Test: */*gl*/test.parquet
  • Hebrew (he)

    • Train: */*he*/train.parquet
    • Validation: */*he*/valid.parquet
    • Test: */*he*/test.parquet
  • Hindi (hi)

    • Train: */*hi*/train.parquet
    • Validation: */*hi*/valid.parquet
    • Test: */*hi*/test.parquet
  • Croatian (hr)

    • Train: */*hr*/train.parquet
    • Validation: */*hr*/valid.parquet
    • Test: */*hr*/test.parquet
  • Hungarian (hu)

    • Train: */*hu*/train.parquet
    • Validation: */*hu*/valid.parquet
    • Test: */*hu*/test.parquet
  • Indonesian (id)

    • Train: */*id*/train.parquet
    • Validation: */*id*/valid.parquet
    • Test: */*id*/test.parquet
  • Icelandic (is)

    • Train: */*is*/train.parquet
    • Validation: */*is*/valid.parquet
    • Test: */*is*/test.parquet
  • Italian (it)

    • Train: */*it*/train.parquet
    • Validation: */*it*/valid.parquet
    • Test: */*it*/test.parquet
  • Japanese (ja)

    • Train: */*ja*/train.parquet
    • Validation: */*ja*/valid.parquet
    • Test: */*ja*/test.parquet
  • Kazakh (kk)

    • Train: */*kk*/train.parquet
    • Validation: */*kk*/valid.parquet
    • Test: */*kk*/test.parquet
  • Korean (ko)

    • Train: */*ko*/train.parquet
    • Validation: */*ko*/valid.parquet
    • Test: */*ko*/test.parquet
  • Lithuanian (lt)

    • Train: */*lt*/train.parquet
    • Validation: */*lt*/valid.parquet
    • Test: */*lt*/test.parquet
  • Macedonian (mk)

    • Train: */*mk*/train.parquet
    • Validation: */*mk*/valid.parquet
    • Test: */*mk*/test.parquet
  • Malayalam (ml)

    • Train: */*ml*/train.parquet
    • Validation: */*ml*/valid.parquet
    • Test: */*ml*/test.parquet
  • Marathi (mr)

    • Train: */*mr*/train.parquet
    • Validation: */*mr*/valid.parquet
    • Test: */*mr*/test.parquet
  • Nepali (ne)

    • Train: */*ne*/train.parquet
    • Validation: */*ne*/valid.parquet
    • Test: */*ne*/test.parquet
  • Dutch (nl)

    • Train: */*nl*/train.parquet
    • Validation: */*nl*/valid.parquet
    • Test: */*nl*/test.parquet
  • Norwegian (no)

    • Train: */*no*/train.parquet
    • Validation: */*no*/valid.parquet
    • Test: */*no*/test.parquet
  • Polish (pl)

    • Train: */*pl*/train.parquet
    • Validation: */*pl*/valid.parquet
    • Test: */*pl*/test.parquet
  • Portuguese (pt)

    • Train: */*pt*/train.parquet
    • Validation: */*pt*/valid.parquet
    • Test: */*pt*/test.parquet
  • Romanian (ro)

    • Train: */*ro*/train.parquet
    • Validation: */*ro*/valid.parquet
    • Test: */*ro*/test.parquet
  • Russian (ru)

    • Train: */*ru*/train.parquet
    • Validation: */*ru*/valid.parquet
    • Test: */*ru*/test.parquet
  • Sinhala (si)

    • Train: */*si*/train.parquet
    • Validation: */*si*/valid.parquet
    • Test: */*si*/test.parquet
  • Slovak (sk)

    • Train: */*sk*/train.parquet
    • Validation: */*sk*/valid.parquet
    • Test: */*sk*/test.parquet
  • Slovenian (sl)

    • Train: */*sl*/train.parquet
    • Validation: */*sl*/valid.parquet
    • Test: */*sl*/test.parquet
  • Albanian (sq)

    • Train: */*sq*/train.parquet
    • Validation: */*sq*/valid.parquet
    • Test: */*sq*/test.parquet
  • Serbian (sr)

    • Train: */*sr*/train.parquet
    • Validation: */*sr*/valid.parquet
    • Test: */*sr*/test.parquet
  • Swedish (sv)

    • Train: */*sv*/train.parquet
    • Validation: */*sv*/valid.parquet
    • Test: */*sv*/test.parquet
  • Swahili (sw)

    • Train: */*sw*/train.parquet
    • Validation: */*sw*/valid.parquet
    • Test: */*sw*/test.parquet
  • Tamil (ta)

    • Train: */*ta*/train.parquet
    • Validation: */*ta*/valid.parquet
    • Test: */*ta*/test.parquet
  • Telugu (te)

    • Train: */*te*/train.parquet
    • Validation: */*te*/valid.parquet
    • Test: */*te*/test.parquet
  • Tagalog (tl)

    • Train: */*tl*/train.parquet
    • Validation: */*tl*/valid.parquet
    • Test: */*tl*/test.parquet
  • Turkish (tr)

    • Train: */*tr*/train.parquet
    • Validation: */*tr*/valid.parquet
    • Test: */*tr*/test.parquet
  • Ukrainian (uk)

    • Train: */*uk*/train.parquet
    • Validation: */*uk*/valid.parquet
    • Test: */*uk*/test.parquet
  • Vietnamese (vi)

    • Train: */*vi*/train.parquet
    • Validation: */*vi*/valid.parquet
    • Test: */*vi*/test.parquet
  • Assamese (as)

    • Train: */*as*/train.parquet
    • Validation: */*as*/valid.parquet
    • Test: */*as*/test.parquet
  • Frisian (fy)

    • Train: */*fy*/train.parquet
    • Validation: */*fy*/valid.parquet
    • Test: */*fy*/test.parquet
  • Georgian (ka)

    • Train: */*ka*/train.parquet
    • Validation: */*ka*/valid.parquet
    • Test: */*ka*/test.parquet
  • Latin (la)

    • Train: */*la*/train.parquet
    • Validation: */*la*/valid.parquet
    • Test: */*la*/test.parquet
  • Armenian (hy)

    • Train: */*hy*/train.parquet
    • Validation: */*hy*/valid.parquet
    • Test: */*hy*/test.parquet
  • Javanese (jv)

    • Train: */*jv*/train.parquet
    • Validation: */*jv*/valid.parquet
    • Test: */*jv*/test.parquet
  • Malagasy (mg)

    • Train: */*mg*/train.parquet
    • Validation: */*mg*/valid.parquet
    • Test: */*mg*/test.parquet
  • Uyghur (ug)

    • Train: */*ug*/train.parquet
    • Validation: */*ug*/valid.parquet
    • Test: */*ug*/test.parquet
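
The path patterns listed above select files by matching the language code anywhere in the second-level directory name. The sketch below illustrates that matching with the standard library only. The file paths in `files` are hypothetical (the card does not state the real directory names; language-pair directories such as `ar-en` are assumed here), and `matches` and `files_for` are illustrative helpers, not part of any library.

```python
from fnmatch import fnmatchcase


def matches(pattern: str, path: str) -> bool:
    """Segment-wise glob match, so '*' never crosses a '/' boundary."""
    pat_parts = pattern.split("/")
    path_parts = path.split("/")
    if len(pat_parts) != len(path_parts):
        return False
    return all(fnmatchcase(seg, pat) for pat, seg in zip(pat_parts, path_parts))


# Hypothetical repository layout, assumed for illustration only.
files = [
    "data/ar-en/train.parquet",
    "data/ar-en/valid.parquet",
    "data/de-fr/train.parquet",
    "data/en-fr/train.parquet",
]


def files_for(config: str, split: str) -> list[str]:
    """List the files a config/split pair would select under the card's patterns."""
    middle = "*" if config == "default" else f"*{config}*"
    pattern = f"*/{middle}/{split}.parquet"
    return [f for f in files if matches(pattern, f)]


ar_train = files_for("ar", "train")            # directories containing "ar"
en_train = files_for("en", "train")            # directories containing "en"
default_train = files_for("default", "train")  # every directory
print(ar_train, en_train, default_train, sep="\n")
```

Under this assumed layout, a language config picks up every pair directory that involves that language, so the same file can belong to two configs (`ar-en` appears under both `ar` and `en`). With the Hugging Face `datasets` library, a configuration is normally selected by passing its name as the second argument, along the lines of `load_dataset("wecover/OPUS_WikiMatrix", "ar", split="train")`; whether the repository currently resolves this way has not been verified here.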