
gentaiscool/bitext_nusax_miners

Source: Hugging Face. Updated 2024-06-13; indexed 2024-06-29.
Download link:
https://hf-mirror.com/datasets/gentaiscool/bitext_nusax_miners
Resource description:
---
license: cc-by-sa-4.0
language:
- ace
- ban
- bbc
- bjn
- bug
- ind
- jav
- mad
- min
- nij
- sun
configs:
- config_name: default
  data_files:
  - split: train
    path: "train/*"
- config_name: eng-ace
  data_files:
  - split: train
    path: "train/eng-ace.jsonl"
- config_name: eng-ban
  data_files:
  - split: train
    path: "train/eng-ban.jsonl"
- config_name: eng-bbc
  data_files:
  - split: train
    path: "train/eng-bbc.jsonl"
- config_name: eng-bjn
  data_files:
  - split: train
    path: "train/eng-bjn.jsonl"
- config_name: eng-bug
  data_files:
  - split: train
    path: "train/eng-bug.jsonl"
- config_name: eng-ind
  data_files:
  - split: train
    path: "train/eng-ind.jsonl"
- config_name: eng-jav
  data_files:
  - split: train
    path: "train/eng-jav.jsonl"
- config_name: eng-mad
  data_files:
  - split: train
    path: "train/eng-mad.jsonl"
- config_name: eng-min
  data_files:
  - split: train
    path: "train/eng-min.jsonl"
- config_name: eng-nij
  data_files:
  - split: train
    path: "train/eng-nij.jsonl"
- config_name: eng-sun
  data_files:
  - split: train
    path: "train/eng-sun.jsonl"
---
Provider:
gentaiscool
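Summary of original information

Dataset overview

License

  • The dataset is released under the cc-by-sa-4.0 license.

Supported languages

  • The dataset covers the following languages, each paired with English (eng):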
    • Acehnese (ace)
    • Balinese (ban)
    • Batak Toba (bbc)
    • Banjarese (bjn)
    • Buginese (bug)
    • Indonesian (ind)
    • Javanese (jav)
    • Madurese (mad)
    • Minangkabau (min)
    • Ngaju (nij)
    • Sundanese (sun)

Configurations

  • The dataset defines the following configurations, each with a single train split:
    • default:
      • Train split path: train/*
    • eng-ace:
      • Train split path: train/eng-ace.jsonl
    • eng-ban:
      • Train split path: train/eng-ban.jsonl
    • eng-bbc:
      • Train split path: train/eng-bbc.jsonl
    • eng-bjn:
      • Train split path: train/eng-bjn.jsonl
    • eng-bug:
      • Train split path: train/eng-bug.jsonl
    • eng-ind:
      • Train split path: train/eng-ind.jsonl
    • eng-jav:
      • Train split path: train/eng-jav.jsonl
    • eng-mad:
      • Train split path: train/eng-mad.jsonl
    • eng-min:
      • Train split path: train/eng-min.jsonl
    • eng-nij:
      • Train split path: train/eng-nij.jsonl
    • eng-sun:
      • Train split path: train/eng-sun.jsonl
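
The configuration names and file paths above follow one pattern, so they can be derived from the language code. The sketch below is a minimal illustration, not an official loader: the helper names (`config_name`, `data_path`, `load_pair`) are hypothetical, and `load_pair` assumes the third-party `datasets` library is installed and the Hugging Face Hub is reachable. The field names inside each JSONL record are not documented on this page, so the sketch does not rely on them.

```python
# Language codes listed in the dataset card (ISO 639-3).
LANGS = ["ace", "ban", "bbc", "bjn", "bug", "ind",
         "jav", "mad", "min", "nij", "sun"]

REPO_ID = "gentaiscool/bitext_nusax_miners"


def config_name(lang: str) -> str:
    """Return the configuration name for an English-<lang> pair."""
    if lang not in LANGS:
        raise ValueError(f"unsupported language code: {lang!r}")
    return f"eng-{lang}"


def data_path(lang: str) -> str:
    """Return the train file path the card lists for this pair."""
    return f"train/{config_name(lang)}.jsonl"


def load_pair(lang: str):
    """Load the train split for one pair (hypothetical helper).

    Assumes the `datasets` package is installed and network access
    to the Hub is available; the import is local so the helpers
    above work without it.
    """
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name(lang), split="train")


if __name__ == "__main__":
    # Print the config-to-path mapping without touching the network.
    for code in LANGS:
        print(config_name(code), "->", data_path(code))
```

Calling `load_pair("jav")` would request the `eng-jav` configuration; inspecting the returned object's `column_names` is a reasonable first step, since the record schema is not stated here.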