gentaiscool/bitext_nusax_miners
Source: Hugging Face | Updated: 2024-06-13 | Indexed: 2024-06-29
Download link:
https://hf-mirror.com/datasets/gentaiscool/bitext_nusax_miners
Resource description:
---
license: cc-by-sa-4.0
language:
- ace
- ban
- bbc
- bjn
- bug
- ind
- jav
- mad
- min
- nij
- sun
configs:
- config_name: default
  data_files:
  - split: train
    path: "train/*"
- config_name: eng-ace
  data_files:
  - split: train
    path: "train/eng-ace.jsonl"
- config_name: eng-ban
  data_files:
  - split: train
    path: "train/eng-ban.jsonl"
- config_name: eng-bbc
  data_files:
  - split: train
    path: "train/eng-bbc.jsonl"
- config_name: eng-bjn
  data_files:
  - split: train
    path: "train/eng-bjn.jsonl"
- config_name: eng-bug
  data_files:
  - split: train
    path: "train/eng-bug.jsonl"
- config_name: eng-ind
  data_files:
  - split: train
    path: "train/eng-ind.jsonl"
- config_name: eng-jav
  data_files:
  - split: train
    path: "train/eng-jav.jsonl"
- config_name: eng-mad
  data_files:
  - split: train
    path: "train/eng-mad.jsonl"
- config_name: eng-min
  data_files:
  - split: train
    path: "train/eng-min.jsonl"
- config_name: eng-nij
  data_files:
  - split: train
    path: "train/eng-nij.jsonl"
- config_name: eng-sun
  data_files:
  - split: train
    path: "train/eng-sun.jsonl"
---
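The front matter above maps every config to a file under train/. As a minimal sketch, assuming that hf-mirror.com follows the Hugging Face "resolve/<revision>/<path>" URL layout and that the files sit on the main revision (neither is stated in this card), direct download URLs for the per-pair files could be built like this:

```python
# Sketch: build direct-download URLs for the per-pair JSONL files.
# ASSUMPTIONS: the mirror uses the Hugging Face "resolve/<revision>/<path>"
# URL layout, and the files live on the "main" revision.
REPO_ID = "gentaiscool/bitext_nusax_miners"
MIRROR = "https://hf-mirror.com"

# Language codes taken from the eng-xxx config names in the YAML above.
LANG_CODES = ["ace", "ban", "bbc", "bjn", "bug", "ind",
              "jav", "mad", "min", "nij", "sun"]


def file_url(path: str, revision: str = "main", endpoint: str = MIRROR) -> str:
    """Return the assumed direct-download URL for one file in the dataset repo."""
    return f"{endpoint}/datasets/{REPO_ID}/resolve/{revision}/{path}"


def pair_urls(revision: str = "main") -> dict[str, str]:
    """Map each eng-xxx config name to the assumed URL of its train file."""
    return {
        f"eng-{code}": file_url(f"train/eng-{code}.jsonl", revision)
        for code in LANG_CODES
    }


if __name__ == "__main__":
    for name, url in pair_urls().items():
        print(name, url)
```

Passing endpoint="https://huggingface.co" gives the corresponding Hub URL under the same assumption. When using the datasets or huggingface_hub libraries instead, hf-mirror.com's own instructions point them at the mirror through the HF_ENDPOINT environment variable.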
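Provided by:
gentaiscool
Summary of original information
Dataset overview
License
- The dataset is licensed under cc-by-sa-4.0.
Supported languages
- The dataset covers the following languages: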
- Acehnese (ace)
- Balinese (ban)
- Batak Toba (bbc)
- Banjarese (bjn)
- Buginese (bug)
- Indonesian (ind)
- Javanese (jav)
- Madurese (mad)
- Minangkabau (min)
- Ngaju (nij)
- Sundanese (sun)
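The codes above are ISO 639-3 codes, and each per-pair config name is "eng-" followed by one of them. A small lookup like the following, a hypothetical helper rather than anything shipped with the dataset, turns a code into its language name, config name and training file path:

```python
# Hypothetical helper: ISO 639-3 code -> (language name, config name, train file).
# Config and path patterns follow the YAML front matter of this card.
LANGUAGES = {
    "ace": "Acehnese",
    "ban": "Balinese",
    "bbc": "Batak Toba",
    "bjn": "Banjarese",
    "bug": "Buginese",
    "ind": "Indonesian",
    "jav": "Javanese",
    "mad": "Madurese",
    "min": "Minangkabau",
    "nij": "Ngaju",  # ISO 639-3 "nij" is Ngaju
    "sun": "Sundanese",
}


def describe(code: str) -> tuple[str, str, str]:
    """Return (language name, config name, train file path) for a language code."""
    if code not in LANGUAGES:
        raise KeyError(f"unknown language code: {code!r}")
    config = f"eng-{code}"
    return LANGUAGES[code], config, f"train/{config}.jsonl"
```

For example, describe("jav") returns ("Javanese", "eng-jav", "train/eng-jav.jsonl"), while a code outside the list, such as "eng", raises KeyError instead of producing a path that does not exist.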
Configurations
- The dataset defines the following configurations, each with a single train split:
- default: training data path train/*
- eng-ace: training data path train/eng-ace.jsonl
- eng-ban: training data path train/eng-ban.jsonl
- eng-bbc: training data path train/eng-bbc.jsonl
- eng-bjn: training data path train/eng-bjn.jsonl
- eng-bug: training data path train/eng-bug.jsonl
- eng-ind: training data path train/eng-ind.jsonl
- eng-jav: training data path train/eng-jav.jsonl
- eng-mad: training data path train/eng-mad.jsonl
- eng-min: training data path train/eng-min.jsonl
- eng-nij: training data path train/eng-nij.jsonl
- eng-sun: training data path train/eng-sun.jsonl
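Each eng-xxx configuration points at a single JSON Lines file (one JSON object per line). This card does not document the field names, so the sketch below assumes only that each non-empty line is a JSON object. It reads a downloaded file with the standard library and reports the keys it finds rather than guessing a schema:

```python
import json
from pathlib import Path
from typing import Iterator, Union


def read_jsonl(path: Union[str, Path]) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            record = json.loads(line)
            if not isinstance(record, dict):
                raise ValueError(f"line {line_no}: expected a JSON object")
            yield record


def field_names(path: Union[str, Path]) -> set[str]:
    """Collect every key seen in the file, since the schema is not documented here."""
    keys: set[str] = set()
    for record in read_jsonl(path):
        keys.update(record)
    return keys
```

With the datasets library installed, datasets.load_dataset("gentaiscool/bitext_nusax_miners", "eng-ace", split="train") is the usual way to load one configuration. The default configuration's train/* pattern presumably selects all eleven per-pair files into a single train split.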