keeve101/balanced-multi-corpora-mt

Name: keeve101/balanced-multi-corpora-mt
Creator: keeve101
Published: 2025-04-01 21:31:18
License: not specified

Source: Hugging Face. Updated 2025-04-01; indexed 2025-04-12.

Download link:

https://hf-mirror.com/datasets/keeve101/balanced-multi-corpora-mt

Description:

This is a multilingual text dataset. Each record contains an index, the source filename, the target language, the English text, and the corresponding text in the non-English language. Each configuration represents one language or dialect, and there are 7 configurations: hi, id, ms, th, tl, vi, and zh. Each configuration's training set contains 136,026 samples. The dataset is intended mainly for training machine translation models and for other natural language processing tasks.
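As a rough illustration of how the configurations described above might be used, the following sketch loads one configuration with the Hugging Face `datasets` library. The helper names (`CONFIGS`, `total_train_samples`, `load_config`) are hypothetical and introduced only for this example; the presence of a split named `train` is inferred from the description, and the actual column names are not listed on this page, so they should be inspected after loading rather than assumed.

```python
# Configuration names and per-configuration training size, taken from the
# description above. The helper names below are illustrative, not part of the dataset.
REPO_ID = "keeve101/balanced-multi-corpora-mt"
CONFIGS = ["hi", "id", "ms", "th", "tl", "vi", "zh"]
TRAIN_SAMPLES_PER_CONFIG = 136_026


def total_train_samples() -> int:
    """Total number of training samples across all configurations."""
    return len(CONFIGS) * TRAIN_SAMPLES_PER_CONFIG


def load_config(config: str):
    """Load the train split of one configuration (requires network access)."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {CONFIGS}")
    # Imported lazily so the helpers above work without the `datasets` package.
    from datasets import load_dataset

    # Assumption: the split is named "train", as the description suggests.
    return load_dataset(REPO_ID, config, split="train")
```

With network access and the `datasets` package installed, `ds = load_config("vi")` followed by `print(ds.column_names)` would show the real field names before any translation pairs are extracted.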

Provider:

keeve101
