hpprc/kaken-translations-ja-en
Source: Hugging Face. Updated 2024-12-08; indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/hpprc/kaken-translations-ja-en
Summary:
This dataset consists of Japanese texts extracted from the kaken subset of llm-jp-corpus-v3 and translated into English with the Qwen/Qwen2.5-32B-Instruct model. It is intended as an open Japanese-English parallel corpus. Each example has five fields: id, title, text_ja (Japanese text), text_en (English text), and model (the model used for translation). The dataset has a single training split of 3,976,575 examples, totaling 14,898,659,332 bytes (about 14.9 GB). It is licensed under CC-BY 4.0.
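The fields above map directly onto sentence-pair training data. The following is a minimal sketch, not an official loader. It assumes the Hugging Face `datasets` library, which this listing does not mention, and rows shaped like the documented schema. The helper names `to_parallel_pairs` and `stream_train_split` are hypothetical. Streaming is used so that the roughly 14.9 GB train split does not have to be downloaded in full before iteration starts.

```python
"""Sketch: extract (Japanese, English) pairs from rows with the documented schema.

Assumption: each row is a dict with the fields id, title, text_ja, text_en, model.
Function names here are hypothetical illustrations, not part of the dataset.
"""
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Repository ID taken from the download URL in this listing.
REPO_ID = "hpprc/kaken-translations-ja-en"


def to_parallel_pairs(
    rows: Iterable[Dict[str, Optional[str]]],
) -> Iterator[Tuple[str, str]]:
    """Yield (text_ja, text_en) pairs, skipping rows where either side is empty."""
    for row in rows:
        ja = (row.get("text_ja") or "").strip()
        en = (row.get("text_en") or "").strip()
        if ja and en:
            yield ja, en


def stream_train_split():
    """Stream the single 'train' split lazily (requires network access).

    Assumes the third-party `datasets` package is installed (pip install datasets).
    The import is inside the function so the rest of this module works without it.
    """
    from datasets import load_dataset

    return load_dataset(REPO_ID, split="train", streaming=True)
```

For example, `itertools.islice(to_parallel_pairs(stream_train_split()), 3)` would yield the first three non-empty pairs. Keeping the filtering separate from the loader lets the same helper run on a local sample or on the streamed split.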
Provided by:
hpprc



