Japanese-English Parallel Corpus (380,000 Pairs)

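Name: Japanese-English Parallel Corpus, 380,000 Pairs (38万日英平行语料数据)
Creator: Datatang (Beijing) Technology Co., Ltd. (数据堂（北京）科技股份有限公司)
Published: 2026-04-28 20:02:44
License: No description available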

Source: National Dataset Management Service Platform (国家数据集管理服务平台). Updated 2026-04-28; indexed 2026-04-29.

Download link:

https://www.ndsms.cn/dataRetrieval/datasetDetail/?id=9cd3a49837b7f990cbb8ff349149e01e

Resource description:

This Japanese-English parallel corpus contains 380,000 sentence pairs in total. Sensitive content, including political material, pornography, and personal information, has been excluded. It can serve as a base corpus for text data analysis and for applications such as machine translation.

Provider:

Datatang (Beijing) Technology Co., Ltd. (数据堂（北京）科技股份有限公司)

Created:

2026-04-28

Collected summary

Dataset introduction

Background and challenges

Background overview

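The dataset contains 380,000 Japanese-English parallel pairs. Political, pornographic, and personal-information content has been excluded, which makes it suitable for text analysis and machine translation. The data size is 0.024797 GB and the modality is text/translation. It is mainly intended for Japanese-English translation and localization scenarios; use requires authorization, and commercial use is not permitted.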

The above content was collected and summarized by 遇见数据集.
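
As a rough sanity check on the figures quoted in the overview, the sketch below converts the stated size into an average number of bytes per pair. It assumes that "GB" means 10^9 bytes and that the stated size covers all 380,000 pairs; the listing does not say whether the figure refers to compressed or uncompressed data, so the result is only an order-of-magnitude estimate.

```python
# Figures taken from the listing.
SIZE_GB = 0.024797      # stated data size
NUM_PAIRS = 380_000     # stated number of Japanese-English pairs

# Assumption: decimal gigabytes (1 GB = 10**9 bytes).
size_bytes = SIZE_GB * 10**9
bytes_per_pair = size_bytes / NUM_PAIRS

print(f"total size: {size_bytes / 10**6:.3f} MB")
print(f"average per pair: {bytes_per_pair:.1f} bytes")
```

Under these assumptions the corpus is about 24.8 MB, or roughly 65 bytes per pair on average. Japanese characters typically take 3 bytes each in UTF-8, so a figure this small suggests short segments or a compressed archive.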