five

snow-simplified-japanese-corpus

收藏
OpenXLab2026-04-18 收录
下载链接:
https://openxlab.org.cn/datasets/OpenDataLab/snow-simplified-japanese-corpus
下载链接
链接失效反馈
官方服务:
资源简介:
SNOW T15: The simplified corpus for the Japanese language. The corpus has 50,000 manually simplified and aligned sentences. This corpus contains the original sentences, simplified sentences and English translation of the original sentences. It can be used for automatic text simplification as well as translating simple Japanese into English and vice-versa. The core vocabulary is restricted to 2,000 words where it is selected by accounting for several factors such as meaning preservation, variation, simplicity and the UniDic word segmentation criterion. For details, refer to the explanation page of Japanese simplification (http://www.jnlp.org/research/Japanese-simplification). The original texts are from "small-parallel-enja: 50k En/Ja Parallel Corpus for Testing SMT Methods", which is a bilingual corpus for machine translation. SNOW T23: An expansion corpus of 35,000 sentences rewritten in easy Japanese (simple Japanese vocabulary) based on SNOW T15. The original texts are from "Tanaka Corpus" (http://www.edrdg.org/wiki/index.php/Tanaka-Corpus).
提供机构:
OpenDataLab
创建时间:
2023-12-07
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作